ECMWF plans to use its operational Acquisition System to acquire the S2S real-time data for ingestion into the S2S database.
The Acquisition System is based on the FTP protocol, using agreed filename patterns and data transfer frequencies. Once the data reaches ECMWF's computers, dedicated suites will process, quality control and archive it. Note that the quality control will be limited, e.g. checking the GRIB headers and plotting some fields.
Technical information to be provided by each partner:
- Name of the server from which ECMWF should fetch the data. Note that this server must be reachable from outside your organisation's firewall for the following incoming IP addresses:
  193.61.196.106
  193.61.196.107
  193.61.196.108
  193.61.196.109
  193.61.196.110
- For each cycle:
  - Path to the cycle, for example: /dir1/dir2/$yyyy/$mm/$dd/
  - Filename(s) for that cycle, for example:
    s2s_${centre}_${version}_${stream}_${yyyy}${mm}${dd}${hh}_${levelType}_${ensembleMember}.grib2
    where:
    centre: eccodes/mars acronym as per https://apps.ecmwf.int/codes/grib/format/mars/centre/
    version: test/prod
    stream: enfo/enfh for real-time/reforecast outputs
    levelType: pl/sl/pt/pv/ol for pressure/surface/potential temperature/potential vorticity/ocean level
    ensembleMember: 000, 001, 002, ... (000 for the control forecast, 001 for the 1st EPS member, etc.)
    example (UKMO): s2s_egrr_prod_enfo_2019101000_pl_002.grib2
    => within a cycle, model outputs are split into separate files only by type of forecast (real-time or reforecast), level type and ensemble member; all steps and parameters are merged into one file.
- Preferred transfer protocol (currently supported: ftp, sftp and http)
- Username and password for the above
- Estimated size (bytes and fields) of the real-time forecast per cycle, as well as of the re-forecast. Please add this information to the table on the "S2S Data Provider Information and contacts" page.
- Technical contact point for testing all of the above
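Because the filename convention above is strict, partners may find it useful to build and check names mechanically before publishing a cycle. The following is a minimal Python sketch, assuming only the field values listed above; the centre acronym is checked for shape only, not against the MARS centre list, and the names `S2SFile` and `parse_filename` are hypothetical, not part of any ECMWF tool.

```python
import re
from dataclasses import dataclass

# Mirrors s2s_${centre}_${version}_${stream}_${yyyy}${mm}${dd}${hh}_${levelType}_${ensembleMember}.grib2
FILENAME_RE = re.compile(
    r"s2s_(?P<centre>[a-z0-9]+)_(?P<version>test|prod)_(?P<stream>enfo|enfh)_"
    r"(?P<date>\d{8})(?P<hh>\d{2})_(?P<level_type>pl|sl|pt|pv|ol)_(?P<member>\d{3})\.grib2"
)


@dataclass(frozen=True)
class S2SFile:
    centre: str      # MARS centre acronym, e.g. "egrr" (not checked against the official list)
    version: str     # "test" or "prod"
    stream: str      # "enfo" (real-time) or "enfh" (reforecast)
    date: str        # yyyymmdd
    hh: str          # cycle hour, two digits
    level_type: str  # pl / sl / pt / pv / ol
    member: int      # 0 = control forecast, 1 = first EPS member, ...

    def filename(self) -> str:
        name = (f"s2s_{self.centre}_{self.version}_{self.stream}_"
                f"{self.date}{self.hh}_{self.level_type}_{self.member:03d}.grib2")
        # Validate with the same regex used for parsing, so both share one rule
        # (this also rejects members outside 000-999).
        if FILENAME_RE.fullmatch(name) is None:
            raise ValueError(f"does not follow the S2S naming convention: {name}")
        return name


def parse_filename(name: str) -> S2SFile:
    """Split an S2S filename into its components, or raise ValueError."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"does not follow the S2S naming convention: {name}")
    g = m.groupdict()
    return S2SFile(g["centre"], g["version"], g["stream"], g["date"],
                   g["hh"], g["level_type"], int(g["member"]))
```

For example, `parse_filename("s2s_egrr_prod_enfo_2019101000_pl_002.grib2")` yields centre `egrr`, date `20191010`, cycle hour `00` and member 2, and `filename()` rebuilds the same string.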
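To estimate the "bytes and fields" figures requested above, and as a cheap complement to header checks at source, a partner can walk a file of concatenated GRIB2 messages and count them. The sketch below relies only on the GRIB2 framing (the 16-octet indicator section starting with "GRIB", with the edition number in octet 8 and the total message length in octets 9 to 16, and the "7777" end marker); it assumes one field per message, and the function names are hypothetical. It does not inspect header content, so it is not a substitute for grib_check or other ecCodes-based validation.

```python
import struct
from pathlib import Path


def scan_grib2(data: bytes) -> tuple[int, int]:
    """Return (message_count, byte_count) for a buffer of back-to-back GRIB2 messages."""
    offset = 0
    count = 0
    while offset < len(data):
        # Section 0 is 16 octets: "GRIB", 2 reserved, discipline, edition, 8-octet total length.
        if len(data) - offset < 16 or data[offset:offset + 4] != b"GRIB":
            raise ValueError(f"no complete GRIB indicator section at offset {offset}")
        edition = data[offset + 7]
        if edition != 2:
            raise ValueError(f"edition {edition} at offset {offset}, expected GRIB2")
        (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        end = offset + length
        # A message must be at least Section 0 plus "7777", fit in the buffer,
        # and finish with the "7777" end section.
        if length < 20 or end > len(data) or data[end - 4:end] != b"7777":
            raise ValueError(f"truncated or unterminated message at offset {offset}")
        count += 1
        offset = end
    return count, offset


def scan_grib2_file(path) -> tuple[int, int]:
    # Reads the whole file into memory: acceptable for a sketch, not for very large files.
    return scan_grib2(Path(path).read_bytes())
```

Running `scan_grib2_file` over every file of a cycle and summing the results gives the per-cycle size in bytes and an approximate field count.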
General Principles:
- Partners should make data available as soon as it is produced. This gives ECMWF time to acquire and archive the data and to update the web portals before the release date, 3 weeks after production.
- Data should be quality controlled at source, for example by running tools like grib_api's grib_check, which performs some basic checks on the expected entries in the GRIB header. This avoids unnecessary transfer and processing of wrongly encoded data that could have been identified at your site.
- In case of problems while processing the data at ECMWF, partners are requested to keep a number of cycles available in case they need to be re-processed. Typically, this means keeping data for a month or longer. Partners may also occasionally be asked to re-run a cycle in order to fill gaps.
- Files should be of manageable size. Extreme examples of what is not suitable are 1 file per field or 1 file per cycle. Something that could work well is:
  - 1 file for the control forecast: all time steps, all level types, all levels
  - 1 file for each EPS member: all time steps, all level types, all levels
- Typical file sizes handled by the Acquisition System are a few hundred megabytes; keeping files at this scale avoids timeouts during lengthy transfers. Of course, this depends on the connection to the remote site.
- Agree on a mechanism to 'notify' ECMWF that a cycle is available. For example, partners could populate a directory path_to_the_cycle.tmp and then rename it to its final name once it is complete. ECMWF's acquisition system can be instructed to remotely clean the data after it has been transferred, or the data can be left for the remote site to manage.
- Note that ECMWF needs to be able to process a cycle before the next cycle is available. Generally, we aim to process one cycle in half the elapsed time between cycles. For example:
  - for a Centre that produces S2S data once per week, we need to be able to transfer, check and archive each cycle in less than 3.5 days.
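The "populate path_to_the_cycle.tmp, then rename" notification mechanism can be sketched on the partner side as follows. This is a minimal Python sketch under two assumptions: each cycle lives in its own directory on a single POSIX filesystem (where renaming a directory is atomic), and the acquisition side has been configured to ignore names ending in `.tmp`. The function name `publish_cycle` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def publish_cycle(files: list[Path], final_dir: Path) -> Path:
    """Stage a cycle's files in <final_dir>.tmp, then rename it to <final_dir>."""
    final_dir = Path(final_dir)
    tmp_dir = final_dir.with_name(final_dir.name + ".tmp")
    if final_dir.exists():
        raise FileExistsError(f"cycle already published: {final_dir}")
    # Start from a clean staging directory in case a previous attempt was interrupted.
    if tmp_dir.exists():
        shutil.rmtree(tmp_dir)
    tmp_dir.mkdir(parents=True)
    for f in files:
        shutil.copy2(f, tmp_dir / Path(f).name)
    # On one POSIX filesystem this rename is atomic: a fetcher sees either
    # no final directory or a complete one, never a half-written cycle.
    os.rename(tmp_dir, final_dir)
    return final_dir
```

With the example path convention above, publishing to `/dir1/dir2/2019/10/10` stages the files in `/dir1/dir2/2019/10/10.tmp` first. Copying is used for simplicity; writing the GRIB output directly into the `.tmp` directory would avoid the extra copy.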
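The timing rule above reduces to simple arithmetic: a weekly cycle leaves 168 hours between cycles, so the target is 84 hours, i.e. 3.5 days. The sketch below also estimates transfer time from file sizes. The throughput and the 500 MB per-file ceiling (standing in for "a few hundred megabytes") are hypothetical values to be replaced with figures for your own connection, and `check_cycle` is an illustrative name.

```python
# Hypothetical ceiling standing in for "a few hundred megabytes" per file.
MAX_FILE_BYTES = 500 * 10**6


def check_cycle(file_sizes, cycle_interval_hours, throughput_bytes_per_s,
                max_file_bytes=MAX_FILE_BYTES):
    """Rough feasibility check for one cycle against the rules of thumb above."""
    # ECMWF aims to handle a cycle in half the time between cycles.
    budget_hours = cycle_interval_hours / 2
    # Transfer time alone, assuming a sustained throughput (hypothetical input).
    transfer_hours = sum(file_sizes) / throughput_bytes_per_s / 3600
    return {
        "budget_hours": budget_hours,
        "transfer_hours": transfer_hours,
        # Transfer is only part of the work: checking and archiving also need time.
        "transfer_fits": transfer_hours < budget_hours,
        "oversized_files": [s for s in file_sizes if s > max_file_bytes],
    }
```

For instance, a hypothetical weekly cycle of 51 files of 300 MB each over a sustained 10 MB/s link gives a budget of 84 hours and a transfer time of about 0.43 hours.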
Acquisition of re-forecasts:
- If possible, we would like to use the same mechanism for the re-forecasts, although we already know that for some Centres this will not be feasible because their re-forecasts amount to several terabytes. In those cases, we will use USB disks instead; this has been trialled with CAWCR.
- It will be useful to know the expected size of each re-forecast.
Sample data
Put your test sample data in one of the following locations:
- ECMWF's ecgate (verify the permissions of the directory)
- alternatively, any other downloadable location (e.g. Google Drive)
- alternatively, this FTP server:
  host: ftp.ecmwf.int
  user: observations
  passwd: observations2013
  directory: s2s_ocean
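Uploading a sample file to the FTP server above can be scripted with Python's standard `ftplib`. This is a minimal sketch: the host, user, password and directory are those listed above, while the function name `upload_sample` and its `ftp_factory` parameter (present only so the upload can be tested without a network connection) are hypothetical.

```python
from ftplib import FTP
from pathlib import Path


def upload_sample(path, host="ftp.ecmwf.int", user="observations",
                  passwd="observations2013", directory="s2s_ocean",
                  ftp_factory=FTP):
    """Upload one sample file to the shared FTP directory; return its remote name."""
    path = Path(path)
    # FTP objects support the context-manager protocol, which closes the session.
    with ftp_factory(host) as ftp:
        ftp.login(user, passwd)
        ftp.cwd(directory)
        with path.open("rb") as fh:
            # Binary mode: GRIB files must not be altered by ASCII-mode conversion.
            ftp.storbinary(f"STOR {path.name}", fh)
    return path.name
```

A typical call is `upload_sample("s2s_egrr_test_enfo_2019101000_pl_000.grib2")`, using a filename that follows the convention described above.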
Steps to follow
- Provide technical information to establish data transfer. Send e-mail to:
- Routinely transfer test data for a few weeks/months (version = test)
- Once everything is working, start transferring in 'Production' mode (version = prod)