ERA5 CDS requests which return a mixture of ERA5 and ERA5T data

Created by Michela Giusti, last modified on Feb 19, 2021

The ERA5 hourly and monthly data are made available with a 3-month delay. Each month, a further month of ERA5 data is added to the dataset.

ERA5T (near real time) preliminary data are used to fill the gap between the end of the ERA5 data and 5 days before the present date. The oldest month of these is overwritten each month as new ERA5 data become available.

So as an example, say we have a current date of 15th February 2020:

ERA5 data currently cover 1/1/1979 - 30/11/2019 (instantaneous variables) and 1/1/1979 - 1/12/2019 00-06 UTC (accumulated variables)
ERA5T data (with a 5 day delay) cover 1/12/2019 - 10/2/2020 (instantaneous variables) and 1/12/2019 07-23 UTC - 10/2/2020 (accumulated variables)
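
The date arithmetic in this example can be sketched in Python. The function name `era5_windows` and its parameters are hypothetical. The rule that ERA5 ends on the last day of the month three months before the current month is an assumption read off the example above, not a documented CDS rule, and the actual release day within a month may differ.

```python
from datetime import date, timedelta


def era5_windows(today, era5_delay_months=3, era5t_delay_days=5):
    """Return (era5_end, era5t_start, era5t_end) for instantaneous variables.

    Assumption: ERA5 ends with the month `era5_delay_months` before the
    current month, and ERA5T ends `era5t_delay_days` before `today`.
    """
    # Count months from year 0 so that year boundaries are handled correctly.
    month_index = today.year * 12 + (today.month - 1) - era5_delay_months
    # The first ERA5T day is the 1st of the month after the last ERA5 month.
    year, month0 = divmod(month_index + 1, 12)
    era5t_start = date(year, month0 + 1, 1)
    era5_end = era5t_start - timedelta(days=1)
    era5t_end = today - timedelta(days=era5t_delay_days)
    return era5_end, era5t_start, era5t_end


era5_end, era5t_start, era5t_end = era5_windows(date(2020, 2, 15))
print(era5_end, era5t_start, era5t_end)  # 2019-11-30 2019-12-01 2020-02-10
```

For accumulated variables the ERA5 window extends a further 7 hours (00-06 UTC) into the first ERA5T day, as described above.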

For requests which return a mixture of ERA5 and ERA5T data (such as for data from the 1st of the month), instantaneous variables (e.g. temperature) come from ERA5T (which has an 'experiment version', or expver, of 5) while accumulated variables (fluxes, precipitation) come from both datasets with the following structure:

00-06 UTC on the 1st day of the month from ERA5 (expver 1)
07-23 UTC on the 1st day of the month (and the following dates up to 5 days before the present) from ERA5T (expver 5)
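
As a minimal sketch, the structure above can be written as a small lookup that returns the expected expver for a given timestamp. The function `expected_expver` and its argument `era5t_first_day` (midnight on the 1st day of the first ERA5T month) are hypothetical names introduced for illustration.

```python
from datetime import datetime


def expected_expver(valid_time, era5t_first_day, accumulated):
    """Return 1 (ERA5) or 5 (ERA5T) for a timestamp, following the list above."""
    if accumulated:
        # Accumulated variables: 00-06 UTC on the 1st still come from ERA5.
        boundary = era5t_first_day.replace(hour=7)
    else:
        # Instantaneous variables switch to ERA5T at 00 UTC on the 1st.
        boundary = era5t_first_day
    return 1 if valid_time < boundary else 5


first_day = datetime(2019, 12, 1)
print(expected_expver(datetime(2019, 12, 1, 6), first_day, accumulated=True))   # 1
print(expected_expver(datetime(2019, 12, 1, 7), first_day, accumulated=True))   # 5
print(expected_expver(datetime(2019, 12, 1, 0), first_day, accumulated=False))  # 5
```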

When these data are converted to netCDF a new dimension is created called expver containing 1 and 5. Moreover, a single time coordinate is used which covers the entire requested period.

dimensions:
        longitude = 1440 ;
        latitude = 721 ;
        expver = 2 ;
        time = 24 ;
variables:
        float longitude(longitude) ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
        float latitude(latitude) ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
        int expver(expver) ;
                expver:long_name = "expver" ;
        int time(time) ;
                time:units = "hours since 1900-01-01 00:00:00.0" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
        short tp(time, expver, latitude, longitude) ;
                tp:scale_factor = 9.06276558810304e-07 ;
                tp:add_offset = 0.0296950577259784 ;
                tp:_FillValue = -32767s ;
                tp:missing_value = -32767s ;
                tp:units = "m" ;
                tp:long_name = "Total precipitation" ;
data:

 expver = 5, 1 ;
 ...
}

Both expver values span the full extent of the time coordinate, but the expver 1 data cover only the first 7 timesteps; the remaining timesteps are 'padded' with empty fields.
For the expver 5 data, the first 7 timesteps are padded with empty fields, and the remaining timesteps come from the ERA5T data.
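
The padding can be illustrated with plain Python lists for a single grid point. This is a sketch with made-up values, not real ERA5 data; the constants mirror the 24 timesteps and 7 ERA5 timesteps described above, and NaN stands in for the empty fields.

```python
import math

N_TIMES = 24  # hourly timesteps on the 1st of the month
N_ERA5 = 7    # 00-06 UTC come from ERA5 (expver 1)

# Made-up values for one grid point, for illustration only.
values = [0.1 * hour for hour in range(N_TIMES)]

# Each expver slice spans all 24 timesteps, padded with NaN where it has no data.
expver1 = [v if hour < N_ERA5 else math.nan for hour, v in enumerate(values)]
expver5 = [math.nan if hour < N_ERA5 else v for hour, v in enumerate(values)]

# At every timestep exactly one slice is valid, so the two can be merged
# by taking whichever value is not NaN.
merged = [a if not math.isnan(a) else b for a, b in zip(expver1, expver5)]
print(merged == values)  # True
```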

When the latest ERA5 data are released, they overwrite the ERA5T data for the entire month and, for accumulated variables, for 00-06 UTC on the 1st day of the next month. This process is repeated each month.

31 Comments

Xiaobo Yang
Note that, for the time being, if you download only ERA5 or only ERA5T data, the above-mentioned dimension 'expver' will not appear. This makes it difficult to tell the difference between ERA5 and ERA5T.
- Permalink
- Feb 18, 2020
Julia Wagemann
It seems that if one requests hourly total precipitation ERA5 data for 1 January 2020, the file contains both expver versions (1 and 5) and the file size is doubled (around 99 MB). For other days, the expver dimension does not appear.
- Permalink
- Mar 01, 2020
1. Xiaobo Yang
  Thank you for reporting this, Julia. We are looking into a long term solution now. Unfortunately it will take some time.
  Permalink
  
  Mar 02, 2020
Michela Giusti
As pointed out above, only mixed ERA5/ERA5T data has 'expver'. When users consider accumulated variables the file has "the following structure:
- 00-06 UTC on 1 day of the month from ERA5 (expver 1)
- 07-23 UTC on 1 day of the month (and the following dates up to 5 day from present) from ERA5T (expver 5)"
So for your case, data for 00-06 UTC of 1 January 2020 are ERA5 while the rest of the data are ERA5T. Data for 2 January 2020 are only ERA5T, so 'expver' does not appear. Moreover, please note that "Both expver dimensions use the full time extent of time coordinate but the expver 1 data only covers the first 7 timesteps, the remaining timesteps are 'padded' with empty fields." This means that the empty fields contain NaN values.
- Permalink
- Mar 03, 2020
1. Alberto Troccoli
  Hello Michela and Xiaobo
  I can see why you want to keep two expver values, but I think it's making things more complicated than needed for users. Moreover, the introduction of the two experiments is breaking code, with consequent time lost first identifying the issue and then finding a (not-so-straightforward) solution. My suggestion would be to get rid of the two expver values and just communicate when ERA5T data are replaced by ERA5, as already indicated in Release of ERA5T
  Could this solution – i.e. merging expver 1 with 5, so no expver dimension/parameter appears in the retrieval – be implemented please? I think it'd be much cleaner if this was done at your end.
  Thank you very much
  Alberto
  Permalink
  
  Mar 18, 2020
  1. Xiaobo Yang
    FYI Alberto, we are thinking about having 'expver' as a dimension for all ERA5 and ERA5T data.
    
    Permalink
    
    Mar 18, 2020
Carl Svboda
I have the same issue as Julia Wagemann when downloading SurfaceSolarRadiation for all available 2020 timesteps: the first six hours of 01/01/2020 are expver = 1, but the rest of the 2020 timesteps are expver = 5.
- Permalink
- Mar 09, 2020
1. Michela Giusti
  Yes, this is because Surface Solar Radiation is an accumulated parameter and January is a month with mixed ERA5 and ERA5T data. For these reasons, the file has the following structure:
  00-06 UTC on the 1st day of the month from ERA5 (expver 1)
  07-23 UTC on the 1st day of the month (and the following dates up to 5 days before the present) from ERA5T (expver 5)
  So in your case too, data for 00-06 UTC of 1 January 2020 are ERA5 while the rest of the data are ERA5T. Data for 2 January 2020 are only ERA5T, so 'expver' does not appear. Moreover, please pay attention to the empty fields, which contain NaN values. This happens because "Both expver dimensions use the full time extent of time coordinate but the expver 1 data only covers the first 7 timesteps, the remaining timesteps are 'padded' with empty fields. For the expver 5 data, the first 7 timesteps are padded with empty fields, with the remaining timesteps coming from the ERA5T data."
  Permalink
  
  Mar 09, 2020
Richard Berg
Semi-related to this topic... Is there any documentation for why the most recent data for ERA5T instantaneous variables are available only from 0 - 21Z for the most recent available day, and the accumulated variables are available through 06Z the following day?
- Permalink
- Mar 09, 2020
1. Xiaobo Yang
  I understand this is the best we can do for the time being.
  Permalink
  
  Mar 10, 2020
2. Xiaobo Yang
  Our technical team commented: "the accumulated fields are forecast fields from forecast starting at 18h , while the instantaneous fields are analysis fields from the 9-21h assimilation window."
  Permalink
  
  Mar 10, 2020
  1. Richard Berg
    I see, thank you for the quick response. And are the accumulated and instantaneous data released through CDS at 18Z and 21Z, respectively, or is there some specific lag (computational) time for each?
    
    Permalink
    
    Mar 11, 2020
Luiz Angelo Steffenel
Has anyone found a simple way to get rid of the expver dimension, or at least to filter out ERA5 and ERA5T data in MARS scripts? As Alberto said, this is breaking a lot of code. In my case, I download Ozone Total Column on a monthly basis to automatically generate maps with NCL. However, the presence of expver adds a dimension that NCL can't understand, and I can't get rid of it (I could not find an easy way: I can remove the expver variable but the dimension remains).
- Permalink
- Dec 28, 2020
Gifty Attiah
Hi, is there a way to remove the expver dimension? I have the same situation with the November 2020 data and it is breaking all my code, which works perfectly on data from 2003 to date. I would really appreciate the help if anyone knows a way to achieve this. Thanks
- Permalink
- Jan 13, 2021
Kevin Marsh
Hi,
At the moment I think the easiest way is to retrieve the ERA5 and ERA5T data in separate requests, by careful selection of the dates. In this way you would get 2 netCDF files without the 'expver' dimension, which you can then merge if required.
Thanks
Kevin
- Permalink
- Jan 14, 2021
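
The approach of splitting the retrieval can be sketched as two request dictionaries for the day on which an accumulated variable changes from ERA5 to ERA5T. The helper `split_first_day_request` is a hypothetical name, the 'HH:MM' time strings are an assumption about the request format, and the variable and date are taken from the total precipitation case reported earlier in the comments. The sketch only builds the dictionaries; it does not submit them, and whether each resulting file comes back without 'expver' depends on the CDS behaviour described in the comment above.

```python
def split_first_day_request(base_request, n_era5_hours=7):
    """Split a one-day request into an ERA5 part (00-06 UTC) and an ERA5T part (07-23 UTC)."""
    hours = [f"{hour:02d}:00" for hour in range(24)]
    era5_request = {**base_request, "time": hours[:n_era5_hours]}
    era5t_request = {**base_request, "time": hours[n_era5_hours:]}
    return era5_request, era5t_request


# Base request without a 'time' key; the keys follow the request shown in the comments.
base_request = {
    "product_type": "reanalysis",
    "format": "netcdf",
    "variable": ["total_precipitation"],
    "year": "2020",
    "month": "01",
    "day": "01",
}

era5_request, era5t_request = split_first_day_request(base_request)
print(era5_request["time"])        # 00:00 to 06:00
print(len(era5t_request["time"]))  # 17
```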
marco venturini
Hi,
If you are looking for a Python workaround, you can use the xarray method reduce(np.nansum, 'expver'). This collapses the dimension by summing the two expver arrays, which are complementary (one is NaN wherever the other has a value). I know it isn't the cleanest approach, but one line keeps a lot of code from breaking.
- Permalink
- Jan 20, 2021
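
One caveat with the nansum approach is worth checking on your own data: `np.nansum` treats NaN as zero, so a value that is missing in both expver slices comes out as 0 rather than NaN. The small NumPy sketch below uses made-up numbers, and the case of a point missing in both slices is an assumption for illustration, not something reported in this thread.

```python
import numpy as np

# One grid point, three timesteps; the third is missing in both slices.
expver1 = np.array([1.5, np.nan, np.nan])
expver5 = np.array([np.nan, 2.5, np.nan])
stacked = np.stack([expver1, expver5])  # shape (expver, time)

# nansum ignores NaN, so the doubly-missing timestep becomes 0.0.
summed = np.nansum(stacked, axis=0)

# Filling expver 1 with expver 5 keeps the doubly-missing timestep as NaN.
filled = np.where(np.isnan(expver1), expver5, expver1)

print(summed)  # [1.5 2.5 0. ]
print(filled)  # [1.5 2.5 nan]
```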
Gifty Attiah
Hi.
Thank you all for the helpful responses. I appreciate the suggestion, marco venturini. That is a solution I can work with.
Thanks
- Permalink
- Jan 20, 2021
Shahid Mehmood
cdo --reduce_dim -copy in.nc out.nc
worked well for me and it removed expver dimension
- Permalink
- Jun 17, 2021
Jian Tang
this will do the trick

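import xarray as xr

# Open the mixed ERA5/ERA5T file (it has an 'expver' dimension with values 1 and 5)
ERA5 = xr.open_mfdataset('era5.tp.20200801.nc', combine='by_coords')
# Take ERA5 (expver=1) where it has data and fill the gaps with ERA5T (expver=5)
ERA5_combine = ERA5.sel(expver=1).combine_first(ERA5.sel(expver=5))
# Load the lazily evaluated result into memory, then write it out
ERA5_combine.load()
ERA5_combine.to_netcdf("era5.tp.20200801.copy.nc")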

from https://unseen-open.readthedocs.io/_/downloads/en/latest/pdf/
- Permalink
- Jul 22, 2021
1. Temidayo Popoola
  Thank you for sharing, Jian Tang. It worked smoothly and seamlessly. You just saved me from days of sleepless nights.
  Permalink
  
  Aug 09, 2022
Axel Schweiger
Just downloaded ERA5 data (ssrd) for March 2022 and the "expver" seems to be gone. Is this going to be gone for good? The problem is that when the number of dimensions varies over time, it is difficult to find a coherent way to process the data.
- Permalink
- Apr 11, 2022
1. Michela Giusti
  Hi,
  March 2022 has no "expver" because there are only ERA5T data.
  
  Thanks
  Permalink
  
  Apr 12, 2022
  1. Axel Schweiger
    Thanks Michela
    How come? I thought there was only a 5 -day lag or am I misunderstanding something here?
    
    Axel
    
    Permalink
    
    Apr 12, 2022
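
Since files may or may not carry the 'expver' dimension, one way to keep a processing chain coherent is to normalise every file on load. The helper below is a sketch built on the combine_first approach posted earlier in the comments; `drop_expver` is a hypothetical name and the dataset is synthetic, so treat it as an illustration rather than an official recipe.

```python
import numpy as np
import xarray as xr


def drop_expver(ds):
    """Merge ERA5 (expver=1) and ERA5T (expver=5) if an 'expver' dimension exists."""
    if "expver" not in ds.dims:
        # Pure ERA5 or pure ERA5T file: nothing to merge.
        return ds
    # Prefer validated ERA5 values and fill the gaps with ERA5T.
    merged = ds.sel(expver=1).combine_first(ds.sel(expver=5))
    # Drop any leftover scalar 'expver' coordinate, if one is present.
    return merged.drop_vars("expver", errors="ignore")


# Synthetic stand-in for a mixed file: 4 timesteps, 2 from each experiment.
tp = np.full((4, 2), np.nan)
tp[:2, 0] = [1.0, 2.0]  # expver 1 (ERA5) holds the first two timesteps
tp[2:, 1] = [3.0, 4.0]  # expver 5 (ERA5T) holds the last two
mixed = xr.Dataset(
    {"tp": (("time", "expver"), tp)},
    coords={"time": np.arange(4), "expver": [1, 5]},
)

flat = drop_expver(mixed)
print(flat["tp"].values)
```

Calling the helper on a file that has no 'expver' dimension returns it unchanged, so the same code path can be used for every month.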
John McInnes
I'm looking at a 10m u component of wind .nc I just downloaded for 2022. It has the expver dimension all right. But the values aren't all NaN. The expver 5 values start as NaN. Then at 2022-02-13 02:00 the values all become -1.7. They stay that way until they transition to sensible values at 2022-05-01 00:00. Then the expver 1 values become -1.7.

I guess my question is how can I tell when to use the value from expver 1 and when to use expver 5? Detecting NaN seems to not be sufficient.
- Permalink
- Aug 09, 2022
1. Kevin Marsh
  hi John, Can you share the request you used to retrieve the data, please?
  Thanks,
  Kevin
  Permalink
  
  Aug 09, 2022
  1. John McInnes
    Hi, try this. The transition from expver 1 to 5 in the resulting netcdf file was at time index 3625. I used Panoply to view it.
    {
    'product_type': 'reanalysis',
    'format': 'netcdf',
    'year': 2022,
    'variable': ['10m_v_component_of_wind'],
    'month': ['1', '2', '3', '4', '5', '6', '7'],
    'day': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31'],
    'time': 'all'
    }
    
    Permalink
    
    Aug 17, 2022
    1. Kevin Marsh
      
      Hi John,
      Thanks for reporting this; I think this may be an issue with the grib to netCDF converter due to the size of the request. In this case, if requesting netCDF it may be better if you just request 1 month at a time (although months containing a mix of era5/era5T will need careful handling as only these will have the 'expver' dimension)
      Hope that helps,
      Kevin
      
      Permalink
      
      Aug 22, 2022
Xiaobo Yang
Hi John,
expver is used to tell the difference between the initial release (expver=5, called ERA5T) and validated ERA5 data (expver=1). See the link below for details.
ERA5: data documentation#Dataupdatefrequency
In most cases, ERA5 is identical to ERA5T. Therefore, if you spot any unusual behaviour, please let us know.
Thank you,
Xiaobo
- Permalink
- Aug 09, 2022
Ghufran Altekreeti
Hi, I downloaded hourly data as a netCDF (.nc) file and opened it in Panoply. When I exported the data to CSV, the time was in the Gregorian calendar. How can I convert the hourly data from Gregorian calendar dates to calendar dates?
- Permalink
- Apr 02, 2023
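
If the exported time column holds raw values in the file's time units ("hours since 1900-01-01 00:00:00.0", as in the netCDF headers on this page), they can be converted to dates with the Python standard library. This is a sketch under that assumption; `hours_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Reference time taken from the 'time:units' attribute in the netCDF header.
EPOCH = datetime(1900, 1, 1)


def hours_to_datetime(hours_since_1900):
    """Convert a time value in 'hours since 1900-01-01' to a datetime (UTC)."""
    return EPOCH + timedelta(hours=int(hours_since_1900))


print(hours_to_datetime(1051896))  # 2020-01-01 00:00:00
```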
atiqah azhar
Hi, can someone help me? I can't remove the expver dimension using cdo vertmean or cdo -sellevel,1. The error is:
Warning (cdfCheckVars): 5 dimensional variables are not supported, skipped variable z!
Warning (cdfInqContents): No data arrays found!
cdo vertmean: Open failed on >geopotential.nlev.nc<
Unsupported file structure
This is the info from my nc.file.
ncdump -h GH.nlev.nc
netcdf GH.nlev {
dimensions:
longitude = 1440 ;
latitude = 721 ;
level = 14 ;
expver = 2 ;
time = 538 ;
variables:
float longitude(longitude) ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:units = "degrees_north" ;
latitude:long_name = "latitude" ;
int level(level) ;
level:units = "millibars" ;
level:long_name = "pressure_level" ;
int expver(expver) ;
expver:long_name = "expver" ;
int time(time) ;
time:units = "hours since 1900-01-01 00:00:00.0" ;
time:long_name = "time" ;
time:calendar = "gregorian" ;
short z(time, expver, level, latitude, longitude) ;
z:scale_factor = 4.86892538277471 ;
z:add_offset = 156350.109482621 ;
z:_FillValue = -32767s ;
z:missing_value = -32767s ;
z:units = "m**2 s**-2" ;
z:long_name = "Geopotential" ;
z:standard_name = "geopotential" ;

Thank you.
- Permalink
- Nov 29, 2023
Kevin Marsh
Hi, I think the method described in an earlier comment may help. Just use the small Python script from Jian Tang:
(replace era5.tp.20200801.nc and era5.tp.20200801.copy.nc with the names of your input and output files)
import xarray as xr
ERA5 = xr.open_mfdataset('era5.tp.20200801.nc',combine='by_coords')
ERA5_combine =ERA5.sel(expver=1).combine_first(ERA5.sel(expver=5))
ERA5_combine.load()
ERA5_combine.to_netcdf("era5.tp.20200801.copy.nc")
to remove the expver dimension in your downloaded data file.

Running it on this file:
ncdump -h test_in.nc
netcdf test_in {
dimensions:
longitude = 1440 ;
latitude = 721 ;
expver = 2 ;
time = 12 ;
variables:
float longitude(longitude) ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:units = "degrees_north" ;
latitude:long_name = "latitude" ;
int expver(expver) ;
expver:long_name = "expver" ;
int time(time) ;
time:units = "hours since 1900-01-01 00:00:00.0" ;
time:long_name = "time" ;
time:calendar = "gregorian" ;
short t2m(time, expver, latitude, longitude) ;
t2m:scale_factor = 0.00182472256828442 ;
t2m:add_offset = 257.866053886274 ;
t2m:_FillValue = -32767s ;
t2m:missing_value = -32767s ;
t2m:units = "K" ;
t2m:long_name = "2 metre temperature" ;

cdo info test_in.nc
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter ID
   1 : 2023-01-01 00:00:00 1 1038240 0 : 218.55 277.46 313.44 : -1
   2 : 2023-01-01 00:00:00 5 1038240 1038240 : nan : -1
   3 : 2023-02-01 00:00:00 1 1038240 0 : 225.35 276.67 307.91 : -1
   4 : 2023-02-01 00:00:00 5 1038240 1038240 : nan : -1
   5 : 2023-03-01 00:00:00 1 1038240 0 : 218.47 276.55 310.27 : -1
   6 : 2023-03-01 00:00:00 5 1038240 1038240 : nan : -1
   7 : 2023-04-01 00:00:00 1 1038240 0 : 209.07 277.40 311.98 : -1
   8 : 2023-04-01 00:00:00 5 1038240 1038240 : nan : -1
   9 : 2023-05-01 00:00:00 1 1038240 0 : 210.15 278.76 313.10 : -1
  10 : 2023-05-01 00:00:00 5 1038240 1038240 : nan : -1
  11 : 2023-06-01 00:00:00 1 1038240 0 : 198.08 280.23 311.80 : -1
  12 : 2023-06-01 00:00:00 5 1038240 1038240 : nan : -1
  13 : 2023-07-01 00:00:00 1 1038240 0 : 198.80 280.80 317.66 : -1
  14 : 2023-07-01 00:00:00 5 1038240 1038240 : nan : -1
  15 : 2023-08-01 00:00:00 1 1038240 0 : 199.11 281.23 314.89 : -1
  16 : 2023-08-01 00:00:00 5 1038240 1038240 : nan : -1
  17 : 2023-09-01 00:00:00 1 1038240 0 : 198.98 280.38 316.13 : -1
  18 : 2023-09-01 00:00:00 5 1038240 1038240 : nan : -1
  19 : 2023-10-01 00:00:00 1 1038240 1038240 : nan : -1
  20 : 2023-10-01 00:00:00 5 1038240 0 : 207.78 279.84 310.16 : -1
  21 : 2023-11-01 00:00:00 1 1038240 1038240 : nan : -1
  22 : 2023-11-01 00:00:00 5 1038240 0 : 216.54 278.59 310.12 : -1
  23 : 2023-12-01 00:00:00 1 1038240 1038240 : nan : -1
  24 : 2023-12-01 00:00:00 5 1038240 0 : 230.68 278.14 309.12 : -1

gives:
ncdump -h test_in_flatten.nc
netcdf test_in_flatten {
dimensions:
time = 12 ;
latitude = 721 ;
longitude = 1440 ;
variables:
float t2m(time, latitude, longitude) ;
t2m:_FillValue = NaNf ;
t2m:units = "K" ;
t2m:long_name = "2 metre temperature" ;
float longitude(longitude) ;
longitude:_FillValue = NaNf ;
longitude:units = "degrees_east" ;
longitude:long_name = "longitude" ;
float latitude(latitude) ;
latitude:_FillValue = NaNf ;
latitude:units = "degrees_north" ;
latitude:long_name = "latitude" ;
int time(time) ;
time:long_name = "time" ;
time:units = "hours since 1900-01-01" ;
time:calendar = "gregorian" ;

% cdo info test_in_flatten.nc
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter ID
   1 : 2023-01-01 00:00:00 0 1038240 0 : 218.55 277.46 313.44 : -1
   2 : 2023-02-01 00:00:00 0 1038240 0 : 225.35 276.67 307.91 : -1
   3 : 2023-03-01 00:00:00 0 1038240 0 : 218.47 276.55 310.27 : -1
   4 : 2023-04-01 00:00:00 0 1038240 0 : 209.07 277.40 311.98 : -1
   5 : 2023-05-01 00:00:00 0 1038240 0 : 210.15 278.76 313.10 : -1
   6 : 2023-06-01 00:00:00 0 1038240 0 : 198.08 280.23 311.80 : -1
   7 : 2023-07-01 00:00:00 0 1038240 0 : 198.80 280.80 317.66 : -1
   8 : 2023-08-01 00:00:00 0 1038240 0 : 199.11 281.23 314.89 : -1
   9 : 2023-09-01 00:00:00 0 1038240 0 : 198.98 280.38 316.13 : -1
  10 : 2023-10-01 00:00:00 0 1038240 0 : 207.78 279.84 310.16 : -1
  11 : 2023-11-01 00:00:00 0 1038240 0 : 216.54 278.59 310.12 : -1
  12 : 2023-12-01 00:00:00 0 1038240 0 : 230.68 278.14 309.12 : -1
So the expver dimension is removed.
- Permalink
- Dec 20, 2023
