(1) I have read that "ERA5 includes an uncertainty estimate that provides guidance on where products are more/less accurate."[1] What does this mean? What exactly are uncertainties when using ERA5?
ERA5 uses weather observations where they are available. On top of these observations, ERA5 uses a weather forecasting model to produce spatially and temporally continuous data. As with a weather forecast, the resulting data contains some uncertainty.
ERA5 uncertainty estimation helps users understand the relative accuracy of the ERA5 system, i.e. identify areas and periods where the products are thought to be more or less reliable, although the uncertainty values provided by the Ensemble of Data Assimilations (EDA) system should not be taken at face value. The EDA system addresses uncertainties related to the observing system, the sea surface temperature and the model (through its physical parametrizations).
(2) What does the uncertainty mean? Is it the real error of the reanalysis?
The uncertainty as defined for ERA5 by the Ensemble of Data Assimilations (EDA) system is not a classical measure of the error of the ERA5 reanalysis product. The EDA mostly takes into account random uncertainties in the observations, the sea surface temperature (SST) and the physical parametrizations of the model. In principle, as long as these uncertainties are properly described and there are no additional sources of uncertainty, the EDA will properly describe the reanalysis uncertainties. However, systematic model errors are not taken into account by the EDA, and the errors (uncertainties) represented by the EDA are treated as uncorrelated. Furthermore, for affordability reasons the EDA has a lower resolution than ERA5 itself, so the EDA system cannot directly describe all the uncertainties of ERA5. In summary, there are limitations on the use of the EDA system for uncertainty estimation in ERA5, both because not all uncertainties are accounted for and because the EDA system was not actually designed for uncertainty estimation. Nevertheless, comparing uncertainties provides excellent information on when and where the reanalysis products are more or less accurate (for example, recent dates compared with 30 years ago, when fewer observations were available), and on where larger uncertainties occur on a given day or in a given season (for example, close to tropical cyclones or in the storm tracks).
(3) How do I obtain the uncertainty estimate data based on the ERA5 EDA system?
Uncertainty estimates are available from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) as part of the ERA5 dataset, where they can be selected as 'Product type' = 'Ensemble mean' or 'Ensemble spread'.
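For scripted downloads, a CDS request for the spread product might look like the sketch below. It assumes the `cdsapi` client package, a configured CDS account, the dataset name `reanalysis-era5-single-levels` and the request keywords shown; these are assumptions to check against the dataset's download form, since keyword spellings (for instance the output-format key) have differed between CDS versions. The helper `build_spread_request` is hypothetical.

```python
# Minimal sketch: build (and optionally submit) a CDS request for ERA5
# ensemble spread. Dataset name, variable and keyword spellings are
# assumptions based on the CDS web form; verify them before use.

def build_spread_request(variable, year, month, day, times):
    """Return a hypothetical request dictionary for single-level ensemble spread."""
    return {
        "product_type": "ensemble_spread",  # "ensemble_mean" for the mean
        "variable": variable,
        "year": year,
        "month": month,
        "day": day,
        "time": times,                      # EDA output is 3-hourly
        "format": "netcdf",                 # key name may differ by CDS version
    }


def retrieve(request, target):
    """Submit the request with the cdsapi client (requires a CDS account)."""
    import cdsapi  # imported lazily so build_spread_request works without it
    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-single-levels", request, target)


if __name__ == "__main__":
    three_hourly = [f"{h:02d}:00" for h in range(0, 24, 3)]
    req = build_spread_request("mean_sea_level_pressure", "1980", "03", "01", three_hourly)
    print(req)
    # retrieve(req, "era5_msl_spread_19800301.nc")  # uncomment to download
```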
For 'authorized' users of ECMWF (e.g. National Meteorological Services, but not for users self-registered at ECMWF), the uncertainty data is archived, along with all ERA5 data, in ECMWF's data archive MARS. Uncertainty data is archived in stream 'Ensemble data assimilation' (enda), as Ensemble mean (type=em), Ensemble standard deviation (type=es) and Scaled ensemble standard deviation (type=ses). MARS catalogue entry point (restricted): http://apps.ecmwf.int/mars-catalogue/?class=ea&stream=enda&expver=1
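As an illustration, the sketch below formats a MARS `retrieve` request as text. Only `class=ea`, `stream=enda`, `expver=1` and `type=em/es/ses` come from the description above; the date, time, `levtype`, `param` and `target` values, and the helper name `mars_request`, are illustrative assumptions. Consult the MARS catalogue for the valid combinations.

```python
# Minimal sketch: format a MARS request for ERA5 EDA uncertainty data.
# class=ea, stream=enda, expver=1 and type=em/es/ses come from the text;
# every other key and value here is an illustrative assumption.

def mars_request(product_type, date, time, param, levtype="sfc",
                 target="enda.grib"):
    """Return a hypothetical MARS 'retrieve' request as plain text."""
    if product_type not in {"em", "es", "ses"}:
        raise ValueError("type must be 'em', 'es' or 'ses'")
    fields = {
        "class": "ea",
        "stream": "enda",
        "expver": "1",
        "type": product_type,  # em = mean, es = std dev, ses = scaled std dev
        "date": date,
        "time": time,
        "levtype": levtype,
        "param": param,
        "target": f'"{target}"',
    }
    body = ",\n".join(f"  {key}={value}" for key, value in fields.items())
    return "retrieve,\n" + body


if __name__ == "__main__":
    print(mars_request("es", "1980-03-01",
                       "00:00/03:00/06:00/09:00/12:00/15:00/18:00/21:00", "msl"))
```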
(4) Can I take the numbers for uncertainty at face value?
No. The uncertainty values should not be taken at face value, although the EDA-based uncertainties are valuable as a relative estimate of how uncertainty is distributed in space and time. In other words, the EDA can be used to get an idea of the areas and periods in which ERA5 is more, or less, reliable.
(5) How is the uncertainty estimate obtained? Which sources of uncertainty does it account for and which does it omit?
The uncertainty estimation for ERA5 is obtained from the Ensemble of Data Assimilations (EDA) system. The EDA addresses some uncertainties of the model and data assimilation system, but not all of them. The EDA accounts for uncertainties in observations, sea surface temperature (SST) and model physical parametrizations. Other uncertainties are not accounted for, such as uncertainties in radiative forcing due to greenhouse gases, or systematic errors in the model or in the way observations are used.
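The ensemble mean and the ensemble spread (the standard deviation across members, cf. MARS type=es) can be computed from raw members as in the sketch below. The function name is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption; the normalization used for the archived ERA5 product is not stated here.

```python
import numpy as np


def ensemble_mean_and_spread(members):
    """Return (mean, spread) across the first axis of `members`.

    members: array of shape (n_members, ...), e.g. (10, nlat, nlon).
    The spread is the standard deviation across members; ddof=1 is an
    assumption, not necessarily the ERA5 convention.
    """
    members = np.asarray(members, dtype=float)
    mean = members.mean(axis=0)
    spread = members.std(axis=0, ddof=1)
    return mean, spread


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 10-member MSLP-like field (hPa) on a 4 x 5 grid
    members = 1010.0 + rng.normal(0.0, 0.5, size=(10, 4, 5))
    mean, spread = ensemble_mean_and_spread(members)
    print(mean.shape, spread.shape)
```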
(6) How reliable is the ERA5 uncertainty estimate?
The reliability of the ensemble system can be measured using spread-skill (reliability) diagnostics, which describe how well the spread of the ensemble matches the error (skill) of the system. In the optimal case the ensemble spread matches the error, so the reliability diagram is a diagonal line. The reliability of the EDA system differs between variables, levels and reanalysis periods. Generally speaking, the EDA system is rather reliable (though generally under-dispersive, i.e. the spread is smaller than the error) and carries useful information about the uncertainty of ERA5. A typical example of reliability diagnostics for surface pressure in the spring season, for various reanalysis periods, is shown in the accompanying figure.
(7) Does the uncertainty also account for systematic errors in ERA5, or only for random errors?
The uncertainty estimates MOSTLY account for random errors and NOT for systematic ones. The exception is the perturbations applied to the sea surface temperature (SST), which do incorporate estimates of systematic error. For other observations and for the physical parametrizations of the model, only random errors are accounted for. Therefore, one limitation of the uncertainty estimation is that systematic errors are not well addressed.
(8) Could you outline, in a nutshell, the strengths and weaknesses of the uncertainty estimate?
The main value of the uncertainty estimation for ERA5 is that it adds information to the ERA5 reanalysis product. It is based on physical considerations, using an Ensemble of Data Assimilations (EDA) system. The EDA system addresses uncertainties in the ERA5 assimilation and modelling system, which are quantified by a 10-member ensemble. The EDA is able to indicate where ERA5 is more and where it is less accurate (for instance due to changes in observation coverage). Its weaknesses are that the EDA does not account for all sources of uncertainty (such as systematic errors or correlated errors) and that it has lower spatial and temporal resolution than ERA5 itself. The latter means that it is not always easy to find a direct correspondence between the ERA5 reanalysis variables and the EDA uncertainty characteristics.
(9) Where can I find monthly-mean values for uncertainty?
There are NO monthly mean values for uncertainties available; users need to compute them. It is highly recommended to compute such means from the instantaneous spread values: if you start from the monthly mean variables, the spread will be unrealistically smooth.
(10) Why is the uncertainty information only available 3-hourly, whereas the ERA5 reanalysis data is available hourly?
The uncertainty estimates for ERA5 are provided by a 10-member Ensemble of Data Assimilations (EDA) system, which has lower spatial and temporal resolution (~60 km horizontal and 3 h temporal resolution) than the main ERA5 product (~30 km horizontal and 1 h temporal resolution). This lower-resolution 3-hourly dataset can be used for ERA5 uncertainty estimation. The EDA runs at lower resolution than ERA5 itself because the ensemble system should be comparable to ERA5 in terms of supercomputer resources (since there is a link between the EDA and ERA5 through the background errors used in the assimilation) and in terms of data volumes. We cannot afford to run an EDA at a resolution similar to that of the ERA5 production system, and in any case it would probably not provide enough additional information to justify the higher cost.
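A back-of-envelope calculation shows why these resolutions keep the EDA output volume comparable to that of ERA5. The sketch assumes the number of grid points scales with the inverse square of the grid spacing and the number of output times with the inverse of the output interval; it ignores compute cost, vertical levels and file compression, and the function name is hypothetical.

```python
def relative_output_volume(n_members=10, dx_ens_km=60.0, dx_hres_km=30.0,
                           dt_ens_h=3.0, dt_hres_h=1.0):
    """Rough ratio of EDA to ERA5 output volume for one variable.

    Assumes grid points scale with (1/dx)^2 and output times with 1/dt.
    """
    grid_ratio = (dx_hres_km / dx_ens_km) ** 2  # 30 km vs 60 km -> 1/4
    time_ratio = dt_hres_h / dt_ens_h           # 1 h vs 3 h     -> 1/3
    return n_members * grid_ratio * time_ratio


if __name__ == "__main__":
    # 10 members x 1/4 of the grid points x 1/3 of the output times, about 0.83
    print(round(relative_output_volume(), 3))
```

Under these assumptions, the ten EDA members together produce slightly less output per variable than the single high-resolution ERA5 analysis.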
(11) Where is ERA5 more accurate and where less, and how does the uncertainty evolve over time?
Seasonal spread charts give an idea about the level of uncertainties for different seasons, regions, periods, levels and variables. For instance, for the summer season of 1980:
For 200 hPa zonal wind, the largest uncertainties are in the tropical regions.
For 850 hPa temperature, the uncertainties are generally larger in the Southern Hemisphere (this corresponds well with the fact that there are fewer observations in the Southern Hemisphere).
For MSLP, the Antarctic region has the largest spread/uncertainty.
For all variables it is clear that the uncertainties are decreasing with time, i.e. the spread values are smaller for recent periods than for older ones.
(12) In the 1980s there are several short periods when the uncertainty is larger. Can you explain this?
One of the most important factors that determines the ERA5 uncertainties is the amount and quality of available observations. The Global Observing System (GOS) has evolved during the ERA5 period, which means that the amount of observations generally increases with time and, as a result, uncertainties decrease. However, there are some short periods with fewer available observations. In particular, in the 1980s, when the number of satellite observations was still quite low, there are some short periods when missing observations cause an increase in the uncertainty, i.e. an increase of the ensemble spread. The evolution of the mean spread for vorticity and temperature, for 3 different model levels, demonstrates this:
(Figure panels: vorticity | temperature)
It can be seen that generally the spread (uncertainty) is steadily decreasing over time except for some jumps in the early periods. These jumps correspond to the blips in observation amounts. For instance, at the end of 1979 there were some shorter periods when the MSU and SSU instruments onboard the TIROS-N and NOAA-9 satellites were providing significantly fewer observations than normal. It is noted here that in the vast observing system of the present day, there is a degree of resilience which means that the assimilation system is much less sensitive to the failure of one instrument or satellite.
(13) When I look at an ensemble spread field I see that in some cases it is noisy. How can I use it?
Indeed, the instantaneous spread fields can be noisy at particular locations, especially in the early reanalysis periods. For instance, at 00 UTC on 1980-03-01 the MSLP spread is noisy over the Antarctic, the 850 hPa temperature spread is particularly noisy in the Southern Hemisphere and the 200 hPa zonal wind spread is noisy in the tropical region:
(Figure panels: MSLP spread | 850 hPa temperature spread | 200 hPa zonal wind spread)
The main reason for this is the limited ensemble size of 10 members, which introduces considerable sampling noise. On the other hand, the mean seasonal spread (JJA 1980 in this case) for the three variables is much smoother and easier to interpret: in seasonal mean fields this sampling noise is averaged out, resulting in smoother spread fields:
(Figure panels: MSLP, mean seasonal (JJA) spread | 850 hPa temperature, mean seasonal (JJA) spread | 200 hPa zonal wind, mean seasonal (JJA) spread)
(14) When I look at active systems such as extra-tropical cyclones or tropical cyclones I expect a larger uncertainty, yet I do not see that clearly in the ensemble spread.
The main problem with extra-tropical and tropical cyclones in terms of uncertainty is that, due to the lower resolution of the EDA system, the EDA members systematically overestimate the central pressure of the cyclone (i.e. the pressure is not sufficiently low). This means that the spread among the members remains small and consequently the EDA shows lower uncertainties than in reality. On the other hand, the spatial pattern of the uncertainties corresponds rather well with the actual cyclones. This is demonstrated for extra-tropical cyclones such as cyclone Desmond (2015120500), cyclone Xaver (2013120500) and the Great Storm of 1987 in the UK. In all of these cases the maximum spread values do not exceed 1 hPa, which is quite small. Large spread values are scattered throughout the domain, though the primary cyclones are reasonably well marked in the uncertainty field (particularly for the 1987 storm). For tropical cyclones the spread values can be larger, as for a cyclone near Japan in 1987 or for Typhoon Haiyan near the Philippines. The case of Hurricane Sandy is particularly interesting: the region with the largest uncertainties does not fully coincide with the location of the hurricane's eye, but instead shows peaks to the east and west of it (with larger values to the east). This indicates the uncertainties related to the position of the hurricane. Overall, the EDA spread can give a qualitative idea of the uncertainties relating to active systems such as cyclones, but it cannot provide the right uncertainty amplitude due to the lower resolution of the EDA system.
(15) When I look at the central pressure of tropical cyclones I know that the ERA5 reanalysis is far too shallow. However the ERA5 ensemble spread is quite small, suggesting a far more accurate estimate. Could you explain?
Primarily, the resolution of ERA5 itself is not sufficient to properly describe tropical cyclones. In addition, the EDA system has a lower resolution than ERA5, which further limits its ability to describe such small-scale phenomena. For the above-mentioned tropical cyclone case, the lowest pressure of the cyclone is 969.7 hPa (see figure below), which is higher than the real observed pressure, but the cyclone itself is reasonably well described. The area of larger spread corresponds well with the shape of the cyclone, and the largest value is 2.7 hPa. This gives an indication of the relative uncertainty of the event, though the spread is presumably smaller than the real analysis error.
(16) Uncertainty information is available 3-hourly. How can I approximate uncertainty for the intermediate hours? Should I use interpolation, the temporally nearest, ...?
No general recipe is provided for this. Common sense should prevail, guided by the specific user need. One solution is certainly temporal and spatial interpolation, though one has to understand the limitations of interpolated uncertainty values. Please note that the errors introduced by interpolation might be smaller than the errors in the uncertainty estimate itself.
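Two of the options mentioned in the question, linear interpolation in time and taking the temporally nearest 3-hourly value, are sketched below. Neither is an official recommendation; the array layout (time first, 3-hourly steps starting at hour 0), the handling of hours outside the range (held at the edge values) and the function name are assumptions. Spatial interpolation is not covered.

```python
import numpy as np


def spread_at_hours(spread_3h, target_hours, method="linear"):
    """Approximate the spread at intermediate hours (hypothetical helper).

    spread_3h:    array (n_times, ...) valid at hours 0, 3, 6, ...
    target_hours: hours relative to the first 3-hourly time
    method:       "linear" (interpolate in time) or "nearest"
    """
    spread_3h = np.asarray(spread_3h, dtype=float)
    n = spread_3h.shape[0]
    # Position in units of 3-hourly steps, clipped to the available range
    pos = np.clip(np.atleast_1d(np.asarray(target_hours, dtype=float)) / 3.0, 0, n - 1)
    if method == "nearest":
        return spread_3h[np.rint(pos).astype(int)]
    if method == "linear":
        lo = np.floor(pos).astype(int)
        hi = np.minimum(lo + 1, n - 1)
        w = (pos - lo).reshape((-1,) + (1,) * (spread_3h.ndim - 1))
        return (1.0 - w) * spread_3h[lo] + w * spread_3h[hi]
    raise ValueError("method must be 'linear' or 'nearest'")


if __name__ == "__main__":
    spread = np.array([1.0, 4.0, 7.0])  # spread at hours 0, 3 and 6
    print(spread_at_hours(spread, [0, 1, 2, 3, 4], "linear"))
    print(spread_at_hours(spread, [0, 1, 2, 3, 4], "nearest"))
```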
(17) If I compute a climatology of a parameter derived from the HRES, do you have a methodology to assess its uncertainty using the information provided by the EDA?
We do not recommend computing a single mean from the ensemble spread values. The EDA system addresses mostly random errors (and not systematic ones), so one has to be careful when computing and interpreting mean climatologies.
(18) Is there a forecast evolution of the indicators made available for the EDA (e.g. quartiles, percentiles ...)? If so, what horizon and what would these indicators be?
Only the ensemble spread, the ensemble mean and the raw ensemble members will be provided to users. Any additional processed quantities should therefore be computed by users from the ensemble members. The ensemble forecasts are also available for use.
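Quantities such as quartiles or percentiles can be derived from the raw members along the member axis, as in the sketch below. The function name, the default percentiles and the `ddof=1` spread convention are assumptions. With only 10 members, percentile estimates are coarse and should be interpreted with care.

```python
import numpy as np


def member_statistics(members, percentiles=(25, 50, 75)):
    """Derive extra indicators from raw ensemble members (hypothetical helper).

    members: array (n_members, ...), e.g. the 10 EDA members.
    Returns a dict with mean, spread (std, ddof=1 assumed), min, max and
    the requested percentiles across members (NumPy's default linear method).
    """
    members = np.asarray(members, dtype=float)
    stats = {
        "mean": members.mean(axis=0),
        "spread": members.std(axis=0, ddof=1),
        "min": members.min(axis=0),
        "max": members.max(axis=0),
    }
    for p in percentiles:
        stats[f"p{p}"] = np.percentile(members, p, axis=0)
    return stats


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    members = rng.normal(280.0, 1.0, size=(10, 2, 3))  # synthetic temperatures (K)
    for name, field in member_statistics(members).items():
        print(name, np.round(field, 2))
```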
(19) Are the uncertainty indicators available on the same 137 vertical levels as data from HRES?
Yes. Uncertainty indicators are available on the same model levels (137 model levels, as in the high resolution operational IFS model), pressure, potential temperature and potential vorticity levels as the ensemble members. The raw ensemble data (all members) are available for the users to compute any processed information.