Introduction
This dataset provides daily updated, hourly time series of air quality forecasts at European observation sites, optimised using a Model Output Statistics (MOS) method.
The CAMS MOS uses a machine learning algorithm to improve the chemistry-transport CAMS European air quality Ensemble forecasts for 4 pollutants (O3, NO2, PM10, PM2.5) at the observation sites. It is automatically optimised from predictive variables (predictors) over a learning period and delivers a 4-day forecast (0h to 96h).
Because it adjusts the raw forecast of the regular chemistry-transport forecasting system, it belongs to the category of Model Output Statistics (MOS) products.
Description of the MOS method
Approaches
Within a previous CAMS service (CAMS_63), several machine learning post-processing approaches were tested as MOS methods in order to take stock of recent developments in machine learning applications.
The configurations tested covered several learning periods, different sets of predictors for each species and various performance indicators.
The selected MOS approach applies to the whole of Europe: a single statistical model is built from CAMS Ensemble forecasts, IFS meteorological data and observations covering the whole modelling domain. The advantage of this global approach is that only a very short time period is needed to gather enough data to train a robust model (good performance was obtained with a short training period). To optimise performance, a new model is built every day with the most recent available data. Any change in the modelling system (upgrade of a member of the Ensemble model, addition of new observation sites…) is thereby automatically and rapidly reflected in a new MOS model that produces an appropriate correction.
To validate the final configuration in terms of robustness, performance and computing time, assessments were carried out and published in Bertrand et al., 2023: https://acp.copernicus.org/articles/23/5317/2023/
The learning period is currently set to 3 days.
Predictors
The MOS is trained, over the 3-day learning period, on hourly air quality observations and modelling data (for both air quality and meteorological parameters), and predicts hourly concentrations.
Training relates the predictors to the observations, which serve as the target, in order to define a statistical model. This model then converts the same predictors into a concentration forecast, which is the predictand.
Several sets of predictors have been investigated. The results show that a limited set of predictors, comprising the concentrations from the CAMS Ensemble forecast, some forecast meteorological variables and recent observations, performs well.
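The train-then-predict pattern described here can be illustrated with a deliberately simplified sketch. It is not the CAMS MOS algorithm: ordinary least squares with a single predictor (the raw forecast) stands in for the machine learning model, and all values are synthetic. The function names `fit_linear_mos` and `apply_linear_mos` are hypothetical.

```python
from statistics import mean

def fit_linear_mos(raw_forecast, observed):
    """Fit observed ~ slope * raw_forecast + intercept by ordinary least squares.

    Stand-in for the (unspecified) machine learning model: one predictor only.
    """
    x_mean, y_mean = mean(raw_forecast), mean(observed)
    sxx = sum((x - x_mean) ** 2 for x in raw_forecast)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(raw_forecast, observed))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def apply_linear_mos(raw_forecast, slope, intercept):
    """Convert predictors (raw forecast values) into corrected concentrations."""
    return [slope * x + intercept for x in raw_forecast]

# Synthetic 72-hour (3-day) learning period with a known, artificial bias.
train_raw = [20.0 + (hour % 24) for hour in range(72)]
train_obs = [0.8 * x + 5.0 for x in train_raw]

slope, intercept = fit_linear_mos(train_raw, train_obs)

# Apply the fitted relation to new raw forecast values (the predictand).
corrected = apply_linear_mos([30.0, 40.0], slope, intercept)
```

The operational product uses several predictors (ensemble concentrations, meteorological variables, recent observations); the sketch keeps one only to make the predictor, target and predictand roles visible.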
Criteria for the observations used:
- background observation sites, specific selection based on an objective classification (Categories 1 to 7 of the Joly and Peuch, 2012 classification, corresponding roughly to urban, suburban and rural background observation sites)
- hourly observations
- 75% availability rate of observations over the learning period
The MOS production runs once a day from 6:30 UTC and produces forecasts for every observation site that meets the above criteria. The number of observation sites therefore varies with the species and the date.
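As an illustration of the availability criterion, the following minimal sketch checks whether a site has enough hourly observations over a 3-day learning period (72 hours). It assumes that missing hours are represented as `None` and that exactly 75% counts as sufficient; neither detail is stated in this documentation, and the function name is hypothetical.

```python
LEARNING_PERIOD_HOURS = 3 * 24   # 3-day learning period, hourly observations
MIN_AVAILABILITY = 0.75          # 75% availability rate

def meets_availability(hourly_obs, min_availability=MIN_AVAILABILITY):
    """Return True if enough hourly values are present (None marks a gap).

    Assumption: the threshold is inclusive, so exactly 75% passes.
    """
    if not hourly_obs:
        return False
    valid = sum(1 for value in hourly_obs if value is not None)
    return valid / len(hourly_obs) >= min_availability

# 54 valid hours out of 72 is exactly 75%; 53 out of 72 falls short.
complete_enough = [10.0] * 54 + [None] * 18
too_sparse = [10.0] * 53 + [None] * 19

site_is_usable = meets_availability(complete_enough)
site_is_rejected = not meets_availability(too_sparse)
```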
Data used for MOS:
- European air quality Ensemble forecast variables
- IFS Meteorological forecast parameters
- EEA Air quality observations
Data access
Data is available for download from the CAMS Atmosphere Data Store (ADS). CAMS ADS registered users can access the available data interactively through the CAMS European air quality forecast optimised at observation sites ADS download web interface and/or programmatically using the API as per instructions detailed here.
Data availability (HH:MM)
The processing takes place at 6:30 UTC and the delivery is guaranteed by 8:00 UTC on the ADS.
Spatial resolution
Timeseries are provided at individual observation sites.
Temporal frequency
The MOS model runs once a day from 6:30 UTC.
Data are available at a time resolution of 1 hour and cover forecast steps 0h to 96h.
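To make the step range concrete, the sketch below lists the hourly valid times for one forecast. It assumes that both end steps (0h and 96h) are included and that step 0 corresponds to 00:00 UTC on the forecast date; the reference time is an assumption for illustration and is not stated in this documentation.

```python
from datetime import datetime, timedelta, timezone

def valid_times(base_time, first_step=0, last_step=96):
    """Hourly valid times from first_step to last_step, both inclusive."""
    return [base_time + timedelta(hours=step)
            for step in range(first_step, last_step + 1)]

# Assumed reference time: 00:00 UTC on the forecast date.
base = datetime(2024, 1, 17, 0, 0, tzinfo=timezone.utc)
times = valid_times(base)
```

Under these assumptions a single forecast holds 97 hourly values per site and species, with the last one falling four days after the reference time.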
Data format
Data are available in CSV format with a semicolon separator. The files are split by date, country and species.
In the file, the observation sites are declared by their EIONET identifier.
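A semicolon-separated file can be read with Python's standard `csv` module by setting the delimiter explicitly. The sketch below is illustrative only: the column names (`station_id`, `datetime`, `concentration`), the station identifier and the use of an empty field for a missing value are all assumptions, so check the header of a downloaded file before reusing it.

```python
import csv
import io

# Illustrative content only: header names and station identifier are hypothetical.
SAMPLE = (
    "station_id;datetime;concentration\n"
    "XX0001A;2024-01-17 00:00;41.5\n"
    "XX0001A;2024-01-17 01:00;\n"
    "XX0001A;2024-01-17 02:00;38.0\n"
)

def read_forecasts(text_stream):
    """Parse semicolon-separated rows; empty concentration fields become None."""
    reader = csv.DictReader(text_stream, delimiter=";")
    rows = []
    for record in reader:
        raw_value = record["concentration"].strip()
        record["concentration"] = float(raw_value) if raw_value else None
        rows.append(record)
    return rows

rows = read_forecasts(io.StringIO(SAMPLE))
missing_hours = [r["datetime"] for r in rows if r["concentration"] is None]
```

For a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="")`.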
An associated metadata file is available from the download form and gives information on the observation sites (coordinates, altitude, type of observation site as provided by the European Environment Agency, date_start, date_end).
Please also note that the location of some observation sites may change over time. When an observation site is moved, a new line is added to the metadata file with the new coordinates. The date_start and date_end columns give the start and end dates for which MOS was produced at each set of coordinates.
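Because one site can appear on several metadata lines, coordinates should be looked up by date as well as by identifier. The following minimal sketch shows one way to do this. The records, the station identifier and the key names other than `date_start` and `date_end` are hypothetical, and it assumes that both dates are inclusive and that an open-ended line has no `date_end` (represented here as `None`).

```python
from datetime import date

# Hypothetical metadata: the same site listed twice after a relocation.
metadata = [
    {"id": "XX0001A", "lat": 48.00, "lon": 2.00,
     "date_start": date(2024, 1, 17), "date_end": date(2024, 6, 30)},
    {"id": "XX0001A", "lat": 48.01, "lon": 2.02,
     "date_start": date(2024, 7, 1), "date_end": None},
]

def coordinates_on(rows, station_id, day):
    """Return (lat, lon) valid for station_id on the given day, or None."""
    for row in rows:
        if row["id"] != station_id:
            continue
        end = row["date_end"] or date.max  # assumed: None means still valid
        if row["date_start"] <= day <= end:
            return row["lat"], row["lon"]
    return None

before_move = coordinates_on(metadata, "XX0001A", date(2024, 3, 1))
after_move = coordinates_on(metadata, "XX0001A", date(2024, 9, 1))
not_covered = coordinates_on(metadata, "XX0001A", date(2023, 12, 31))
```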
Product listings
Please note that not all species are available at all observation sites for all the timesteps.
Variable Name | NetCDF Units | Variable name in ADS | Note |
---|---|---|---|
Nitrogen dioxide | µg m-3 | nitrogen_dioxide | Data are available from 17-01-2024 |
Ozone | µg m-3 | ozone | Data are available from 17-01-2024 |
Particulate matter < 10 µm | µg m-3 | particulate_matter_10um | Data are available from 17-01-2024 |
Particulate matter < 2.5 µm | µg m-3 | particulate_matter_2.5um | Data are available from 17-01-2024 |
Validation reports
An evaluation of the MOS production will be made available at station level and aggregated by country through an interactive visualisation platform.
Example visualisation code
See below an example of how to download the data using the API and plot the data for a station:
Guidelines
- Users can select either 'Raw' or 'MOS-optimised' daily air quality forecasts at European observation sites. The raw forecasts are the European air quality Ensemble forecasts interpolated to the observation site location. The MOS-optimised forecasts are produced from the raw forecasts using machine learning post-processing as a Model Output Statistics (MOS) method. Both types are provided in the same format.
- Missing values may be present in the MOS product for some species and/or hours when no observations are available at the observation site for that species and those hours. The most recent observations are used as predictors, so they are required to produce the MOS forecast.
How to acknowledge, cite and refer to the data
All users of data uploaded on the Atmosphere Data Store (ADS) must provide clear and visible attribution to the Copernicus programme and are asked to cite and reference the dataset provider.
(1) Acknowledge according to the licence to use Copernicus Products.
(2) Cite each dataset used:
(3) Throughout the content of your publication, refer to the dataset used as Author (YYYY), e.g. METEO-FRANCE et al. (2024)
References
- Bertrand et al., 2023: https://acp.copernicus.org/articles/23/5317/2023/
- Joly, M., & Peuch, V. H. (2012). Objective classification of air quality monitoring sites over Europe. Atmospheric Environment, 47, 111-123.