This page contains outdated information that may be incorrect or misleading. Data providers are advised to use the simplified set of rules provided here: CDS Toolbox requirements

id: CDM
version: 1.0
date: 2017-03-28
author: F. Fierli (ISAC-CNR)

version | author | reviewer | internally approved by
1.0 | F. Fierli (ISAC-CNR) | C. Cagnazzo (ISAC-CNR) | A. Amici (B-Open)





1. Background

The document contains the Common Data Model Specification for the Copernicus Data Store Toolbox.

The scope of the document is to:

  • List the data formats and definitions used inside the Toolbox
  • Provide guidance to Toolbox users
  • Provide concise guidance to suppliers on formatting data so that it can be processed in the Toolbox

The Toolbox CDM shall satisfy a series of requirements:

  • Integrate a CDM framework widely used in Climate and Forecast context
  • Integrate an adequate convention system
  • Bridge CDM concepts with functionalities of toolbox and its structure
  • Integrate guidance to, wherever possible, ensure compatibility with format of main products of the Climate Data Store

The choices of the Toolbox CDM described in this document should hence deal with data types and formats expected to populate the CDS that are briefly described in the following section.

1.1 Expected evolution of the Climate Data Store

The Climate Data Store collects a unique and wide range of Essential Climate Variables (ECVs) from various sources. The ECVs are a set of 50 variables that describe the state and evolution of the Earth's climate; they are defined and revised by GCOS (1). CDS content will evolve rapidly in the coming years through step changes driven by the delivery of the tenders and by internal activities dedicated to data production. The CDS core production is addressed in a series of tenders providing the following types of data:

Observation collection and processing

Observational gridded products

Global climate reanalyses

Regional climate reanalyses

Multi-model Seasonal forecasts

Regional climate projections

Global climate projections

The Toolbox shall hence be used to access a wide range of ECVs and data types, as summarized below with minimal information on type, format and expected availability.

Observations

  • In-situ climate series would include climatic datasets  (such as HadCRUT, GISS, E-OBS) and will cover most ECVs for surface variables (temperature, humidity, winds, precipitation) on regular spatial grids. Variables are reported as anomalies together with means to derive absolute values.
  • Satellite gridded products will cover a large number of ECVs, with production planned to start in 2017 and, for most of them, in 2018. The satellite L3 and L4 products will be produced in the framework of C3S using the latest version of the processing chain developed by the ESA CCI projects. A similarity with present ESA CCI products in terms of variables, resolution and time coverage is therefore expected (2).
  • In-Situ data from networks will provide synthetic high-quality ECVs with a lower resolution. C3S311a lot3 will likely integrate the main baseline and reference ground-based and in-situ networks providing a complementary product for Temperature, Ozone, GHGs, Water vapor while lot2 will focus on comprehensive surface datasets.
  • Additional observational products are to be expected, for instance from C3S_311b where a set of fundamental climate data records will be reprocessed. A parallel action of data rescue is done in C3S_311a Lots and new products may enrich the CDS from 2018.

Reanalysis data

  • Atmospheric reanalyses (NRT ERA5, ERA-Interim) will provide the basis for the early-stage CDS content, covering a wide range of atmospheric ECVs. The initial release is planned by mid-2017 and the full series (from satellite ERA updated quasi-NRT) by mid-2018. Reanalyses are provided on regular spatial and time coordinates. ERA5 will include ensembles.
  • Regional re-analyses are under production for the C3S, with delivery priority on the European domain analogously to the CORDEX Region 4 (3).
  • Additional reanalyses from external providers may also populate the CDS, with a data format and structure similar to those of ERA5.
  • Oceanic reanalysis will be the backbone for currents, salinity and subsurface variables, based on the ORAS5 high-resolution ocean and sea-ice reanalysis. ORAS5 is now running in reduced delay mode and is also fully consistent with ERA5, using the HadISST2 SST and the OSTIA sea-ice concentration products. ORAS5 will also be delivered to CMEMS, as an input to their multi-reanalysis product.
  • Additional ECVs pertaining to atmospheric composition may come from the CAMS Copernicus Service in the form of regularly gridded assimilated fields (e.g. ozone).

Seasonal Forecast 

Following the initial release of C3S seasonal forecast products in December 2016, future developments will pursue a wide range of topics and provide new products. These can include sophisticated information such as forecasts of ocean variables, climate indices, and advanced forecasts of weather regimes and extremes, with an increasing complexity in the data types provided by the CDS.

Climate Projections

Priority delivery to the CDS is given to existing CMIP5 data, with a focus on surface or near-surface variables identified as “top priority”. These may include basic ECVs (temperature, wind, precipitation). A specific activity within C3S is currently ongoing to provide synthetic products from multi-model ensembles and to derive specific climate indices. Note that atmospheric fields are defined on specific pressure levels and regular grids, while oceanic fields may be defined on irregularly spaced horizontal coordinates.

Regional Climate Projections

The regional projections are the object of a specific lot, which is expected to include CORDEX projections. A reasonable definition of the data can be gathered from the CORDEX requirements (3). Concerning the Toolbox CDM, the lot may provide solutions to ensure compatibility between the rotated grids of regional simulations and the data/grid formats used in the production of global projections.

Sectorial Information System Indices

C3S is currently developing a series of services dedicated to the development of specific indicators for application sectors. The list and type of indices is highly preliminary and will depend on the implementation of SECTEUR in the following years. Such indices may include specific ECVs (e.g. FAPAR, Ocean Color) and products derived from observations, reanalyses and projections (e.g. climate extremes, drought indices, wind gusts and storm tracks).

Based on the above scenario, the Toolbox CDM should in principle deal with a vast range of data types, which require general attributes to describe them. The approach chosen here is to define a CDM that ensures maximum compatibility with products that are (1) already defined and (2) prioritized in the CDS, and that provides a sufficiently general frame to ease the inclusion of additional datasets. The working hypothesis of this CDM is based on the structure and format of test data that share format and content analogies with the CDS, starting from: (1) reanalysis data, with ERA-Interim as reference; (2) CMIP5 climate projections; (3) ECMWF seasonal forecast products. The CDM will cover the needs of most regularly gridded data expected to populate the CDS, e.g. the merged satellite products that will be a backbone for the CDS observational ECVs.

1.2 Pre-existing data formats and data models

The data models, conventions and standards (including de-facto ones) recommended by ECMWF and generally used in climate data generation are based on the NetCDF4 Data Model (Classic and Enhanced), which is derived from Unidata's Common Data Model. A netCDF dataset contains dimensions, variables and attributes, which all have both a name and an ID number by which they are identified. These components can be used together to capture the meaning of data and the relations among data fields in an array-oriented dataset. The netCDF library allows simultaneous access to multiple netCDF datasets, which are identified by dataset ID numbers in addition to ordinary file names. Each pre-existing dataset is derived from these data models, with substantial differences in the conventions on short names and coordinates. As an example, CMIP5 climate projections adopt a common output (http://cmip-pcmdi.llnl.gov/cmip5/output_req.html) that is implemented through a rewriter (CMOR, the Climate Model Output Rewriter software (17)) to ensure full compatibility among data produced by different institutions.

ECMWF, together with meteorological agencies worldwide, implements the GRIB format, which is maintained by WMO (1). GRIB provides self-description, flexibility and expandability, which are fundamental in times of fast scientific and technical evolution. Within GRIB, data are stored in records, allowing condensation (packing) to handle large amounts of data and fast access. A large series of tools is available to handle GRIB. Conversion to NetCDF is also available at ECMWF.

2 Data Format

NetCDF4 is a straightforward choice due to (1) its role as de-facto standard in the climate community addressed by the toolbox and to (2) the expected format of CDS data. This CDM is hence based on the NetCDF Data Model (6, 7) with necessary selection as outlined in this document. Data shall be structured in a proper manner described in Section 3 and contain the necessary information to fully characterize them through a proper metadata definition (Sections 4-8).

Data treated in the Toolbox shall follow the widely recognized CF Conventions, which prescribe a Standard Name Table and a CF data model. The CF Conventions are implemented in the Toolbox unless deviations are required, for instance to treat complex climate indices (Section 5).

3 Data Structure

3.1 Data Fields

In order to address the variety of CDS data up to now, choices have been made to have large commonalities with each prioritized dataset as described in the previous sub-section. These should include Climate projections, Seasonal forecasts, Reanalysis and Observations.

Fields of the same variable originating from such different sources should be integrated in a uniform way within the Toolbox. The Toolbox makes use of the xarray package (13), which handles N-dimensional arrays based on the NetCDF Data Model.

Data fields are defined as gridded on spatial planetary coordinates. This assumption makes it possible to integrate most of the expected CDS data. Data with different morphologies, such as single-station time series, can be treated as a particular case of the more general gridded data. Gridded data can have diverse spatial coverage, with limited domains in the horizontal (e.g. European / regional reanalyses) and in the vertical (soil properties / ocean / atmosphere).

Data are treated in the Toolbox as variables and shall be decomposed into single data fields (DF). Data can therefore be a collection of fields from different ranges of time coordinates, realizations and any of the additional axes defined in Section. The basic assumption is that each DF is independent and contains a single product from any of the ECVs of the CDS.

A DF contains:

  • Domain indexes: each domain axis has a size (an integer value greater than zero). The Toolbox treats data as hyper-cubes, and multiple axes can be defined in addition to the spatial coordinates.
  • A data array whose shape is determined by the domain indexes. All elements of the data array must be of the same data type (numeric, char or string). Each element of the array should be filled with meaningful data OR a standard missing value.
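
The two components above can be illustrated with a minimal Python sketch. The axis names, the sizes and the missing value of -9999.0 are assumptions chosen for illustration only; the CDM does not prescribe them, and the Toolbox itself relies on xarray rather than on plain lists.

```python
from math import prod

# Hypothetical domain axes of a data field: axis name -> size.
domain_axes = {"time": 2, "latitude": 3, "longitude": 4}
MISSING = -9999.0  # illustrative missing value, not prescribed by the CDM


def expected_shape(axes):
    """Return the array shape implied by the domain axes."""
    for name, size in axes.items():
        if not (isinstance(size, int) and size > 0):
            raise ValueError(f"axis {name!r} must have an integer size > 0")
    return tuple(axes.values())


def check_field(flat_values, axes):
    """Check a flattened data array against the domain axes."""
    shape = expected_shape(axes)
    if len(flat_values) != prod(shape):
        raise ValueError("data array does not match the domain axes")
    if len({type(v) for v in flat_values}) != 1:
        raise ValueError("all elements must share the same data type")
    return shape


# 23 meaningful values plus one missing value fill the 2 x 3 x 4 hyper-cube.
values = [280.0] * 23 + [MISSING]
shape = check_field(values, domain_axes)
n_missing = sum(v == MISSING for v in values)
```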

Every data field or collection of data fields should be accompanied by the metadata information providing a complete reference on:

  • Coordinate reference systems relating the field's coordinate values to locations in a planetary reference frame, with main and auxiliary domain coordinates that identify the physical meaning and location of the cell for each unique set of domain indexes of the field
  • Units for each quantity of the data field(s) and coordinates, following the International System
  • A proper naming convention to unambiguously relate any data to a physical quantity, following the CF_Conventions framework
  • Cell methods to describe how the data values represent the variation of the quantity within cells
  • Additional properties summarized in global attributes, which represent metadata about the DF and support provenance tracking. It is assumed that any relevant global attribute is also an attribute of every data variable. Any relation to the dataset should be reported only in the global attributes.

3.2 Data categories

Data fields are structured on standard categories following the definition used in CMIP5 and ECMWF that provide a general framework for expected data in the CDS as described in section 1.

Surface data include ECVs at ground / sea level (e.g. sea surface temperature, atmospheric variables, radiation, land properties) and are divided into subcategories:

Surface Data defined at a height level

These variables are representative of a single level (in general close to the surface), as for instance meteorological variables (wind, temperature), so that atmospheric variables are clearly distinguished. Heights of 2 m and 10 m are standard in meteorological and climate datasets. As stated above, data can be defined on additional axes (reporting time and realization) in addition to the spatial axes. Here and in the following sections we provide metadata examples as they shall appear in the associated CDL file. Data fields include a valid time array dimension.

netcdf {
dimensions:
	latitude = 128 ;
	longitude = 256 ;
	time = UNLIMITED ; // (1 currently)
variables:
double tas(latitude, longitude, time) ; 
tas:standard_name = "air_temperature" ; 
tas:units = "K"; 
tas:coordinates = "height latitude longitude time" ;  
double latitude(latitude) ; 
latitude:standard_name = "latitude" ; 
latitude:units = "degrees_north" ; 
latitude:axis = "Y" ; 
double longitude(longitude) ; 
longitude:standard_name = "longitude" ; 
longitude:units = "degrees_east" ; 
longitude:axis = "X" ; 

The field includes coordinates as defined in Sections 4.1 and 4.2. Height is defined with a fixed value:

double height ; 
height:standard_name = "height"; 
height:units = "m"; 
height:positive = "up"; 
height:axis = "Z";
data: 
height = 2. ;

Soil properties, such as volumetric moisture or temperature defined on specified soil levels. These are treated as variables defined at a height level. 

double sm10(latitude, longitude, time) ;
sm10:standard_name = "soil_moisture" ; 
sm10:units = "kg / kg";
sm10:coordinates = "depth latitude longitude" ; 
double depth ; 
depth:standard_name = "depth"; 
depth:units = "m"; 
depth:positive = "down"; 
depth:axis = "Z";
// global attributes:
		:Conventions = "CF-1.7" ;
		:history = "2004-09-15 17:04:29 GMT by mars2netcdf-0.92" ;
data: 
depth = 10. ;
}

 
Not defined at a height Level: includes all other instantaneous variables defined at the ground. Example is given for Sea Level Pressure.

double psl(latitude, longitude, time) ; 
psl:standard_name = "air_pressure_at_sea_level" ; 
psl:units = "Pa"; 
psl:coordinates = "latitude longitude time" ;  

Accumulated fields over time: surface variables accumulated over a defined range of time. In the atmosphere such variables typically include precipitation (snowfall or rainfall) and radiation. These include coordinates as defined in Sections 4.1 and 4.2.

double tprate(latitude, longitude, time) ; 
tprate:standard_name = "lwe_precipitation_rate" ; 
tprate:units = "m/s"; 
tprate:coordinates = "latitude longitude time" ; 


Height level fields 

ECVs defined at variable atmospheric heights (e.g. dynamical and chemical composition variables such as aerosols, GHGs, ozone) or oceanic depths (e.g. currents, salinity, temperature) include three spatial indexes with a specific altitude coordinate.

double ta(latitude, longitude, plev, time) ; 
ta:standard_name = "air_temperature" ; 
ta:units = "K"; 
ta:coordinates = "plev latitude longitude time" ; 
double plev(plev) ; 
plev:standard_name = "air_pressure"; 
plev:units = "Pa"; 
plev:positive = "down"; 
plev:axis = "Z"; 

Intransient Fields 

Fields that are constant over time (e.g. land-sea mask, orography). These fields include latitude and longitude axes.

 

double orography(latitude, longitude) ; 
orography:standard_name = "surface_altitude" ;
orography:units = "m"; 
orography:coordinates = "latitude longitude" ; 

 

4. Axis and Coordinates

CF_conventions identifies four axes to define position and time (X, Y, Z, T). In addition to these, the Toolbox integrates specific additional axes to: (1) take into account the ensemble nature of data and (2) integrate statistical operations on data, such as percentile thresholds (see Section 7). The boundaries between cells are indicated by bounds properties. All coordinates must always explicitly include the units attribute and a prescribed standard_name for proper recognition, independently of the axis definition.

4.1 Horizontal coordinates

Horizontal coordinates in the Toolbox are longitude and latitude. These are defined between 0° and 360° and between -90° and 90°, respectively. Longitude and latitude refer to the central point of each grid cell. Information on the extent of the grid cells is contained in the bounds coordinates (see Section 4.6.2). The coordinate system of the Toolbox is initially built to associate one coordinate with each axis (i.e. X = longitude, Y = latitude). This implies dealing first with regular horizontal grids. Typical metadata for regular axes are as follows.

double latitude(latitude) ; 
latitude:standard_name = "latitude" ; 
latitude:units = "degrees_north" ; 
latitude:axis = "Y" ; 
double longitude(longitude) ; 
longitude:standard_name = "longitude" ; 
longitude:units = "degrees_east" ; 
longitude:axis = "X" ; 
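
As an illustration of the ranges above, the following Python sketch maps longitudes given on a -180° to 180° convention onto the 0° to 360° range and validates latitudes. The function names are hypothetical and are not part of the Toolbox API; the half-open interval [0, 360) is an assumption made here so that 360° and 0° are not both present.

```python
def normalize_longitude(lon_deg):
    """Map a longitude in degrees east onto the [0, 360) range."""
    return lon_deg % 360.0


def check_latitude(lat_deg):
    """Return the latitude unchanged if it lies within [-90, 90]."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError(f"latitude {lat_deg} is outside [-90, 90]")
    return lat_deg


# Longitudes from a dataset that uses the -180..180 convention.
west_east = [-180.0, -10.0, 0.0, 10.0, 179.5]
normalized = [normalize_longitude(lon) for lon in west_east]
```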

Where the coordinate variables are not longitude and latitude, as in oceanic projections (e.g. the NEMO model), or where the grid is irregular or rotated (e.g. regional projections), the working hypothesis is to:

  • Apply a re-gridding procedure to data fields defined on irregular longitude-latitude grids
  • Supply the definition of longitude and latitude through the coordinates attribute, as shown below
double x (x); 
x:long_name = "i-index of mesh grid"; 
x:units = "1"; 
x:axis = "X" ; 
double y (y) ;  
y:long_name = "j-index of mesh grid" ; 
y:units = "1" ; 
y:axis = "Y" ; 
double latitude (y, x) ; 
latitude:standard_name = "latitude" ; 
latitude:units = "degrees_north" ; 
double longitude(y, x) ; 
longitude:standard_name = "longitude" ; 
longitude:units = "degrees_east" ;

4.2 Vertical coordinates

Vertical coordinates contain information on height (for atmospheric variables) or depth (for oceanic ones). Soil depths are treated as separate fixed-level fields. For atmospheric fields we use pressure coordinates. Pressure is ordered from the ground to the top of the atmosphere and is expressed in standard units (Pa). Datasets are not required to use the same values on the vertical axis: the Toolbox implements an interpolation system to report data on the same coordinate values. A positive attribute with a value of up or down (case insensitive) should be indicated. The axis is Z.

double plev(plev);
plev:standard_name = "air_pressure" ; 
plev:units = "Pa" ; 
plev:positive = "down" ; 
plev:axis = "Z" ; 

For depth, the vertical coordinate type may be indicated by providing the standard_name attribute with an appropriate value and the axis attribute with the value Z.

double depth(depth) ;
depth:standard_name = "depth_below_geoid" ; 
depth:units = "m" ; 
depth:positive = "down" ; 
depth:axis = "Z" ; 

4.3 Time Coordinates

Time coordinates should take into account the structure of seasonal forecasts that are a main component of the CDS. In general, forecasts are characterized by a reference time, i.e. the time of the analysis from which the forecast was made, a valid time, i.e. the time represented by the forecast and period, i.e. the interval between the forecast reference time and the validity time. 

Within the toolbox a time coordinate is used to contain:

  • Time value for instantaneous fields.
  • First time value for accumulated or averaged fields (in the format yyyy:mm:dd:00:00) .
  • Valid instantaneous time of the forecast fields.

Time is defined in hours since a defined date. 

double time; 
time:standard_name = "time"; 
time:long_name = "valid time";
time:units = "hours since 2016-10-26T00:00:00Z"; 
time:calendar = "gregorian"; 
time:axis = "T"; 
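
The following Python sketch shows how a time value expressed in such units could be decoded into a calendar date. It is an illustration under stated assumptions, not the Toolbox implementation: the function name is hypothetical, only the "hours since" form with the timestamp layout shown above is handled, and Python's datetime uses the proleptic Gregorian calendar, which agrees with the Gregorian calendar for modern dates.

```python
from datetime import datetime, timedelta, timezone


def decode_hours_since(units, value):
    """Convert a value in 'hours since <reference>' into a UTC datetime."""
    prefix = "hours since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported units: {units!r}")
    reference = datetime.strptime(
        units[len(prefix):], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return reference + timedelta(hours=value)


# 36 hours after the reference date used in the CDL example above.
valid_time = decode_hours_since("hours since 2016-10-26T00:00:00Z", 36)
```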

Leadtime coordinate is analogous to the forecast_reference_time and is the initial time of the forecast expressed in the same units of time.

double leadtime; 
leadtime:standard_name = "leadtime"; 
leadtime:long_name = "hours since forecast_reference_time";
leadtime:units = "hours since 2016-10-26T00:00:00Z"; 
leadtime:calendar = "gregorian"; 

The forecast_period corresponding to the length of the forecast can be derived as the difference between time and leadtime.
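
A minimal Python sketch of this difference is given below, assuming that time and leadtime are already expressed in the same units and relative to the same reference date. The function name and the sample values are hypothetical.

```python
def forecast_period(time_hours, leadtime_hours):
    """Length of the forecast: valid time minus forecast initial time."""
    period = time_hours - leadtime_hours
    if period < 0:
        raise ValueError("valid time precedes the initial time of the forecast")
    return period


# Forecast initialized at the reference date, valid 1, 2 and 3 days later.
leadtime = 0.0
valid_times = [24.0, 48.0, 72.0]
periods = [forecast_period(t, leadtime) for t in valid_times]
```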

The units attribute takes a string value formatted as per the recommendations of the UDUNITS package (12). The Toolbox uses units of days, since year and month units should be used with caution because of the different calendar and year definitions (a year is 365 days, a leap_year is 366 days, a Julian_year is 365.25 days, and a Gregorian_year is 365.2425 days). To calculate a new date and time from a base date, a base time and a time increment, one must therefore know which calendar to use. The Gregorian calendar is prescribed to avoid conversions and additional uses of the calendar attribute. The axis for time is T.

4.4 Realization

Realization is used to label a dimension that can be thought of as a statistical sample, e.g. labelling members of a model ensemble. The CDS will in fact provide databases including multiple realizations (ensembles). Typical examples are climate projections, seasonal forecasts and new-generation reanalyses such as ERA5. The ensemble nature of the CDS database is taken into account with a specific coordinate. Realization is defined as an integer identifying different ensemble members and is seen as a discrete coordinate. A specific axis E is attributed to the realization coordinate.

int realization ;
realization:standard_name = "realization" ;
realization:long_name = "Number of the simulation in the ensemble" ;
realization:units = "1" ;
realization:axis = "E" ;

4.5 Additional axis

4.6 Bound coordinates

When data do not represent the point values of a field but instead represent some characteristic of the field within cells of finite "volume", a complete description of the variable should include coordinates that describe the domain or extent of each cell, and the characteristic of the field that the cell values represent. To represent cells, additional coordinates identifying the domain extent of each cell are added. Bounds contain the vertices of the cell boundaries. A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Bounds are defined for the spatial and time coordinates defined in a continuous space.

4.6.1 Time Bounds

2D variable identifying the time boundaries over which each variable is representative, expressed in the same units as time. It is necessary to unambiguously identify variables accumulated and/or averaged over a defined time interval (e.g. rainfall, daily / monthly temperatures). This should be done through the definition of a time_bounds coordinate. All variables representative of a finite interval of time (i.e. not instantaneous) should hence be associated with a time_bounds coordinate reporting the initial and final time of the interval (e.g. for the monthly mean of January 2011, time_bounds = [201101010000, 201101312359]). See also Section 7 for the treatment of accumulations and means in the CDM.
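
As an illustration, the Python sketch below computes the pair of time_bounds values for a calendar month in hours since a reference date. The reference date and the function names are assumptions made for this example. Note also that the sketch places the upper bound at the start of the following month, so that adjacent cells share a boundary; this is a choice of the sketch and differs from the 23:59 end time used in the example above.

```python
from datetime import datetime

REFERENCE = datetime(2011, 1, 1)  # hypothetical reference date of the time units


def hours_since(reference, when):
    """Express a datetime as hours elapsed since the reference date."""
    return (when - reference).total_seconds() / 3600.0


def month_bounds(year, month, reference=REFERENCE):
    """Return [start, end] of a calendar month in hours since the reference."""
    start = datetime(year, month, 1)
    # First instant of the following month (December rolls over to January).
    end = datetime(year + (month == 12), month % 12 + 1, 1)
    return [hours_since(reference, start), hours_since(reference, end)]


jan_2011 = month_bounds(2011, 1)
dec_2011 = month_bounds(2011, 12)
```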

4.6.2 Spatial coordinates bounds

2D variables identifying the longitude/latitude/height boundaries for which each variable is representative, expressed in the same units of the corresponding coordinate. 

latitude:bounds = "latitude_bounds" ;
double latitude_bounds(latitude, bounds) ; 
longitude:bounds = "longitude_bounds" ; 
double longitude_bounds(longitude, bounds) ; 
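
For a regular grid, the content of such a bounds variable can be derived from the cell centres. The Python sketch below assumes evenly spaced centres and cells that extend half a grid step on either side; the function name and the sample latitudes are hypothetical.

```python
def regular_bounds(centres):
    """Return one [lower, upper] pair per evenly spaced cell centre."""
    if len(centres) < 2:
        raise ValueError("at least two cell centres are needed")
    half_step = (centres[1] - centres[0]) / 2.0
    return [[c - half_step, c + half_step] for c in centres]


# Five latitude cell centres on a 30-degree grid.
latitude = [-60.0, -30.0, 0.0, 30.0, 60.0]
latitude_bounds = regular_bounds(latitude)  # shape (latitude, bounds) = (5, 2)
```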

5. Naming Conventions

5.1 Standard names

Standard names should follow CF_conventions and univocally identify the variable contained in each field. See CF_conventions for a full list of standard_names. Standard name is a definitive description of the quantity, which would allow users of data from different sources to determine whether quantities were in fact comparable.

Standard names by themselves are not always sufficient to describe a quantity once spatial or temporal operations have been applied, or when the data represent an uncertainty in the measurement of a quantity. Modifications due to common statistical operations are expressed via the cell_methods attribute (Section 7.1). Within the Toolbox the standard_name shall not be modified, and this information should be retrieved from the cell_methods attribute and additional provenance systems (see Section ). Quantity modifiers are expressed using the optional modifier part of the standard_name attribute. Modifiers are given in Appendix C, Standard Name Modifiers, of the CF conventions.

5.2 Long names

long_name is intended to describe variables, including the operations applied (e.g. 2m temperature monthly mean), and is intended to be used for data rescue and/or labeling. Long names can be user defined and the attribute is not mandatory.

5.3 Short names

short_name is intended as the field identifier used in Toolbox data manipulation. Short names are taken from those used in existing datasets, to make them more recognizable to users. Preference is given to the conventions used in CMIP5, in seasonal forecasts (which gather conventions from CMIP5) and partly in ERA-Interim data. The Toolbox implements conversion functions where short names differ from the ones defined in the CDM (e.g. the CMIP5 CMOR convention and reanalyses). The definition of short names is reported in the tables.

5.4 File naming

Single data filename generated in the toolbox should not carry semantics on the typology of data. All information should be included in the metadata and global attributes. A typical example is the treatment of ensemble elements in CMIP5 that are contained in different data files identified with different names while this CDM prescribes the use of the realization coordinate.

6. Units

The Toolbox implements the units of the CF_Conventions, which are used in the vast majority of datasets (see Annex 2). The Toolbox uses the UDUNITS package (12) for arithmetic manipulation and conversion of units.

7. Track operations on data

The Toolbox is specifically designed to implement operations on data fields, and a proper definition of derived fields is hence necessary. Some of these operations are not included or not properly described in the CF conventions (e.g. thresholds, quantiles), and discussions on how to implement them properly are still ongoing due to the lack of a shared definition. As a basic choice, the CDM prescribes the adoption of cell_methods, as included in CF_conventions, to track basic operations on a single field, and a working hypothesis to treat additional derived variables and operations involving more than one field.

7.1 Use of cell_methods attribute

To describe the characteristic of a field, the cell_methods attribute is associated with the variable. This is a string attribute comprising a list of blank-separated words of the form "name: method". Each "name: method" pair indicates that, for the coordinate identified by name, the cell values representing the field have been determined or derived by the specified method. For example, if data values have been generated by computing time means, this could be indicated with cell_methods="t: mean", assuming here that the name of the time coordinate is t. By default, the statistical method indicated by cell_methods is assumed to have been evaluated over the entire horizontal area of the cell.

CF_conventions includes a series of methods that can be applied, reported below (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch07s03.html).

Method | Unit | Operation
sum | u | Sum or Accumulation
maximum | u | Maximum
median | u | Median
mid_range | u | Average of maximum and minimum
minimum | u | Minimum
mean | u | Mean (average value)
mode | u | Mode (most common value)
range | u | Absolute difference between maximum and minimum
standard_deviation | u | Standard deviation
variance | u^2 | Variance

A full list of examples on the implementation of cell_methods is provided at http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch07s03.html. We list below basic examples of the most common operations on climatic data.

7.1.1 Subsequent operations

If more than one cell method is to be indicated, they should be arranged in the order they were applied. The left-most operation is assumed to have been applied first: cell_methods="time: mean lon: maximum". If a data value is representative of variation over a combination of axes, a single method should be prefixed by the names of all the dimensions involved. For instance, the standard deviation of a field within a longitude-latitude gridbox could have cell_methods="lat: lon: standard_deviation" . To indicate more precisely how the cell method was applied, extra information may be included in parentheses () after the identification of the method. For example, the standard deviation of daily values could be indicated by cell_methods="time: standard_deviation (interval: 1 day)".
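
The structure of these strings can be illustrated with a small Python parser. This is a simplified sketch and not the parser used by the Toolbox: the function name is hypothetical, and qualifiers such as "within years" or "over years" that may follow a method are ignored.

```python
import re

# One entry: one or more "name:" prefixes, a method, optional "(extra info)".
_ENTRY = re.compile(
    r"((?:\w+:\s*)+)"        # axis names, each followed by a colon
    r"(\w+)"                 # the method, e.g. mean or standard_deviation
    r"(?:\s*\(([^)]*)\))?"   # optional information in parentheses
)


def parse_cell_methods(text):
    """Return (axes, method, extra) tuples in the order they were applied."""
    entries = []
    for names, method, extra in _ENTRY.findall(text):
        axes = [name.strip() for name in names.split(":") if name.strip()]
        entries.append((axes, method, extra or None))
    return entries


sequence = parse_cell_methods("time: mean lon: maximum")
combined = parse_cell_methods("lat: lon: standard_deviation")
detailed = parse_cell_methods("time: standard_deviation (interval: 1 day)")
```

The first list preserves the left-to-right order of application, the second groups both axes under a single method, and the third keeps the parenthesised information as free text.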

7.1.2 Operations leading to reduction of dimensionality

A large set of operations results in "collapsing" an axis, for instance calculating a variance from time series data or applying a zonal mean. We strongly recommend that dimensions of size one are retained, with a scalar coordinate variable, to enable documentation of the method through the cell_methods attribute and of its domain through the coordinate bounds attribute.

7.1.3 Time mean, maximum and minimum

A climatological coordinate may use different statistical methods to represent variation among years, within years and within days, which cannot be summarized in time bounds. For example, the average January temperature in a climatology is obtained by averaging both within years and over years, and unambiguous bounds cannot be defined. This is also different from the average January-maximum temperature and the maximum January-average temperature, which differ in the order in which the maximum and the mean are applied. For the former, we first calculate the maximum temperature in each January, then average these maxima; for the latter, we first calculate the average temperature in each January, then find the largest one. As usual, the statistical operations are recorded in the cell_methods attribute, which may have two or three entries for the climatological time dimension. For instance: cell_methods="time: maximum within days time: mean over days"
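
The difference between the two January statistics can be checked numerically. The Python sketch below uses made-up temperatures for three years; the cell_methods strings in the comments follow the "within years" / "over years" form of the CF conventions and are given as an indication only.

```python
# Hypothetical January temperatures (K), three values per year.
januaries = {
    2001: [270.0, 275.0, 280.0],
    2002: [272.0, 274.0, 276.0],
    2003: [268.0, 271.0, 283.0],
}

# Average January-maximum temperature:
# cell_methods = "time: maximum within years time: mean over years"
maxima = [max(values) for values in januaries.values()]
average_of_maxima = sum(maxima) / len(maxima)

# Maximum January-average temperature:
# cell_methods = "time: mean within years time: maximum over years"
means = [sum(values) / len(values) for values in januaries.values()]
maximum_of_means = max(means)
```

With these values the two statistics differ by several kelvin, which is why the order of the entries in cell_methods has to be preserved.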

7.1.4 Climatological Statistics

Climatological statistics may be derived from corresponding portions of the annual cycle over a number of years, e.g. the average winter temperatures for 1979-2016, or a 30-year January climatology obtained by averaging the 30 Januarys of the separate years. As prescribed in CF_conventions (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#climatological-statistics), climatological variables can have a climatological time axis with a variable dimension, to include for instance the mean daily cycle (24 elements) or the mean annual cycle (12 elements). A full description is available in section 7.4 of the CF_conventions 1.7 guide. For clarity, an example on minimum seasonal temperatures (MAM, JJA, SON, DJF) for the years 1961-1990 is provided below.

The mean and minima operations are listed in the cell_methods

dimensions:
	time = 4 ;
	bounds = 2 ;
variables:
float temperature(time, latitude, longitude) ;
temperature:long_name = "surface air temperature" ;
temperature:standard_name = "air_temperature" ;
temperature:cell_methods = "time: minimum within years time: mean over years" ;
temperature:units = "K" ;

A climatological axis is then defined. Time refers to the central time of the mean operation. Bounds are used to constrain the interval, reporting the initial and final year and month:

  double time(time);
    time:climatology="climatology_bounds";
    time:units="days since 1960-1-1";
  double climatology_bounds(time,bounds);
data: // time coordinates translated to date/time format
  time="1990-4-16", "1990-7-16", "1990-9-16", "1991-1-16" ;
  climatology_bounds="1960-3-1", "1990-5-31",
                     "1960-6-1", "1990-8-31",
                     "1960-9-1", "1990-11-30",
                     "1960-12-1", "1991-2-28" ;
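In the file itself the time and climatology_bounds values are stored as numbers in the declared units, not as date strings. A minimal sketch of the conversion is given below; the helper name days_since_reference is hypothetical, and Python's proleptic Gregorian calendar is assumed to be adequate for these dates.

```python
from datetime import date

REFERENCE = date(1960, 1, 1)  # from time:units = "days since 1960-1-1"


def days_since_reference(text):
    """Convert a 'YYYY-M-D' string to a day count relative to REFERENCE."""
    year, month, day = (int(part) for part in text.split("-"))
    return (date(year, month, day) - REFERENCE).days


# Bounds of the first season (MAM) in the example above.
mam_bounds = [days_since_reference(d) for d in ("1960-3-1", "1990-5-31")]
```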

7.2 Operations not included in methods

A wide range of derived climate indices requires specific treatment that is not addressed, or not properly addressed, in the CF conventions (e.g. see the discussions in CLIP-C). Typical examples are thresholds and quantiles. There are several options to treat them, such as coding a standard name following prescribed modifiers (e.g. number_of_days_with_air_temperature_above_threshold), redefining specific units, or including an additional axis.

The first working hypothesis for the toolbox, which is intended to access, plot and perform operations on properly described data, is to:

  • Deal with complex data operations in the internal provenance system based on global attributes (see section 9)
  • Implement the use of an additional axis to treat quantiles, which are not included up to now in the CF conventions

8 Track missing data

We implement the netCDF convention (NUG appendix B) to track missing data: the _FillValue, missing_value, valid_min, valid_max and valid_range attributes indicate missing values and the valid range of the data.

The missing values of a variable with scale_factor and/or add_offset attributes are interpreted relative to the variable’s packed values (the raw values, the values stored in the netCDF file) not the values that result after the scale and offset are applied. Applications that process variables that have attributes to indicate both a transformation (via a scale and/or offset) and missing values should first check that a data value is valid, and then apply the transformation. Note that values that are identified as missing should not be transformed. Since the missing value is outside the valid range it is possible that applying a transformation to it could result in an invalid operation. For example, the default _FillValue is very close to the maximum representable value of IEEE single precision floats, and multiplying it by 100 produces an "Infinity" (using single precision arithmetic).
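The order of operations described above (check validity on the packed value first, transform afterwards) can be sketched as follows. The scale_factor and add_offset values are hypothetical, and None is used here simply as a marker for missing data.

```python
FILL_VALUE = -32767    # assumed _FillValue of the packed variable
SCALE_FACTOR = 0.5     # hypothetical scale_factor
ADD_OFFSET = 100000.0  # hypothetical add_offset


def unpack(packed_values):
    """Return unpacked values, with None marking missing data."""
    unpacked = []
    for raw in packed_values:
        if raw == FILL_VALUE:
            # Missing values are identified on the packed value and
            # are never transformed.
            unpacked.append(None)
        else:
            unpacked.append(raw * SCALE_FACTOR + ADD_OFFSET)
    return unpacked


values = unpack([2650, -32767, 2000])
```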

Coordinates must not have values representing missing or invalid data. The valid_range attribute can be used to give the possible range of coordinate values. Missing-data attributes are attached to a single variable as:

short msl(time, latitude, longitude) ;
msl:_FillValue = -32767s ;
msl:missing_value = -32767s ;
msl:units = "Pa" ;
msl:long_name = "Mean sea level pressure" ;

9 Global attributes

The following properties are intended to provide information about data provenance at large: (1) track the origin of the data and (2) actions applied to them. The attribute values are all character strings. The toolbox will integrate and deal with global attributes of the data output of the toolbox workflow. Specific attributes will be used within the toolbox to:

  • implement an internal tracking
  • identify data typology in graphical interfaces and application
  • reference tools / methodologies used in workflows

Conventions

Aim: identify the convention followed by the data format. The value is a string (e.g. CF-1.6).

Title

Aim: describe the dataset / diagnostics in a simple sentence. This may be displayed in the Toolbox, as in discovery systems, in the list returned by a search or in an option list. It should therefore be human readable and short enough to display in a list of such names.

References

Aim: provide references to the data / methodologies applied. Preference is given to the use of a DOI, both for article references and for referenced datasets. In the absence of a DOI, one or more references in Nature-style referencing format may be used. For instance: Amerine, M.A.; Winkler, A.J. (1944). "Composition and quality of musts and wines of California grapes". Hilgardia 15: 493–675.

Source

Aim: track the ensemble of data used in the referenced product and in the plots. Names should be short and identify the data unambiguously. The default concatenates the sources of the individual data used for the output (e.g. for a combination of reanalysis and model output: ERA-INTERIM - EC-EARTH_V* - ...). It should follow a proper syntax. Users are advised to customise the content so that it can be used in plot labels (e.g. add "Difference between" to ERA-INTERIM - EC-EARTH_V* -).

Institution

Aim: report the name of the institution generating the data or implementing a specific workflow. Should be a recognisable name with an associated URL.

Contact

Aim: report the contact point for the data generation. Should be an active website / e-mail.

Project

Aim: report the name of the project where data have been generated. Can be free text.

Creation Date

Aim: report the date of creation of the dataset / product. Format is: YYYY-MM-DDThh:mm:ss<zone>
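A timestamp in this format can be produced with the Python standard library. The sketch below assumes UTC with the zone written as "Z"; the function name creation_date is chosen for illustration only.

```python
from datetime import datetime, timezone


def creation_date(moment=None):
    """Format a timestamp as YYYY-MM-DDThh:mm:ss<zone>, with zone 'Z' (UTC)."""
    moment = moment or datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = creation_date(datetime(2011, 6, 24, 2, 53, 46, tzinfo=timezone.utc))
```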

Comment

Aim: report any other information not included in the above attributes. Should be written in free text with concise syntax

History

Aim: record relevant information and track provenance which led to the output file/product

The "history" attribute provides an audit trail for modifications to the original data. It should contain a separate line for each modification with each line including a timestamp, user name,  modification name, and modification arguments. Its use is recommended and its value will be used by THREDDS as a history-type documentation. The "history" attribute is recommended by the NetCDF Users Guide and the CF convention.

Versioning of the software used to create the data can be included here and/or in the comment attribute (second choice).
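One possible way to maintain such an audit trail is sketched below. The function name append_history, the user name and the modification names are hypothetical, and the timestamps are assumed to be given in UTC.

```python
from datetime import datetime, timezone


def append_history(history, user, name, arguments, moment=None):
    """Append one 'timestamp user name arguments' line to a history string."""
    moment = moment or datetime.now(timezone.utc)
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp} {user} {name} {arguments}".rstrip()
    # Each modification gets its own line; earlier entries are kept.
    return f"{history}\n{line}" if history else line


history = append_history(
    "",
    "jdoe",
    "time_mean",
    "--variable tas",
    datetime(2017, 3, 28, 12, 0, 0, tzinfo=timezone.utc),
)
history = append_history(
    history,
    "jdoe",
    "regrid",
    "--resolution 1.0",
    datetime(2017, 3, 28, 12, 5, 0, tzinfo=timezone.utc),
)
```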

Lineage

Aim: similar to history, lineage is designed to trace back the operations performed on the data and hence provide information on the tools/scripts used.

May be used as an alternative or complement to History. Further guidance should be consolidated based on the implementation of provenance.

Summary

Aim: provide a human readable description of the content; intended as a verbose extension of title / history ...

Keywords

Aim: provide a vocabulary to define the taxonomy of the product and to retrieve it through keyword-based search. Keywords are defined by the user and should follow a defined semantics making use of GCMD: http://gcmd.gsfc.nasa.gov/learn/keywords.html

License

Aim: report the licence and terms of use that apply to the data. Should be written following the customary format for licence references.

Global attributes shall be reported in a CDL file as:

// global attributes:
:Conventions = "CF-1.7" ;
:Title = "Example of metadata for Global Attributes" ;
:References = "Amerine, M.A.; Winkler, A.J. (1944). \"Composition and quality of musts and wines of California grapes\". Hilgardia. 15: 493–675." ;
:Source = "Difference between ERA-INTERIM - EC-EARTH_V" ;
:Institution = "Copernicus Climate Change Service" ;
:Contact = "www.c3s.com" ;
:Project = "C3S_25 Toolbox" ;
:Creation_Date = "YYYY-MM-DDThh:mm:ss<zone>" ;
:Comment = "" ;
:History = "" ;
:Lineage = "" ;
:Summary = "" ;
:Keywords = "" ;
:License = "" ;
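Since attribute values are character strings, any double quote inside a value has to be escaped in CDL. The following sketch renders a dictionary of attributes as CDL lines; the function name cdl_global_attributes is hypothetical and the sketch handles string-valued attributes only.

```python
def cdl_global_attributes(attributes):
    """Render a dict of string attributes as CDL global-attribute lines."""
    lines = ["// global attributes:"]
    for name, value in attributes.items():
        # Backslashes and double quotes must be escaped inside a CDL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f':{name} = "{escaped}" ;')
    return "\n".join(lines)


example = cdl_global_attributes(
    {
        "Conventions": "CF-1.7",
        "Title": 'Example with a "quoted" word',
    }
)
```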

Tables

An operative summary providing guidance on global attributes and data within the toolbox is reported in a series of tables. The data tables address the definitions / names / attributes / coordinates of the fields described in the previous sections, including a GRIB code to ease correspondence with the GRIB2 format used at ECMWF. Tables are structured on the categories of data fields of section 8. Individual tables are not exhaustive of the ECVs that will likely be present in the CDS and, at the present stage, focus on widely used variables for surface and atmosphere. Tables will hence be further extended with additional ECVs.

Table 1: Global attributes

| Finalised | Attribute Name | Value | Examples | Comments |
|---|---|---|---|---|
| Y | Conventions | CF convention string [Other convention] :... | "CF-1.6"; "CF-1.6 C3S-0.1" | Multiple conventions may be included (separated by blank spaces) |
| Y | title | Free Text | "ERA-INTERIM reanalysis"; "Winkler index"; "Hurricane track" | Concise name: to be used in data access |
| Y | references | DOI or Nature style reference | "doi:10.5194/gmd-8-1509-2015" | |
| Y | source | text following syntax | For composite data: "EC-Earth-RCP8.5 : ERA-Interim : HadCru" | Implement a concat of data names used to produce results |
| Y | institution | free text / website | "Institute for Atmospheric Sciences and Climate"; www.isac.cnr.it | |
| Y | contact | text website or e-mail | "http://copernicus-support.ecmwf.int" | Would consider Copernicus as principal contact: URL should be used |
| Y | project | free text | "Copernicus C3S_25" | |
| Y | creation_date | SPECS: YYYY-MM-DDThh:mm:ss<zone> | "2011-06-24T02:53:46Z" | NOTE: This is ISO 8601:2004 extended format |
| Y | comment | Free text | i.e. "Analysis of Winkler index to perform a test use case" | |
| Y | history | Text formatted | | Provides an audit trail for modifications to the original data. This attribute is also in the NetCDF Users Guide: 'This is a character array with a line for each invocation of a program that has modified the dataset. Well-behaved generic netCDF applications should append a line containing: date, time of day, user name, program name and command arguments.' To include a more complete description you can append a reference to an ISO Lineage entity; see NOAA EDM ISO Lineage guidance |
| N | lineage | Free text (ISO Lineage model 19115-2) | | |
| Y | summary | text | | A short paragraph describing the dataset / workflow |
| Y | keywords | text following vocabulary | i.e. "Temperature GDD Agriculture projections historical ..." | Keywords should follow a common vocabulary and semantics. Make use of GCMD http://gcmd.gsfc.nasa.gov/learn/keywords.html |
| N | license | | | |


Table 2: Coordinates

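| Finalised | Coordinate Name | Dimension Name | Description | Axis | Bounds | Direction | Standard_name | Long_name | Valid min/max | Calendar | Units | Positive | Type | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | lon | lon | | X | lon_bounds | increasing | longitude | longitude | 0/360 | N/A | degrees_east | | | |
| Y | lat | lat | | Y | lat_bounds | increasing | latitude | latitude | -90/90 | N/A | degrees_north | north | | |
| Y | bounds | bounds | | | N/A | N/A | bounds | bounds | | | 1 | | | |
| Y | time | time | | T | time_bounds | | time | time | | Gregorian | example: "hours since dd/mm/yyyy:00:00:00" | | | |
| Y | leadtime | time | | T | N/A | | forecast_reference_time | First time of the forecast | | Gregorian | example: "hours since dd/mm/yyyy:00:00:00" | | | |
| Y | plev | plev | | Z | N/A | decreasing | air_pressure | pressure | | N/A | Pa | down | | |
| Y | height | height | | Z | N/A | increasing | height | height | | N/A | m | up | | |
| Y | depth | depth | | Z | N/A | increasing | depth | depth | | N/A | m | down | | |
| Y | realization | realization | | E | N/A | increasing | realization | realization | | N/A | 1 | N/A | | |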

Table 3: Coordinates bounds

| Finalised | Coordinate Name | Dimension Name | Description | Axis | Bounds | Direction | Standard_name | Long_name | Valid min/max | Calendar | Units | Positive | Type | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | lon_bounds | lon_bounds | | X | TBD | increasing | lon_bounds | lon_bounds | 0/360 | N/A | degrees_east | | | 2D vector. Not already tested in Toolbox |
| N | lat_bounds | lat_bounds | | Y | TBD | increasing | lat_bounds | lat_bounds | -90/90 | N/A | degrees_north | north | | 2D vector. Not already tested in Toolbox |
| N | time_bounds | time_bounds | | T | TBD | N/A | time_bounds | time_bounds | | Gregorian | example: "hours since dd/mm/yyyy:00:00:00" | | double | 2D vector. Not already tested in Toolbox |

Table 4: Surface Fields (defined at a given height level)

| Finalised | Variable_name | Standard_name | Short_name | Grib ECMWF Parameter* | Description | Units | Cell_Methods | Height | Positive | Dimensions | Time_bounds | Type | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | 2m temperature | air_temperature | tas | 167 | Instantaneous | K | | 2 | | longitude latitude height time | | real | |
| N | 2m temperature monthly mean | air_temperature | tasmm | 167 | monthly mean | K | time:mean | 2 | | longitude latitude height time | defined with interval 1 month hrs Starting at 0Z timebounds = [20160100 00:00, 20160131 23:59] | real | |
| Y | max 2m temperature 24 h | air_temperature | tasmax | 51 | instantaneous 24h | K | time:max | 2 | | longitude latitude height time | defined with interval 1 month hrs Starting at 0Z timebounds = [20160100 00:00, 20160131 23:59] | real | |
| Y | min 2m temperature 24 h | air_temperature | tasmin | 52 | instantaneous 24h | K | time:min | 2 | | longitude latitude height time | defined with interval 1 month hrs Starting at 0Z timebounds = [20160100 00:00, 20160131 23:59] | real | |
| N | 2m dewpoint temperature | dew_point_temperature | TBD | 168 | instantaneous | K | | 2 | | longitude latitude height time | | real | |
| Y | 10 m U wind component | eastward_wind | uas | 165 | instantaneous | m/s | | 10 | | longitude latitude height time | | real | |
| Y | 10 m V wind component | northward_wind | vas | 166 | instantaneous | m/s | | 10 | | longitude latitude height time | | real | |
| Y | 10 max wind gust | wind_speed | maxw | 49 | instantaneous 24h | m/s | | 10 | | longitude latitude height time | | real | |

Note: How to add here the ODB features (varno / flags) ?

Table 5: Surface Fields (not defined at a given height level)

| Finalised | Variable_name | Standard_name | Short_name | Grib ECMWF Parameter* | Description | Units | Cell_Methods | Height | Positive | Dimensions | Time_bounds | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | mean sea level pressure | air_pressure_at_sea_level | psl | 151 | Instantaneous | Pa | | N/A | | longitude latitude time | | real |
| Y | total cloud cover | cloud_area_fraction_assuming_maximum_random_overlap | clt | 164 | Instantaneous | 1 | | N/A | | longitude latitude time | | real |
| Y | skin temperature | surface_temperature | ts | 235 | instantaneous | K | | N/A | | longitude latitude time | | real |
| Y | sea ice cover | sea_ice_area_fraction | sic | 31 | instantaneous | 1 | | N/A | | longitude latitude time | | real |
| Y | sea surface temperature | open_sea_surface_temperature | tos | 34 | instantaneous | K | | N/A | | longitude latitude time | | real |

Table 6: Accumulation Fields

| Finalised | Variable_name | Standard_name | Short_name | Grib ECMWF Parameter* | Description | Units | Cell_Methods | Height | Positive | Dimensions | Time_bounds | Type | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | total precipitation | lwe_precipitation_amount | tp | 228 | Valid for the time_bounds interval | m | time:sum | n/a | | longitude latitude time | defined with interval 24 hrs Starting at 0Z | real | |
| Y | total precipitation rate | lwe_precipitation_rate | tprate | 172.228 | Valid for the time_bounds interval | m/s | | n/a | | longitude latitude time | | real | |
| Y | snowfall (convective+stratiform) | lwe_snowfall_amount | sf | 144 | Valid for the time_bounds interval | m (water equivalent) | time:sum | n/a | | longitude latitude time | defined with interval 24 hrs Starting at 0Z | real | |
| Y | snowfall rate (convective + stratiform) | lwe_snowfall_rate | sfrate | 172144 | Valid for the time_bounds interval | m (water equivalent) /s | | n/a | | longitude latitude time | | real | |

Table 7: Pressure / Height level fields

| Finalised | Variable_name | Standard_name | Short_name | Grib ECMWF Parameter* | Description | Units | Cell_Methods | Positive | Dimensions | Time_bounds | Type | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | temperature | air_temperature | ta | 130 | Instantaneous | K | | | longitude latitude height time | | real | |
| Y | geopotential | geopotential | z | 129 | Instantaneous | m2/s2 | | | longitude latitude height time | | real | |
| Y | specific humidity | specific_humidity | hus | 133 | instantaneous | kg/kg | | | longitude latitude height time | | real | |
| Y | U wind component | eastward_wind | ua | 131 | instantaneous | m/s | | | longitude latitude height time | | real | |
| Y | V wind component | northward_wind | va | 132 | instantaneous | m/s | | | longitude latitude height time | | real | |


Table 8: Intransient Fields

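| Finalised | Variable_name | Standard_name | Short_name | Grib ECMWF Parameter* | Description | Units | Cell_Methods | Height | Positive | Dimensions | Time_bounds | Type | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | geopotential | surface_altitude | | 129 | n/a | m | N/A | N/A | | longitude latitude | N/A | real | |
| Y | land sea mask | land_area_fraction | | 177 | n/a | 0-1 | N/A | N/A | | longitude latitude | N/A | int | |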

List of Acronyms

ACDD                        Attribute Convention for Data Discovery

ACSG                        CEOS WGCV Atmospheric Composition Sub Group

C3S                           Copernicus Climate Change Service

CAMS                       Copernicus Atmosphere Monitoring Service

ESA-CCI                   European Space Agency’s Climate Change Initiative

CDM                         Common Data Model

CDR                          Climate Data Record

CDS                          Climate Data Store of the C3S Service

CF                             Climate and Forecast conventions

CLIPC                       Climate Information Platform for Copernicus (EU FP7 project)

ECV                           Essential Climate Variable

EO                             Earth Observation

FAPAR                      Fraction of Absorbed Photosynthetically Active Radiation

GCOS                        Global Climate Observing System

WMO                         World Meteorological Organization

Bibliography and links


(1) Definition of Essential Climate Variables WMO/GCOS

(2) ESA Climate Change Initiative data http://cci.esa.int/

(3) Cordex variables requirements http://is-enes-data.github.io/CORDEX_variables_requirement_table.pdf

(4) Unidata Common data model http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/ 

(6) NetCDF https://www.unidata.ucar.edu/software/netcdf/

(7) NETCDF4 Common data model http://www.unidata.ucar.edu/software/netcdf/workshops/2008/netcdf4/Nc4DataModel.html

(8) CF conventions general definitions http://cfconventions.org 

(9) Standard name tables from CF_Conventions http://cfconventions.org/Data/cf-standard-names/37/build/cf-standard-name-table.html

(10) ACDD convention for data discovery http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery 

(11) CF aggregation rules proposal, http://cf-trac.llnl.gov/trac/ticket/78 

(12) UDUNITS Package http://www.unidata.ucar.edu/packages/udunits/ 

(13) Xarray package http://xarray.pydata.org/en/stable/

(14) CMIP5 standard output http://cmip-pcmdi.llnl.gov/cmip5/docs/standard_output.pdf

(15) Cordex Initiative www.cordex.org

(16) Grib general description http://www.wmo.int/pages/prog/www/WMOCodes/Guides/GRIB/Introduction_GRIB1-GRIB2.pdf

(17) CMOR Climate Model Output Rewriter  http://cmip-pcmdi.llnl.gov/cmip5/output_req.html






