Hi,

my question concerns the definition of the grib keys 'validityDate' and 'step' for monthly statistics of seasonal forecast data. In particular, the values for the grib keys 'validityDate' and 'verifyingMonth' in the monthly statistics to me seem to be inconsistent with each other. Take the following example request to get monthly forecasts of 2-m temperature from the UKMO model initialized on 1 July 2022 (nominal start date):

import cdsapi

c = cdsapi.Client()

c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'format': 'grib',
        'originating_centre': 'ukmo',
        'variable': '2m_temperature',
        'product_type': 'monthly_mean',
        'year': '2022',
        'month': '07',
        'leadtime_month': [
            '1', '2', '3',
            '4', '5', '6',
        ],
        'grid': [1.0,1.0],
        'area': [89.5,0.5,-89.5,359.5]
    },
    't2m_2022_07_ukmo_.grib'
)


Inspecting the grib keys of 't2m_2022_07_ukmo_.grib' for the different ensemble members (as suggested here)

grib_ls -p origin,type,shortName,number,indexingDate,indexingTime,dataDate,dataTime,validityDate,fcmonth,verifyingMonth t2m_2022_07_ukmo_.grib


shows that 'validityDate' and 'verifyingMonth' are different. While 'validityDate' (to my mind) suggests that there is no data for July and the first 60 members are valid for August, 'verifyingMonth' indicates that the first 60 ensemble members are valid for July (which makes more sense to me).

ecCodes seems to handle this information correctly when using grib_to_netcdf (i.e., the first step in the resulting netcdf file is July 2022) but when opening the grib file with cdo or python's xarray (cfgrib backend) the data gets a valid date coordinate that I assume it derives from 'validityDate' (or the actual addition of 'step' to 'dataDate'), i.e., no valid data in July.

Am I correct in assuming that the first forecast step in the downloaded file above is the monthly mean temperature for July? If so, what is the actual definition of the grib key 'validityDate' (and, related to this, the meaning of 'step') for the monthly statistics? Is it (always) the end date of the monthly aggregation window (which would be the only way I could make sense of it here)? Is this specified somewhere?

Thanks a lot for your help!

ole

3 Comments

  1. Hi Ole!

    As you rightly guessed, validityDate is computed from the start date/time (i.e. dataDate) plus the value of step. For data with very long lead times, such as these seasonal forecasts, the GRIB1 encoding has some limitations that tend to complicate users' lives, and this is one of them.

    For these monthly aggregations the value of step points to the end of the aggregation interval (calendar month), which falls at 00Z on the 1st day of the following month, so validityDate seemingly points to the wrong month (M+1). As you said, many of the most widely used tools get confused because they rely on that value.
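As a rough illustration of that arithmetic, here is a minimal sketch using only the Python standard library. The helper name `validity_from_step` is made up for this example, and the step of 31 days (744 hours) for the July mean is an assumption derived from the description above, not a value read from the file.

```python
from datetime import datetime, timedelta


def validity_from_step(data_date: str, step_hours: int) -> datetime:
    """Add a step in hours to a dataDate (YYYYMMDD, assumed 00Z).

    This mimics how validityDate is derived from dataDate and step.
    """
    start = datetime.strptime(data_date, "%Y%m%d")
    return start + timedelta(hours=step_hours)


# Assumption: the July 2022 monthly mean carries a step equal to the
# length of July, i.e. the end of the aggregation window.
valid = validity_from_step("20220701", 31 * 24)
print(valid.strftime("%Y%m%d"))  # 20220801, although the data describe July
```

The printed date lands on 1 August, which is why tools that build their time coordinate from dataDate plus step label the July mean as August.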

    There are a couple of ways to circumvent those inconveniences:

    • The first one, in the purely GRIB world, is to rely on the pieces of metadata (GRIB headers) which are included in these files to properly annotate the data (see for instance https://apps.ecmwf.int/codes/grib/format/grib1/local/16/ ). And those are the following:
      • verifyingMonth with the format YYYYMM contains the aggregation month for which the data is relevant (July/2022 in your example)
      • forecastMonth is an integer index with a value of 1 for the first complete calendar month on or after the start date: in your example, forecastMonth=1 for dataDate=20220701 points to July (forecastMonth=2 to August, etc.). With Met Office data you will also see members starting in June 2022 (as encoded in dataDate); those members have something like dataDate=20220625, so forecastMonth=1 points to the first complete calendar month after that date, which is again July.
    • The second one, using python+xarray+cfgrib, is to use the backend kwarg time_dims to specify which time dimensions you would like in your xarray.Dataset (instead of the default ones, time and step). It would be something like the following:

      import xarray as xr
      ds = xr.open_dataset('file.grib', engine='cfgrib', backend_kwargs={'time_dims': ('verifying_time', 'indexing_time')})
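To make the relationship between dataDate, forecastMonth and verifyingMonth described above concrete, here is a small sketch in plain Python. The function `verifying_month` is hypothetical and only mirrors the wording of the description (first complete calendar month on or after the start date, assuming a 00Z start); it is not how ecCodes computes the key.

```python
from datetime import date


def verifying_month(data_date: str, forecast_month: int) -> str:
    """Return YYYYMM of the calendar month that forecast_month refers to.

    data_date is a YYYYMMDD string; forecast_month starts at 1.
    """
    start = date(int(data_date[:4]), int(data_date[4:6]), int(data_date[6:8]))
    # Count months linearly so that year rollover is handled by arithmetic.
    first_complete = start.year * 12 + (start.month - 1)
    if start.day != 1:
        # The start falls inside a month, so that month is incomplete.
        first_complete += 1
    target = first_complete + (forecast_month - 1)
    return f"{target // 12:04d}{target % 12 + 1:02d}"


print(verifying_month("20220701", 1))  # 202207
print(verifying_month("20220625", 1))  # 202207 (lagged start in June)
print(verifying_month("20220701", 2))  # 202208
```

Both the nominal start date and the lagged June start map forecastMonth=1 to July 2022, which matches what verifyingMonth reports in the file.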

    I hope that helps. Regards,


    Eduardo Penabad
    C3S Climate Predictions and Projections Team

  2. Hi Eduardo,

    this is really helpful, thanks a lot for taking the time to respond! Nice solution with xarray, too, I wasn't aware that you could point to different grib headers to assign different dimensions.

    Thanks again!

    Best,

    ole

    1. Hi Ole!

      As you might have seen, there was a typo in the Python code ('time_dims' must be in quotes); I have now edited that.
      Thanks to my colleague Kevin Marsh for spotting it.