Hi,
my question concerns the definition of the grib keys 'validityDate' and 'step' for monthly statistics of seasonal forecast data. In particular, the values for the grib keys 'validityDate' and 'verifyingMonth' in the monthly statistics to me seem to be inconsistent with each other. Take the following example request to get monthly forecasts of 2-m temperature from the UKMO model initialized on 1 July 2022 (nominal start date):
import cdsapi

c = cdsapi.Client()

# Monthly-mean 2 m temperature, UKMO system, nominal start date 1 July 2022
c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'format': 'grib',
        'originating_centre': 'ukmo',
        'variable': '2m_temperature',
        'product_type': 'monthly_mean',
        'year': '2022',
        'month': '07',
        'leadtime_month': ['1', '2', '3', '4', '5', '6'],
        'grid': [1.0, 1.0],
        'area': [89.5, 0.5, -89.5, 359.5],
    },
    't2m_2022_07_ukmo_.grib',  # output file
)
Inspecting the grib keys of 't2m_2022_07_ukmo_.grib' for the different ensemble members (as suggested here)
grib_ls -p origin,type,shortName,number,indexingDate,indexingTime,dataDate,dataTime,validityDate,fcmonth,verifyingMonth t2m_2022_07_ukmo_.grib
shows that 'validityDate' and 'verifyingMonth' are different. While 'validityDate' (to my mind) suggests that there is no data for July and the first 60 members are valid for August, 'verifyingMonth' indicates that the first 60 ensemble members are valid for July (which makes more sense to me).
ecCodes seems to handle this information correctly when using grib_to_netcdf (i.e., the first step in the resulting netcdf file is July 2022) but when opening the grib file with cdo or python's xarray (cfgrib backend) the data gets a valid date coordinate that I assume it derives from 'validityDate' (or the actual addition of 'step' to 'dataDate'), i.e., no valid data in July.
Am I correct in assuming that the first forecast step in the downloaded file above is the monthly mean temperature for July? If so, what is the actual definition of the grib keys 'validityDate' and, related to this, 'step' for the monthly statistics? Is it (always) the end date of the monthly aggregation window (which would be the only way I could make sense of it here)? Is this specified somewhere?
Thanks a lot for your help!
ole
3 Comments
Eduardo Penabad
Hi Ole!
as you rightly guessed, validityDate is computed from the start date/time (i.e. dataDate) plus the value of step.
In data with a very long lead time, as is the case with these seasonal forecasts, the encoding in GRIB1 has some limitations which usually make users' lives a bit complicated, and this is indeed one of them.
For these monthly aggregations the value of step points to the end of the aggregation interval (calendar month), which happens to be at 00Z of the 1st day of the following month, causing validityDate to seemingly point to the wrong month (M+1). As you said, many of the most used tools out there get confused when they use that value.
There are a couple of ways to circumvent those inconveniences:
The first one is to rely on other GRIB keys:
verifyingMonth, with the format YYYYMM, contains the aggregation month for which the data is relevant (July/2022 in your example).
forecastMonth is an integer index with a value of 1 for the first complete calendar month after (and including) the start date: in your example forecastMonth=1 for a dataDate=20220701 points to July (=2 to August, etc). In your example with MetOffice data you will see members starting in June/2022 (as encoded in dataDate); those members would have something like dataDate=20220625, and hence forecastMonth=1 points to the first complete calendar month after that, which is July.
The second one, using python+xarray+cfgrib, would imply using the backend kwargs time_dims to specify which time dimensions you would like to use in your xarray.Dataset (instead of the default ones time and validityDate); it would be something like the following:
I hope that sounds helpful.
Regards,
Eduardo Penabad
C3S Climate Predictions and Projections Team
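To make the relationship between these keys concrete, the arithmetic described above can be sketched with Python's standard library alone. This is only an illustration and does not read any GRIB file: the helper names add_months and monthly_keys are made up for this sketch, a dataTime of 00Z is assumed, and expressing step in hours is an assumption about the encoding.

```python
from datetime import date


def add_months(year, month, n):
    """Return (year, month) shifted forward by n calendar months."""
    idx = year * 12 + (month - 1) + n
    return idx // 12, idx % 12 + 1


def monthly_keys(data_date, forecast_month):
    """Sketch of how the keys relate for one monthly-mean field.

    Hypothetical helper: mirrors the explanation in this thread,
    not the actual ecCodes implementation.
    """
    # forecastMonth=1 is the first complete calendar month on or after the start date
    if data_date.day == 1:
        first_y, first_m = data_date.year, data_date.month
    else:
        first_y, first_m = add_months(data_date.year, data_date.month, 1)

    # Month the aggregation actually covers
    ver_y, ver_m = add_months(first_y, first_m, forecast_month - 1)

    # step points to the end of that month: 00Z on the 1st of the following month
    end_y, end_m = add_months(ver_y, ver_m, 1)
    end = date(end_y, end_m, 1)

    return {
        "verifyingMonth": f"{ver_y:04d}{ver_m:02d}",
        "validityDate": end.strftime("%Y%m%d"),
        # assumption: step counted in hours from dataDate at 00Z
        "step_hours": (end - data_date).days * 24,
    }


nominal = monthly_keys(date(2022, 7, 1), 1)
lagged = monthly_keys(date(2022, 6, 25), 1)
print(nominal)
print(lagged)
```

Both the nominal (1 July) and the lagged (25 June) start dates give the same verifyingMonth and validityDate for forecastMonth=1; only the step differs, which is why validityDate alone appears to point one month too late.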
Ole Wulff
Hi Eduardo,
this is really helpful, thanks a lot for taking the time to respond! Nice solution with xarray, too, I wasn't aware that you could point to different grib headers to assign different dimensions.
Thanks again!
Best,
ole
Eduardo Penabad
Hi Ole!
As you might have seen, there was a typo in the python code ('time_dims' must be in quotes), I have now edited that.
Thanks to my colleague Kevin Marsh for spotting it.
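The Python snippet referred to in this thread is not preserved here. As a rough sketch of what such a call might look like, assuming xarray and cfgrib are installed and that forecastMonth and time are the dimensions wanted (the helper names seasonal_backend_kwargs and open_seasonal_monthly are made up for this example):

```python
def seasonal_backend_kwargs():
    """Backend kwargs asking cfgrib to index by forecastMonth and start time.

    Note that 'time_dims' is a string key, so it must be quoted when
    written as a dict literal.
    """
    return {"time_dims": ("forecastMonth", "time")}


def open_seasonal_monthly(path):
    """Open a seasonal monthly GRIB file with forecastMonth as a dimension.

    Hypothetical helper; requires xarray with the cfgrib engine available.
    """
    import xarray as xr  # imported lazily so the sketch loads without xarray

    return xr.open_dataset(
        path,
        engine="cfgrib",
        backend_kwargs=seasonal_backend_kwargs(),
    )


# Example usage (needs the file downloaded in the question):
# ds = open_seasonal_monthly("t2m_2022_07_ukmo_.grib")
# print(ds)
```

With these dimensions the dataset is indexed by forecastMonth rather than by a date derived from step, so forecastMonth=1 can be selected directly as the July mean in the example above.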