I try to create March daily climatology from ERA5 hourly data. However the "dayofyear" is always equal to 32 and not 31. In NCL I can shift the time value due to averaging process, but I don't know how to do it in CDS. Is there any way to solve my issue? 

Thank you,


data1 = ct.catalogue.retrieve(
'reanalysis-era5-single-levels',
{
'variable': 'total_precipitation',
'product_type': 'reanalysis',
'year': list(range(1979, 2020 + 1)),
'month': [
'03',
],
'day': [
'01', '02', '03', '04', '05', '06',
'07', '08', '09', '10', '11', '12',
'13', '14', '15', '16', '17', '18',
'19', '20', '21', '22', '23', '24',
'25', '26', '27', '28', '29', '30',
'31'
],
'time': [
'00:00', '03:00', '06:00',
'09:00', '12:00', '15:00',
'18:00', '21:00',
],
'area': [
15, 89, -19,
156,
],
'grid': [0.25, 0.25],
}
)


daily_clim1 = ct.climate.climatology_mean(
data = data1,
start = '1980',
stop = '2020',
frequency = 'dayofyear',
# closed='right'
)print(daily_clim1)

<xarray.DataArray 'tprate_climatology_mean' (dayofyear: 32, lat: 137, lon: 269)>
[1179296 values with dtype=float32]
Coordinates:
    realization  int64 ...
  * lat          (lat) float64 -19.0 -18.75 -18.5 -18.25 ... 14.5 14.75 15.0
  * lon          (lon) float64 89.0 89.25 89.5 89.75 ... 155.2 155.5 155.8 156.0
  * dayofyear    (dayofyear) int64 60 61 62 63 64 65 66 ... 85 86 87 88 89 90 91
Attributes:
    long_name:              Total precipitation rate climatological mean
    units:                  m s-1
    standard_name:          lwe_precipitation_rate
    cds_magics_style_name:  precipitation-rate
    type:                   real
    GLOBAL_Conventions:     CF-1.7
    GLOBAL_institution:     European Centre for Medium-Range Weather Forecasts
    GLOBAL_history:         2021-07-18T04:41 GRIB to CDM+CF via cfgrib-0.9.9....
    GLOBAL_source:          ECMWF


3 Comments

  1. Why don't you use the "Daily statistics calculated from ERA5 data" Application in the Climate Data Store?

    You can download the data from there or use the source code which you can adapt to your needs.

  2. Because I want to create a daily climatology and not only daily mean. The link only limits to single layer. I think if possible to shift the time stamp in CDS it will be very useful.

    In NCL I can do this: precip&time = data_clim1&time- tointeger(step_data/2), but for this I need to download the data. If possible, I prefer CDS since it can reduce the amount of data before downloading... 

    If there is any clue to correct March climatology in order to have dayofyear=31, it will be helpful.

    Thank you in advance




  3. Hi, Aditya:

    Have you solved this problem?. I have been facing it too, recently, while working with the toolbox and I have found some revealing information by myself and in other forums too.

    As a side note, I have to mention that the toolbox is based on the xarray library (among others). You cannot import the xarray library (neither any other one) from your script in the toolbox, but a lot of  its functionality is exposed to the user through the cdstoolbox library.

    I have made some tests by calculating the "climatology" for two years (this is not really a climatology, that should comprise a lot more years, but I did it just for testing). The two years are 1980 (leap year) and 1981.

    It results that the 29th of february of 1980 (leap year) is averaged with the 1st of march of 1981 (non leap year), and so on. So, in a leap year, all days after 29th February are averaged with the following calendar day of the non-leap years. The resulting climatology has 366 days, the last one being the average of the 31th December of every four years (the leap years).

    So, I think that, for your problem, you have the day number 60 with the average of the 1st March of only non-leap years. For the days from 61 to 90, the calendar days of non-leap years are averaged with the previous calendar day of leap years, and the day number 90 is the average of the 31th March of only the leap years. So for this last day you have not the climatology over 41 years, but, approximately, over 41/4=10 years.

    Thinking as a meteorologist, I think that it is not an appreciable error to average calendar days that only differ in one or two days. And perhaps it is not an error at all. We can think that, in consecutive years, the solar cycle is shifted 1/4 of a day for the same calendar day. After four years, the cycle is shifted one day and a leap year is introduced to compensate. But, meanwhile, we are averaging days that are influenced by a solar cycle that is shifted 1/4 of a day. I think that this 1/4 of day is not important. That is, that the 15th January 1981 (non leap) can be regarded as having the same climatology as the 15th January 1982 (non leap), id est, the same calendar days of different years are climatologically equivalent. And in this reasoning line, the 15th January 1981 (non leap) has the same climatology as the 14th January 1982 (non leap), id est, consecutive calendar days of different years are also climatologically equivalent. So that the averaging of calendar days after 29th February of non-leap years with the preceding calendar day of leap years, is not a problem. But if you are calculating a daily climatology over 30 days, the day number 366 is calculated only over 30/4=7 years. I think that this is indeed a problem. But with a simple solution: it can be solved by simply erasing the day number 336 (I have not tried this).

    But I also think that being a purist (and I think I am), all calendar days should be averaged only with the same calendar day of all years. So, In my opinion, or at least for some applications, it could be possible to simply drop all 29th February. This is easily done with the xarray library by means of:

    dataset = dataset.convert_calendar('365_day')

    The problem is that I have not found this xarray functionality exposed by the cdstoolbox library. Perhaps someone of the EC could help on this point.

    Another solution (also by using the xarray library) I have found at the Github forum about the xarray library:

    https://github.com/pydata/xarray/issues/1844

    It consist in reindexing the dayofyear so that the day 366 is always 31th December of all leap and not leap years. But dayofyear=60 will contain only the average of the (every four years) 29th February. But I think that this is the same problem as having the dayofyear=366 with the average of every four years.