Hi all!
I apologise for my post, since I found that something similar has already been published here (ERA5: How to calculate daily total precipitation).

Although it seems that the solution has been explained in that post, I would like to be sure that I am really doing things (accumulating ERA5 precipitation for each day of the year) correctly.

After retrieving the data (Jan 1, 1979 to Dec 31, 2018) from the CDS, the input file contains 24 hourly steps (0, 1, 2, 3, 4, ..., 23) for each calendar day, from Jan 1 to Dec 31 of each considered year.

As far as I know, and from the post mentioned above (ERA5: How to calculate daily total precipitation), the accumulated precipitation for Jan 1, 1979 is obtained by summing steps 1, 2, ..., 23 of Jan 1 and step 0 of Jan 2. This means that step 0 of Jan 1, 1979 is not included in the calculation of the total precipitation for that day. For the total precipitation of Jan 2, 1979 we likewise use steps 1, 2, 3, ..., 23 of that day plus step 0 of Jan 3, and so on.
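As a sanity check, this bookkeeping can be sketched with plain timestamps: each hourly step carries the accumulation over the hour that precedes it, so shifting every timestamp back by one hour reveals which day a step belongs to. A minimal illustration (synthetic timestamps only, not actual ERA5 data):

```python
import pandas as pd

# hourly ERA5-style timestamps covering Jan 1-2, 1979 (step 0 of Jan 1 included)
times = pd.date_range("1979-01-01 00:00", "1979-01-03 00:00", freq="h")

# each step holds the accumulation over the *preceding* hour, so shifting the
# stamps back by one hour tells us which day each step contributes to
owning_day = (times - pd.Timedelta(hours=1)).normalize()

# the steps contributing to Jan 1: steps 1..23 of Jan 1 plus step 0 of Jan 2
jan1_steps = times[owning_day == pd.Timestamp("1979-01-01")]
print(jan1_steps[0], jan1_steps[-1])  # 1979-01-01 01:00:00 1979-01-02 00:00:00
```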

I am doing all the operations with CDO operators, as illustrated below:
cdo -f nc copy out.nc aux.nc                 # convert the retrieved file to netCDF
cdo delete,timestep=1 aux.nc aux1.nc         # drop step 0 of Jan 1, 1979
cdo -b 32 timselsum,24 aux1.nc aux2.nc       # sum each block of 24 consecutive steps
cdo -expr,'ppt=tp*1000' -setmissval,-9999.9 -remapbil,r240x120 aux2.nc era5_ppt_prev-0_1979-2018.nc   # m to mm, set missing value, bilinear remap

The problem arises for Dec 31 of the last year (2018) in my sample, as I have not retrieved step 0 of Jan 1, 2019, which should be used in the calculation of the total precipitation for Dec 31 according to the explanation above.

In this regard, I started wondering whether I am doing this correctly. After a lot of googling, I found the above-mentioned ECMWF post (ERA5: How to calculate daily total precipitation: https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation). Of course I tried to run the code as explained there, but it confirms that I definitely need step 0 of Jan 1, 2019 for my case.

So I have four questions:
1. Is what I am doing correct?
2. Is it correct to exclude step 0 of Jan 1, 1979 from the calculation of the total precipitation for that day (Jan 1, 1979)?
3. Is it correct that for the precipitation of Dec 31, 2018 I need to retrieve step 0 of Jan 1, 2019?
4. Does this apply to other variables, such as wind at any pressure level?

Thank you all in advance for your guidance and quick reply.

Sincerely, 

KCS

25 Comments

  1. Hi Kenedy,


    I was working on a similar task recently, so I can answer some of your questions.

    1. I don't use CDO, so I can't help you with this one.
    2. Yes, it is correct to exclude step 0 from the calculation for any day, so for 1 Jan 1979 too.
    3. Yes, it is correct that for any day you need step 0 from the next day (because step 0 contains the accumulation from step 23 to 00); that includes 31 Dec 2018.
    4. This applies to variables that have so-called accumulated, mean and min/max values (precipitation, radiation, wind gust, minimum temperature, maximum temperature, ...). You can find more about this here.

    This is how I calculated daily precipitation using Python and xarray library.

    import xarray as xr                                                    # import xarray library
    ds_nc = xr.open_dataset('name_of_your_file.nc')                        # read the file
    daily_precipitation = ds_nc.tp.resample(time='24H').sum('time')*1000   # calculate sum with frequency of 24h and multiply by 1000
    daily_precipitation.to_netcdf('daily_prec.nc')                         # save as netCDF

    This code assumes that:

    • your precipitation variable is called tp
    • your time dimension is called time
    • your data starts at step 1 on the first day and ends at step 0 of the day after the last day you need (strictly speaking it doesn't matter when it starts, because it will just sum every 24 hours, but for the sums to be correct daily totals it needs to start at step 1)
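To guard the third assumption, one can verify the window explicitly before summing. A small helper sketch (assuming, as above, that the time dimension is called time; the function name is mine):

```python
import pandas as pd


def check_era5_window(da):
    """Verify an hourly ERA5 series starts at 01:00 and ends at 00:00 of the
    day after the last full day, so fixed 24-step sums line up with days."""
    first = pd.Timestamp(da["time"].values[0])
    last = pd.Timestamp(da["time"].values[-1])
    assert first.hour == 1, f"series should start at 01:00, but starts at {first}"
    assert last.hour == 0, f"series should end at 00:00, but ends at {last}"
    assert da.sizes["time"] % 24 == 0, "length should be a multiple of 24"
```

If the check fails because the series still begins with a step 0, dropping it with ds_nc.isel(time=slice(1, None)) before resampling restores the assumption.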

    I hope this helps.

    Milana

    1. Hi Milana Vuckovic,

      Thank you for your guidance. It is much clearer to me now.

      Best regards,

      KCS

    2. Dear Milana Vuckovic

      I tested your code in Python. It does not actually enforce that the data starts at step 1 on the first day and ends at step 0 of the day after; as written, it sums from 00:00:00 through 23:00:00 of the same day, whereas it seems you wanted 01:00:00 through 00:00:00 of the next day, just like me.

      I believe that neither "24h" nor "D" alone handles the ERA5 accumulation convention correctly.
      I recently worked on this in Matlab, and I believe the only way to convert hourly ERA5 to daily correctly is to delete the first time step from the NetCDF file and then apply something like what you mentioned. However, many researchers like me need not just one year but long-term data, e.g. 1989 to 2019, so I recommend first downloading the NetCDF files for all the years you want, with all hours 00:00 to 23:00, then deleting the first time step of the first file; after that, the first time step of each subsequent year should be split off and merged with the previous year.
      I did this in Matlab, tested it, and the method works well.
      It is quite complicated to achieve this goal through coding.
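For what it's worth, the year-boundary bookkeeping described above is exactly what concatenating along time gives you for free in Python/xarray: once the yearly series are joined, each 1 January 00:00 step sits right after 31 December of the previous year, and a single drop of the very first step suffices. A hedged sketch with synthetic stand-ins for yearly files (with real data you would open and concatenate the NetCDF files instead):

```python
import numpy as np
import pandas as pd
import xarray as xr


def fake_year(year):
    """Stand-in for one yearly file covering 00:00 Jan 1 .. 23:00 Dec 31."""
    t = pd.date_range(f"{year}-01-01 00:00", f"{year}-12-31 23:00", freq="h")
    return xr.DataArray(np.full(t.size, 0.001), coords={"time": t},
                        dims="time", name="tp")


# join the years along time, then drop only the very first 00:00 step;
# the 00:00 steps of later years now line up with 31 Dec of the previous year
tp = xr.concat([fake_year(2018), fake_year(2019)], dim="time")
tp = tp.isel(time=slice(1, None))

# closed='right' assigns each 00:00 step to the previous day; convert m -> mm
daily = tp.resample(time="1D", closed="right").sum("time") * 1000
```

Note that the last day of the series remains incomplete (23 values instead of 24) unless step 0 of the following 1 January is also retrieved, as discussed above.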
      Best Regards,
      Behzad

  2. Hi!

    I'm a beginner with ERA5-Land hourly data and Python code.

    I'm taking advantage of these comments to post my questions:


    1) Do the results change between the following two codes? ('D' instead of '24H')

          A:

    import xarray as xr
    hourly_precipitation = xr.open_dataset('file.nc')
    tp = hourly_precipitation['tp'] 
    daily_precipitation = tp.resample(time='D').sum(dim='time')

       

          B:

    import xarray as xr
    hourly_precipitation = xr.open_dataset('file.nc')
    tp = hourly_precipitation['tp'] 
    daily_precipitation = tp.resample(time='24H').sum(dim='time')


    2) Given that the new data are expressed as daily values in mm, how can the file preserve its data attributes (units, name)?


    What I got following the codes from Milana Vuckovic

    input :    import xarray as xr
               hourly_precipitation = xr.open_dataset('file.nc')
               tp = hourly_precipitation['tp'] 
               daily_precipitation = tp.resample(time='24H').sum(dim='time')
               print(daily_precipitation)
     
    output:    <xarray.DataArray 'tp' (time: 31, latitude: 191, longitude: 141)>
               array([[[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
                        7.9321235e-02, 7.6868758e-02, 7.7845931e-02],
                       ...,
                       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
                        1.5485287e-04, 1.9355118e-04, 5.8528781e-04]]], dtype=float32)
               Coordinates:
               * time       (time) datetime64[ns] 2017-01-01 2017-01-02 ... 2017-01-31
               * longitude  (longitude) float32 -82.0 -81.9 -81.8 -81.7 ... -68.2 -68.1 -68.0
               * latitude   (latitude) float32 0.0 -0.1 -0.2 -0.3 ... -18.7 -18.8 -18.9 -19.0


    An example of how it would ideally look:

    input :    print(precipitation['tp'])
    
    output:    <xarray.DataArray 'tp' (time: 96, latitude: 191, longitude: 141)>
               [2585376 values with dtype=float32]
               Coordinates:
               * longitude  (longitude) float32 -82.0 -81.9 -81.8 -81.7 ... -68.2 -68.1 -68.0
               * latitude   (latitude) float32 0.0 -0.1 -0.2 -0.3 ... -18.7 -18.8 -18.9 -19.0
               * time       (time) datetime64[ns] 2017-01-01 ... 2017-01-04T23:00:00
               Attributes:
               units:      m
               long_name:  Total precipitation


    The two clearly differ. After seeing this, I started to doubt whether the operation through the code posted above gives correct results.


    3) I tried to replicate this for daily temperature (instead of precipitation) using mean (instead of sum), and this is what I obtained:


    input :     import xarray as xr
    			hourly_temperature = xr.open_dataset('era5-hourly-2m_temperature-enero2017.nc')
    			t2m = hourly_temperature['t2m'] 
    			daily_temperature = tp.resample(time='24H').mean(dim='time')
    			print(daily_temperature)
    
    output:    C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\nanops.py:140: RuntimeWarning: Mean of empty slice
               return np.nanmean(a, axis=axis, dtype=dtype)
    
               <xarray.DataArray 'tp' (time: 31, latitude: 191, longitude: 141)>
               array([[[           nan,            nan,            nan, ...,
                        3.30505148e-03, 3.20286490e-03, 3.24358046e-03],
                       ...,
                       [           nan,            nan,            nan, ...,
                        6.45220280e-06, 8.06463231e-06, 2.43869927e-05]]], dtype=float32)
               Coordinates:
               * time       (time) datetime64[ns] 2017-01-01 2017-01-02 ... 2017-01-31
               * longitude  (longitude) float32 -82.0 -81.9 -81.8 -81.7 ... -68.2 -68.1 -68.0
               * latitude   (latitude) float32 0.0 -0.1 -0.2 -0.3 ... -18.7 -18.8 -18.9 -19.0
    


    I don't know why "RuntimeWarning: Mean of empty slice" appears. What can I do to correct it?


    Sorry, I know I posted many questions and codes.

    I hope you can give a hand with this.


    Thanks in advance

  3. Hi Brian,

    Looking at your code above, you have:

    daily_temperature = tp.resample(time = '24H' ).mean(dim = 'time' )

    which means you are resampling 'tp' (the total precipitation variable).

    I think the line should be:

    daily_temperature = t2m.resample(time='24H').mean(dim='time')


    Thanks,
    Kevin




  4. Hi Brian,

    1) My guess is that "D" considers calendar days whereas "24H" will consider slices of 24 h starting from the first time encountered. If your data starts at "00:00" they should be equivalent, but I have not tested this. I would recommend using "D".
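A quick synthetic check (not ERA5 data) suggests that when the series starts at "00:00" the two frequencies do bin identically:

```python
import numpy as np
import pandas as pd
import xarray as xr

# two full synthetic days of hourly values starting at midnight
times = pd.date_range("2017-01-01 00:00", periods=48, freq="h")
tp = xr.DataArray(np.arange(48.0), coords={"time": times}, dims="time")

by_day = tp.resample(time="D").sum("time")    # calendar-day bins
by_24h = tp.resample(time="24h").sum("time")  # 24-hour bins

print(bool((by_day.values == by_24h.values).all()))  # True for this input
```

In recent pandas versions both aliases anchor their bins to the start of the day by default, so dropping the unwanted step 0 explicitly (or using closed='right') is what actually controls correctness, rather than the choice of alias.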

    2) It is not clear from your question which differences concern you. Is it the absence of attributes? xarray does not preserve attributes when it performs an operation; it just drops them. The CDS Toolbox has implemented a range of functions based on xarray that handle attributes. Have a look at this application, which does the daily aggregates you are looking for: https://cds.climate.copernicus.eu/toolbox-editor/168/daily_resample (you will need to be logged in to the CDS to access the application and code).
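On the attributes, and given that the mm conversion changes the units anyway, one simple approach is to reattach them explicitly after resampling. A sketch with synthetic data (with a real file, tp would come from the opened dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2017-01-01 00:00", periods=48, freq="h")
tp = xr.DataArray(np.full(48, 0.001), coords={"time": times}, dims="time",
                  name="tp",
                  attrs={"units": "m", "long_name": "Total precipitation"})

# the reduction and the multiplication both drop attrs by default
daily = tp.resample(time="D").sum("time") * 1000.0

# reattach the metadata, updating the units to match the conversion
daily = daily.assign_attrs(units="mm", long_name=tp.attrs["long_name"])
daily.name = "tp"
```

Recent xarray versions also offer xr.set_options(keep_attrs=True) to keep attributes across operations globally, but the units string would still need updating by hand after the m-to-mm conversion.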

    3) I am not sure what your input data looks like but it sounds like one of the "24h" slices is empty.


  5. Hi Brian,

    To add to the previous answers: in the original question and my answer to it, the problem was the precipitation at hour 00 in ERA5 (not ERA5-Land), because it belongs to the previous day and should be counted there. Using 'D' will group the data by calendar day, no matter how many values there are, but it will add the data from time 00 to the wrong sum. If the data from time 00 on the first day are removed, then summing from time 01 using '24H' will calculate the correct daily sum. What happens if some data are missing, I have not tested.

    From your comment it seems that you are using the ERA5-Land dataset. Total precipitation is stored differently there than in ERA5: every hour holds the precipitation accumulated since the beginning of the day. So to get the daily precipitation, all you need is to retrieve the data at time 00. It will, however, hold the accumulated precipitation of the previous day.
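That convention can be sketched as follows, with synthetic values standing in for ERA5-Land tp (each hour accumulates since 00:00 of its day; with a real file, the selection and shift are the same):

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic ERA5-Land-style tp: the value at hour h is the accumulation since
# the start of the day, so the 00:00 step carries the previous day's total
times = pd.date_range("2017-01-01 00:00", "2017-01-03 00:00", freq="h")
accum = np.array([t.hour if t.hour != 0 else 24 for t in times], dtype=float) * 0.001
tp = xr.DataArray(accum, coords={"time": times}, dims="time", name="tp")

# keep only the 00:00 steps, then move them back one day so each value is
# labelled with the day whose total it actually contains; convert m -> mm
daily = tp[tp["time"].dt.hour == 0]
daily = daily.assign_coords(time=daily["time"] - pd.Timedelta(days=1)) * 1000
```

The very first 00:00 step of a retrieval belongs to the day before the period, so in practice it is either dropped or the retrieval is extended by one step, mirroring the ERA5 discussion above.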

    As for temperature: it is an instantaneous parameter, not an accumulated one, so there is no time-00 problem, and it is better to use the 'D' frequency.

    I hope this helps.

    Cheers,

    Milana

  6. Hi Brian,

    You can achieve the behaviour Milana is mentioning by specifying how to close the interval:

    daily_precipitation = tp.resample(time='D', closed='right').sum(dim='time')

    closed='right' means that the right-most value ("00:00" of the next day) will be included in the sum of the previous day.

    For example if you have hourly data for January 2019, the data for '2019-01-01-00:00' will contribute to the sum of '2018-12-31' while the sum over '2019-01-01' will take the values from '2019-01-01-01:00' all the way to '2019-01-02-00:00' included.

    You can do this in the Toolbox too with:

    ct.cube.resample(total_prec, freq='D', how='sum', closed='right')
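A quick synthetic check of this behaviour (hourly stand-in data for January 2019, constant 0.001 m per hour):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2019-01-01 00:00", "2019-01-31 23:00", freq="h")
tp = xr.DataArray(np.full(times.size, 0.001), coords={"time": times},
                  dims="time", name="tp")

daily = tp.resample(time="D", closed="right").sum("time")

print(daily["time"].values[0])   # first label is 2018-12-31
print(int(daily.sizes["time"]))  # 32 bins for 31 days of input
```

The 2018-12-31 bin holds only the single 00:00 value, so it is not a complete daily total.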
    1. I learned something new today (smile)  Thanks!

      1. No problem, happy it helped someone (smile)

        One thing to be aware of with closed='right' is that, for the example of January 2019, your resampled output will start with the date "2018-12-31", for which the sum will be wrong because it only includes the "2019-01-01-00:00" value. You could drop the first element of your output to avoid errors down the line, or drop the first value from the hourly data as you were doing when using "24H".

    2. Hi Vivien MAVEL,

      thanks for the hint with closed='right'. What I still don't understand: let's say someone has the hourly tp for one day in a single file and would like to resample it to daily values. Does xarray automatically open the file of the following day, or where does xarray get the 00 hour of the following day from? Or, if the hourly data are in daily files, will the first hour just be dropped?

      Thanks,
      Julia

      1. Hi Julia,

        I am not sure I fully understood your question.

        Xarray will only consider the time steps included in the DataArray you are resampling. If your data is spread over various files / DataArrays / data objects, you should first concatenate them and then resample the whole lot. Xarray will only use the 00 hour if it is available; if it is not, it will just proceed with all the available times for the given day.

        Does this answer your question?

        Thanks.

        Vivien

        1. That answers my question. How is it in the Toolbox? Let's say we retrieve data until 31 Dec 2015, 23:00.
          If we want the same behaviour, would we need to retrieve the data for 1 Jan 2016, 00:00 as well?

          1. That's right and it is a good thing to point out.

            You could for example do a specific retrieve for that specific time step and concatenate it to the rest of the data if you do not want to download a whole extra month or year.

  7. Thanks for all your previous comments.


    I tested the code posted by Vivien MAVEL and I still have some doubts:


    1) Vivien MAVEL, I did not completely understand your last post about what "to be aware" of; could you explain it again, please?


    2) For the precipitation case, I used this code:

    import xarray as xr
    hourly_precipitation = xr.open_dataset('era5-hourly-total_precipitation-january2017.nc')
    tp = hourly_precipitation['tp'] 
    daily_precipitation = tp.resample(time='D', closed='right').sum(dim='time')
    print(daily_precipitation)

     I obtained:

    <xarray.DataArray 'tp' (time: 32, latitude: 191, longitude: 141)>
    array([[[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
             9.3552470e-03, 9.5100403e-03, 1.0767728e-02],
            [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
             9.0166330e-03, 9.2875212e-03, 1.0342047e-02],
            [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
             9.4568282e-03, 9.9018514e-03, 1.0637119e-02],
            ...,
            [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
             0.0000000e+00, 8.2328916e-05, 6.7724288e-04],
            [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
             0.0000000e+00, 4.8428774e-05, 5.2246451e-04],
            [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
             4.3585896e-05, 4.3585896e-05, 4.5467913e-04]]], dtype=float32)
    Coordinates:
      * time       (time) datetime64[ns] 2016-12-31 2017-01-01 ... 2017-01-31
      * longitude  (longitude) float32 -82.0 -81.9 -81.8 -81.7 ... -68.2 -68.1 -68.0
      * latitude   (latitude) float32 0.0 -0.1 -0.2 -0.3 ... -18.7 -18.8 -18.9 -19.0


    If you look at the coordinates at the bottom of the output above, my question is: why does the time coordinate include "2016-12-31" if I only downloaded data for January 2017?


    3) For the temperature case, I used this code:

    import xarray as xr
    hourly_temperature = xr.open_dataset('era5-hourly-2m_temperature-january2017.nc')
    t2m = hourly_temperature['t2m'] 
    daily_temperature = tp.resample(time='D').mean(dim='time')
    print(daily_temperature)


    I obtained:

    C:\Users\kb.yallev\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\nanops.py:140: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis=axis, dtype=dtype)
    
    <xarray.DataArray 'tp' (time: 31, latitude: 191, longitude: 141)>
    array([[[           nan,            nan,            nan, ...,
             3.30505148e-03, 3.20286490e-03, 3.24358046e-03],
            [           nan,            nan,            nan, ...,
             3.49894795e-03, 3.18896095e-03, 3.05351685e-03],
            [           nan,            nan,            nan, ...,
             3.47677688e-03, 3.33266519e-03, 3.27441841e-03],
            ...,
            [           nan,            nan,            nan, ...,
             4.43433737e-06, 6.45406544e-06, 3.06374095e-05],
            [           nan,            nan,            nan, ...,
             3.82959843e-06, 6.04925071e-06, 2.51960009e-05],
            [           nan,            nan,            nan, ...,
             6.45220280e-06, 8.06463231e-06, 2.43869927e-05]]], dtype=float32)
    Coordinates:
      * time       (time) datetime64[ns] 2017-01-01 2017-01-02 ... 2017-01-31
      * longitude  (longitude) float32 -82.0 -81.9 -81.8 -81.7 ... -68.2 -68.1 -68.0
      * latitude   (latitude) float32 0.0 -0.1 -0.2 -0.3 ... -18.7 -18.8 -18.9 -19.0


    The warning

    RuntimeWarning: Mean of empty slice return np.nanmean(a, axis=axis, dtype=dtype)

    still appears. How can I fix it? I do not know whether this warning could cause any problems later.

  8. Hi Brian,

    Points 1) and 2) are actually related. What I was trying to explain is that when you set closed='right', you close your slicing interval on the right side. For a day, the slicing interval goes from "00:00" to "00:00" the next day, and by closing the interval on the right, "2017-01-01-00:00" belongs to the slice of the day before, i.e. "2016-12-31". That is why you get a data point for "2016-12-31" in your output even though you only used data for January 2017 as input. If this is an issue, you can drop the first time step of the input data before resampling, as Milana suggested.

    For 3), this is a more generic error, and I would break down the input data to find out where the empty slice is. From the error I understand that in the resampling process one of the slices has no data. Is there a chance there are gaps in the data? One day without data?

  9. Hi again!

    Back to the last problem I found:

    import xarray as xr
    hourly_temperature = xr.open_dataset('era5-hourly-2m_temperature-month.nc')
    t2m = hourly_temperature['t2m'] 
    daily_temperature = t2m.resample(time='D').mean(dim='time')
    print(daily_temperature)

    I followed Vivien MAVEL's comment about the possibility of a gap in the data. I tried to check whether this occurs only in the January data, so I downloaded data in two groups: the first containing only February, and the second containing March and April, all for 2017.

    After running the code above (changing the file to each of the two newly downloaded groups), the issue persists:

    RuntimeWarning: Mean of empty slice (return np.nanmean(a, axis=axis, dtype=dtype)) still appears.

    What comes to my mind are 2 ideas:

    • "2m temperature" variable in ERA 5 land hourly data has information gaps. This idea does not really convince me.
    • the code "t2m.resample(time='D').mean(dim='time')" may have some flaw: the lack or excess of codes, or more precise code. I suspect this probably is the main problem.

    If one of you figures out this issue, please feel free to share it with us.

  10. Hi Brian,

    Sorry to hear the warning persists.

    I doubt the ERA5-Land data has gaps, but someone else might help with that here.

    If you think the xarray function has an issue, I recommend asking on a more general-purpose Python / xarray forum.

    Finally, I recommend that you try the Toolbox for these basic operations. I will be able to help more with that; you can start with this example: https://cds.climate.copernicus.eu/toolbox-editor/168/daily_resample

    Cheers.

    Vivien

  11. Hi Brian,

    Try plotting one day of your data and you will notice that the ERA5-Land dataset only has data over land; the data over oceans and seas is all NaN. That is why you get the warning about an empty slice.

    Try selecting a small area that doesn't include any sea and you will see that the warning disappears.

    Cheers,

    Milana

    1. Thanks Milana, I didn't think about the spatial distribution, that must be it!

  12. Vivien MAVEL, Milana Vuckovic, thanks, I really appreciate your answers.

    As Milana Vuckovic commented, the ERA5-Land data contains information only for land regions.

    The data I used was downloaded for geographic coordinates covering some ocean zones. I work with a western South America area, whose geographic coordinates are -83 (west), -23 (south), -67 (east), and 3 (north).

    A tentative explanation of the RuntimeWarning: Mean of empty slice is therefore the missing values over the ocean zones. To be sure of it, it needs to be tested.

    Two options were suggested: 1) try with an exclusively land area, or 2) plot the data.

    Regarding the second option, I have searched different forums trying to find out how to plot a specific area (using a netCDF file downloaded from ERA5-Land), but so far I have not found a way to do it. If you know how to do it, it would help me a lot!
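For the plotting question, a minimal sketch using xarray's built-in matplotlib plotting (the field here is synthetic random data standing in for one time step of t2m; with a real file you would use xr.open_dataset(...)['t2m'].isel(time=0) instead):

```python
import numpy as np
import xarray as xr
import matplotlib
matplotlib.use("Agg")                 # headless backend; no display window needed
import matplotlib.pyplot as plt

# synthetic 2-D field on an ERA5-Land-style grid over the area described above
lat = np.arange(3.0, -23.0, -0.25)    # decreasing, as in the downloaded files
lon = np.arange(-83.0, -67.0, 0.25)
field = xr.DataArray(np.random.rand(lat.size, lon.size),
                     coords={"latitude": lat, "longitude": lon},
                     dims=("latitude", "longitude"), name="t2m")

# select a sub-area; the latitude slice runs from larger to smaller values
# because the latitude coordinate decreases
area = field.sel(latitude=slice(-5.0, -15.0), longitude=slice(-80.0, -70.0))

area.plot()                           # draws a labelled pcolormesh map
plt.savefig("t2m_area.png")
```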


  13. Hi Brian,

    This is how you select a smaller area using xarray:

    small_area = era5_land.sel(latitude=slice(-10, -20), longitude=slice(10, 20))

    Note that the latitudes go from larger to smaller because they decrease in your dataset too.

    If you need the data over the oceans maybe you should better use ERA5 instead of ERA5 Land.

    Regarding the RuntimeWarning: I tested this on both ERA5 and ERA5-Land, and on a smaller area, and I got the warning with ERA5-Land, which then disappeared when I selected a smaller (land-only) area. Your daily mean will be calculated fine; the warning is just telling you that you don't have data over the ocean, which you already know.

  14. Milana Vuckovic I just tried what you said on a small area (covering only a land region), and it works. The "RuntimeWarning" no longer appears. Thanks!

  15. For a CDO-based solution, the relevant command is shifttime, which does exactly what it says on the tin: it shifts the time stamp.

    This kind of problem arises frequently with any flux or accumulated field whose timestep points to the END of the time window; for example, with 3-hourly TRMM data the last three hours of the day carry the 00 stamp of the following date.

    So for ERA5 precipitation, where you have a long hourly series and the window from 23:00 to 00:00 carries the 00:00 timestamp of the following day, you can compute the daily average like this:

    cdo shifttime,-1hour in.nc shift.nc # now step 0 on Jan 2 has Jan 1, 23:00 stamp 
    cdo daymean shift.nc daymean.nc 

    or piped together:

    cdo daymean -shifttime,-1hour in.nc daymean.nc
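    The bookkeeping behind shifttime can be sanity-checked on toy timestamps in pandas (just a check of the idea, not part of the CDO workflow): after moving every end-of-window stamp back one hour, all 24 steps of a day, including the former step 0 of the next day, fall on the same calendar date.

```python
import pandas as pd

# Hourly stamps for Jan 1-2, 1979, where each stamp marks the END of its
# accumulation window: Jan 1 01:00 ... Jan 2 00:00 all belong to Jan 1.
stamps = pd.date_range("1979-01-01 01:00", periods=48, freq="h")

# Mimic `cdo shifttime,-1hour`: move every stamp back one hour so it
# falls inside the day it accumulates into.
shifted = stamps - pd.Timedelta(hours=1)

# Count how many hourly steps now land on each calendar day.
per_day = pd.Series(1, index=shifted).groupby(shifted.date).sum()
print(per_day)   # 24 steps per calendar day, as daymean/daysum expect
```

    After the shift, daymean (or daysum for precipitation totals) sees exactly 24 steps per calendar day, which also resolves the "missing step 0 of the next year" issue from the original question.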


  16. Hi Adrian Mark Tompkins, I applied your code and got the following result.


     cdo sinfo d2m_wb.nc
    File format : NetCDF2
    -1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter ID
    1 : unknown unknown v instant 1 1 475 1 F64 : -1
    Grid coordinates :
    1 : lonlat : points=475 (19x25)
    lon : 85.5 to 90 by 0.25 degrees_east
    lat : 21.5 to 27.5 by 0.25 degrees_north
    Vertical coordinates :
    1 : surface : levels=1
    Time coordinate : 25904 steps
    RefTime = 1900-01-01 00:00:00 Units = hours Calendar = gregorian Bounds = true
    YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
    1949-12-31 23:00:00 1950-01-01 11:00:00 1950-01-02 11:00:00 1950-01-03 11:00:00
    1950-01-04 11:00:00 1950-01-05 11:00:00 1950-01-06 11:00:00 1950-01-07 11:00:00
    1950-01-08 11:00:00 1950-01-09 11:00:00 1950-01-10 11:00:00 1950-01-11 11:00:00
    1950-01-12 11:00:00 1950-01-13 11:00:00 1950-01-14 11:00:00 1950-01-15 11:00:00
    1950-01-16 11:00:00 1950-01-17 11:00:00 1950-01-18 11:00:00 1950-01-19 11:00:00
    1950-01-20 11:00:00 1950-01-21 11:00:00 1950-01-22 11:00:00 1950-01-23 11:00:00
    1950-01-24 11:00:00 1950-01-25 11:00:00 1950-01-26 11:00:00 1950-01-27 11:00:00
    1950-01-28 11:00:00 1950-01-29 11:00:00 1950-01-30 11:00:00 1950-01-31 11:00:00
    1950-02-01 11:00:00 1950-02-02 11:00:00 1950-02-03 11:00:00 1950-02-04 11:00:00
    1950-02-05 11:00:00 1950-02-06 11:00:00 1950-02-07 11:00:00 1950-02-08 11:00:00
    1950-02-09 11:00:00 1950-02-10 11:00:00 1950-02-11 11:00:00 1950-02-12 11:00:00
    1950-02-13 11:00:00 1950-02-14 11:00:00 1950-02-15 11:00:00 1950-02-16 11:00:00
    1950-02-17 11:00:00 1950-02-18 11:00:00 1950-02-19 11:00:00 1950-02-20 11:00:00
    1950-02-21 11:00:00 1950-02-22 11:00:00 1950-02-23 11:00:00 1950-02-24 11:00:00
    1950-02-25 11:00:00 1950-02-26 11:00:00 1950-02-27 11:00:00 1950-02-28 11:00:00
    ................................................................................
    ................................................................................
    ................................................................................
    .................
    2020-10-03 11:00:00 2020-10-04 11:00:00 2020-10-05 11:00:00 2020-10-06 11:00:00
    2020-10-07 11:00:00 2020-10-08 11:00:00 2020-10-09 11:00:00 2020-10-10 11:00:00
    2020-10-11 11:00:00 2020-10-12 11:00:00 2020-10-13 11:00:00 2020-10-14 11:00:00
    2020-10-15 11:00:00 2020-10-16 11:00:00 2020-10-17 11:00:00 2020-10-18 11:00:00
    2020-10-19 11:00:00 2020-10-20 11:00:00 2020-10-21 11:00:00 2020-10-22 11:00:00
    2020-10-23 11:00:00 2020-10-24 11:00:00 2020-10-25 11:00:00 2020-10-26 11:00:00
    2020-10-27 11:00:00 2020-10-28 11:00:00 2020-10-29 11:00:00 2020-10-30 11:00:00
    2020-10-31 11:00:00 2020-11-01 11:00:00 2020-11-02 11:00:00 2020-11-03 11:00:00
    2020-11-04 11:00:00 2020-11-05 11:00:00 2020-11-06 11:00:00 2020-11-07 11:00:00
    2020-11-08 11:00:00 2020-11-09 11:00:00 2020-11-10 11:00:00 2020-11-11 11:00:00
    2020-11-12 11:00:00 2020-11-13 11:00:00 2020-11-14 11:00:00 2020-11-15 11:00:00
    2020-11-16 11:00:00 2020-11-17 11:00:00 2020-11-18 11:00:00 2020-11-19 11:00:00
    2020-11-20 11:00:00 2020-11-21 11:00:00 2020-11-22 11:00:00 2020-11-23 11:00:00
    2020-11-24 11:00:00 2020-11-25 11:00:00 2020-11-26 11:00:00 2020-11-27 11:00:00
    2020-11-28 11:00:00 2020-11-29 11:00:00 2020-11-30 11:00:00 2020-12-31 23:00:00
    cdo sinfo: Processed 1 variable over 25904 timesteps [6.03s 37MB]


    In this case, the timestamps show 11:00:00. I guess they should be 12:00:00. What have I done wrong here? Any suggestions? Thank you.