Hi (or maybe, hi again)! 

Now that the cache problem is solved, it seems I have other troubles 😅 

I am requesting data from the "Agrometeorological indicators from 1979 to present derived from reanalysis" dataset through the Python cdsapi package. As recommended by the CDS efficiency tips, I loop over the data month by month, so that the weather variables I want to retrieve never exceed the 100-item limit that applies to this dataset. I retrieve all the data I want for one month, then move on to the next until everything is downloaded.
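The loop looks roughly like this (a simplified sketch; the year/month lists and the output file names are just illustrative):

import cdsapi

cds = cdsapi.Client()
area = [34.3, 34.4, -6.5, -6.4]  # my (very small) area of interest

# One request per month, so that each request stays under the 100-item limit
for year in ['2020', '2021']:
    for month in ['01', '02', '03']:
        cds.retrieve('sis-agrometeorological-indicators', {
            'variable': '2m_relative_humidity',
            'year': year,
            'month': month,
            # requesting all 31 days (days that do not exist in a given
            # month are normally skipped by the CDS)
            'day': ['%02d' % d for d in range(1, 32)],
            'time': ['06_00', '09_00'],
            'format': 'zip',
            'area': area,
        }, 'era5_%s_%s.zip' % (year, month))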

However, I still get the "request has too many items" error when I shouldn't. For example, I get it for the following request, which covers the 31 days of January 2021:

Humidity request:

import cdsapi

cds = cdsapi.Client()
years = ['2021']
months = ['01']
days = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']
area = [34.3, 34.4, -6.5, -6.4]  # a single ERA5 tile

## RELATIVE HUMIDITY 6AM, 9AM
cds.retrieve('sis-agrometeorological-indicators', {
    'variable': '2m_relative_humidity',
    'year': years,
    'month': months,
    'day': days,
    'time': ['06_00', '09_00'],
    'format': 'zip',
    'area': area,
}, 'era5_tmp.zip')

Mathematically, it does not make sense, since I have 2 values per day (relative humidity at 06:00 and at 09:00) over 31 days, i.e. 2 * 31 = 62 items.

To confirm this, I made the same request on the web form (variable = 2m relative humidity, year = 2021, month = January, day = 01 to 31, time = 06:00, 09:00) and it is accepted as valid. The "Request too large" error only appears when I select 06:00, 09:00, 12:00 and 15:00 for the time parameter, which makes sense since that is 4 values per day, and 4 * 31 = 124 > 100.
I double-checked by downloading the data for 06:00 and 09:00, which amounted to 62 NetCDF files in the resulting zip file.

Why does my Python request get rejected for having too many items when the web form request does not?
The only things that differ between the two scenarios are the platform making the request and the area parameter. In my request the area is very restricted (a single ERA5 tile, [34.3, 34.4, -6.5, -6.4]), so I do not expect it to be the cause of the error.
I also don't expect the error to come from the for loop and the succession of API requests, since this approach is recommended by ECMWF in the documentation.

I tried requesting half a month at a time (i.e. reducing the number of "day" values), but I still get the error.
When I split the request in two (one for 06:00 and one for 09:00), the error disappears, but it makes the whole process much longer, and I would like to get as much data as possible per request; the split version is sketched below. Basically, I would like to optimize my requests.
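For reference, the split version looks roughly like this (a sketch, reusing the cds, years, months, days and area variables from above):

# One request per time value: this works, but doubles the number of requests
for time in ['06_00', '09_00']:
    cds.retrieve('sis-agrometeorological-indicators', {
        'variable': '2m_relative_humidity',
        'year': years,
        'month': months,
        'day': days,
        'time': [time],
        'format': 'zip',
        'area': area,
    }, 'era5_rh_%s.zip' % time)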
The original request shown at the top should be accepted given the item limit (or at least, given how I understood it), but it isn't. Does anybody know why?

Kind regards,
Léa

Comments

  1. Hi Léa

    We had a look at this last week, and it looks like the use of the 'area' keyword is indeed the cause of the 'too many items' error for this dataset.

    Generally, users should only use the keywords/values which are listed on the CDS "Download Data" page for a given dataset (although there are some exceptions).

    So the advice is to remove this 'area' keyword from your script. You can subset an area once you have downloaded the data, or use the CDS Toolbox to select an area before downloading data.
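    For example, once the global files are downloaded and extracted, the subsetting can be done along the following lines (a sketch; the file name is illustrative and the coordinate names/ordering may differ between files):

    # Sketch: subset a downloaded AgERA5 NetCDF file to a small area
    import xarray as xr

    ds = xr.open_dataset('downloaded_file.nc')
    # AgERA5 files typically use 'lat' (descending) and 'lon' coordinates,
    # hence the reversed latitude slice
    subset = ds.sel(lat=slice(34.4, 34.3), lon=slice(-6.5, -6.4))
    subset.to_netcdf('subset.nc')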

    Hope that helps,

    Kevin

  2. Hi Kevin,

    Indeed, I can make a request with more parameters when I do not set any area.
    This means I have two choices for getting data through the ECMWF Python API:

    1. download NC files parameter by parameter, for shorter periods of time, but for my precise area (i.e. lighter files),
    2. download NC files without specifying any area (i.e. world-wide data), thus getting fewer files, but much heavier ones.

    I ran some tests on how long each of these options takes for downloading, extracting and merging the NC files; the merging step is sketched below. Option 1 seems quicker, which is probably down to the lighter files, even though there are more of them. I am talking here about retrieving data over 8 months or so.
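    The merging step in my tests looks roughly like this (a sketch, assuming the per-month NC files have been extracted into one folder):

    # Sketch: merge the extracted per-month NetCDF files into one
    import xarray as xr

    merged = xr.open_mfdataset('extracted/*.nc', combine='by_coords')
    merged.to_netcdf('merged.nc')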


    You mentioned using the CDS Toolbox to select an area before downloading data. That means downloading the file from the CDS website, right? By writing an application? Is there no "too many items" limit there? And does using the area variable not interfere with the request there?

    Thanks!

    Léa

  3. Hi Léa,

    The advice we have is that you can use the API (option 1), but please check carefully that the retrieved data are correct and cover the requested area.

    It's possible to use the Toolbox (and I would have advised this route if the API area selection were not available), although some more programming would be required and similar request limits would apply. As you only need a relatively short period, I think the CDS API is the best option for you to use.

    Hope that helps,

    Kevin

  4. Hi Kevin,

    I'll be using the area-specific Python request for now.

    Thanks for your support!

    Have a good day,
    Léa

  5. Hello,


    1. I want to know if it is possible to download one year of data (all 12 months, for all days and hours) and then compute the daily mean for that year with ct.climate.daily_mean(variable), using a loop over the months in the cdstoolbox online interface (a rough sketch of what I have in mind is at the end of this comment).

    2. Please, how can I rename the file that will be downloaded, so that I won't have to sort out which month or dataset each downloaded file contains?

    The final output should look like 2m_temperature_2021_era5.nc instead of 435447fe-bead-406c-ac9a-8b6cf8c437df.nc.
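    Here is roughly what I have in mind for the first question (a sketch based on the Toolbox examples; I have not checked that it runs):

    import cdstoolbox as ct

    @ct.application(title='Daily mean 2m temperature, 2021')
    @ct.output.download()
    def application():
        # Retrieve one year of hourly data (if this is too large, it may
        # have to be split into per-month retrievals instead)
        data = ct.catalogue.retrieve(
            'reanalysis-era5-single-levels',
            {
                'variable': '2m_temperature',
                'product_type': 'reanalysis',
                'year': '2021',
                'month': ['%02d' % m for m in range(1, 13)],
                'day': ['%02d' % d for d in range(1, 32)],
                'time': ['%02d:00' % h for h in range(24)],
            }
        )
        return ct.climate.daily_mean(data)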

    Thanks in anticipation of your response

  6. Hi,

    At this moment in time, it is not possible for users to rename the output file directly in the Toolbox web interface, although you can rename the file when you download it, using the 'Save as' option when you click on the link.
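    Note that if you retrieve the data with the CDS API instead, the target argument of retrieve() sets the name of the downloaded file, so you can name it whatever you like, e.g. (a sketch, where request stands for a request dictionary like the ones earlier in this thread):

    # The third argument of retrieve() is the output file name
    cds.retrieve('reanalysis-era5-single-levels', request, '2m_temperature_2021_era5.nc')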

    Thanks,

    Kevin