When downloading a large dataset using the Python module cdsapi, I get strange errors (see below), seemingly related to the data files being too large for the old (classic) netCDF file format.

I can circumvent the error by reducing the request size (by reducing the spatial or temporal extent). Alternatively, I can reduce the number of simultaneously requested variables: in the example below, requesting 1 variable works, but the request fails with 2 or more variables with the error shown.
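For reference, a minimal sketch of the per-variable workaround (assuming a configured cdsapi.Client; the output file names are just illustrative):

import cdsapi

c = cdsapi.Client()

variables = [
    "surface_net_solar_radiation",
    "surface_solar_radiation_downwards",
    "toa_incident_solar_radiation",
    "total_sky_direct_solar_radiation_at_surface",
]

# One retrieve() call per variable; each variable lands in its own netCDF
# file, so no file ever holds more than one (large) variable.
for var in variables:
    c.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "format": "netcdf",
            "variable": var,
            "year": "2013",
            "month": list(range(1, 13)),
            "day": list(range(1, 32)),
            "time": [f"{h:02d}:00" for h in range(24)],
            "area": [90, -180, -90, 180],
        },
        f"{var}_2013.nc",
    )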

I find the error strange, as it does not seem to be caught by the imposed request limits on the number of fields.

Did I miss anything in the documentation about this error and which limit it exceeds? Or is this a backend bug?


Error message:

2023-03-30 21:25:09,573 INFO Welcome to the CDS
2023-03-30 21:25:09,583 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2023-03-30 21:25:09,748 INFO Request is queued
2023-03-30 21:25:10,817 INFO Request is running
2023-03-30 21:57:33,093 INFO Request is failed
2023-03-30 21:57:33,094 ERROR Message: the request you have submitted is not valid
2023-03-30 21:57:33,095 ERROR Reason:  
grib_to_netcdf ERROR: line 4334, nc_enddef: NetCDF: One or more variable sizes violate format constraints

Cannot create netCDF classic format, dataset is too large!
Try splitting the input GRIB(s).
grib_to_netcdf: Version 2.26.2
grib_to_netcdf: Processing input file '/cache/tmp/4b5c27bf-c8fb-485d-9575-8be3d73d1fde-adaptor.mars.internal-1680204311.2481902-20481-14-tmp.grib'.
grib_to_netcdf: Found 17520 GRIB fields in 1 file.
grib_to_netcdf: Ignoring key(s): method, type, stream, refdate, hdate
grib_to_netcdf: Creating netCDF file '/cache/data8/adaptor.mars.internal-1680205252.1768951-20481-9-4b5c27bf-c8fb-485d-9575-8be3d73d1fde.nc'
grib_to_netcdf: NetCDF library version: 4.3.3.1 of Dec 10 2015 16:44:18 $
grib_to_netcdf: Creating large (64 bit) file format.
grib_to_netcdf: Defining variable 'ssr'.
grib_to_netcdf: Defining variable 'ssrd'.


Code and request causing the error message:

import cdsapi

c = cdsapi.Client()

ds = c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": [
            "surface_net_solar_radiation",
            "surface_solar_radiation_downwards",
            "toa_incident_solar_radiation",
            "total_sky_direct_solar_radiation_at_surface",
        ],
        "month": list(range(1, 13)),                 # all 12 months
        "day": list(range(1, 32)),                   # all days of the month
        "time": [f"{h:02d}:00" for h in range(24)],  # all 24 hours
        "year": "2013",
        "area": [90, -180, -90, 180],                # global: North, West, South, East
    },
    "download.nc",
)


4 Comments

  1. Hi Johannes,

    I think this is a limitation of the current GRIB-to-netCDF converter and the netCDF format it currently produces (so it is unrelated to the number of items in the request).

    Thanks,

    Kevin 

  2. Hi Kevin,


    Thanks for your response. Do you know whether and where this limitation is documented, in particular which requests trigger the error?

    Otherwise we're left with either being conservative (staying on the safe side) and using small requests, or resorting to trial and error until we figure out the maximum request size.


    Best,

    Johannes

  3. Hi Johannes,

    It's a netCDF limitation: "The output is currently netCDF3, a limitation of which is that all but the last variable in the file must require less than 4 GiB of storage. (This limit does not apply to the last variable in the file, so it can be disregarded if you have only selected one variable.)

    The CDS is working on new GRIB-to-netCDF conversion software which will likely address these points in the future."

    (some text taken from the ADS download form)

    It's more efficient to request 1 month of hourly ERA5 data from the CDS at a time; if you do this, you should not encounter the netCDF issue, and you can combine the netCDF files on your local system after downloading them.

    Hope that helps,

    Kevin

  4. Thanks for the info Kevin!

    Indeed, downloading the variables one month at a time is also our current workaround, although in our experience this is usually slower than downloading a full year at a time. (A sketch of this per-month approach is included after the comments.)


    Best,

    Johannes
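
For completeness, a rough sketch of the per-month approach discussed above, assuming the same variables as the original request, illustrative file names, and xarray for the local merge. For scale: a global hourly year on ERA5's 0.25° grid is 721 × 1440 × 8760 ≈ 9.1 × 10⁹ values per variable, i.e. roughly 17 GiB even with the converter's default 2-byte (short) packing, so every variable except the exempt last one exceeds the 4 GiB netCDF3 cap, whereas a single month stays well under it.

import cdsapi
import xarray as xr

c = cdsapi.Client()

variables = [
    "surface_net_solar_radiation",
    "surface_solar_radiation_downwards",
    "toa_incident_solar_radiation",
    "total_sky_direct_solar_radiation_at_surface",
]

# Retrieve one month at a time; each monthly file stays well below the
# 4 GiB-per-variable limit of the netCDF3 output format.
files = []
for month in range(1, 13):
    target = f"era5_rad_2013_{month:02d}.nc"
    c.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "format": "netcdf",
            "variable": variables,
            "year": "2013",
            "month": month,
            "day": list(range(1, 32)),
            "time": [f"{h:02d}:00" for h in range(24)],
            "area": [90, -180, -90, 180],
        },
        target,
    )
    files.append(target)

# Combine the monthly files along the time dimension on the local machine.
ds = xr.open_mfdataset(files, combine="by_coords")
ds.to_netcdf("era5_rad_2013.nc")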