When downloading a large dataset using the Python module cdsapi, I get strange errors (see below) that seem related to overly large data files and the use of the old (classic) netCDF file format.
I can work around the error by reducing the request size (by reducing the spatial or temporal extent). Alternatively, I can reduce the number of simultaneously requested variables: in the example below, requesting 1 variable works, but the request fails with 2 or more variables, giving the error shown.
I find the error strange, as it does not seem to be caught by the imposed request limits on the number of fields.
Did I miss anything in the documentation about this error and which limits it exceeds? Or is this a backend bug?
Error message:
Code and request causing the error message:
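(The original code and error message are not reproduced here. As a rough illustration only, a request of this shape, several hourly ERA5 variables over a full year, is the kind that can hit the problem; the dataset name, variable names, and extents below are assumptions, not the original request.)

```python
# Illustrative cdsapi-style request: 2+ variables over a full year of
# hourly data. With 1 variable the request succeeds; with 2 or more it
# can fail at the GRIB-to-netCDF conversion step (see the discussion).
request = {
    "product_type": "reanalysis",
    "format": "netcdf",  # asks the backend for GRIB-to-netCDF conversion
    "variable": [        # 2+ variables: all but the last are size-limited
        "2m_temperature",
        "10m_u_component_of_wind",
    ],
    "year": "2020",
    "month": [f"{m:02d}" for m in range(1, 13)],
    "day": [f"{d:02d}" for d in range(1, 32)],
    "time": [f"{h:02d}:00" for h in range(24)],
}

# The actual download would then be (requires CDS credentials):
# import cdsapi
# c = cdsapi.Client()
# c.retrieve("reanalysis-era5-single-levels", request, "era5_2020.nc")
```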
4 Comments
Kevin Marsh
Hi Johannes,
I think this is a limitation of the current GRIB-to-netCDF converter and the netCDF format currently used (so it is unrelated to the number of items in the request).
Thanks,
Kevin
johannes hampp
Hi Kevin,
thanks for your response. Do you know if and where this limitation is documented? Especially details on which requests trigger the error?
Otherwise we're left with either being conservative (on the safe side) and using small requests, or trial-and-error until we figure out the maximum request size.
Best,
Johannes
Kevin Marsh
Hi Johannes,
It's a netCDF limitation: "The output is currently netCDF3, a limitation of which is that all but the last variable in the file must require less than 4 GiB of storage. (This limit does not apply to the last variable in the file, so it can be disregarded if you have only selected one variable.)
The CDS is working on new GRIB-to-netCDF conversion software which will likely address these points in the future."
(some text taken from the ADS download form)
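A quick back-of-the-envelope calculation shows how easily a full-year hourly request crosses that 4 GiB per-variable threshold. The grid size and 2-byte packing below are assumptions about typical global ERA5 netCDF output, not figures from this thread:

```python
# Rough storage estimate for one hourly ERA5 variable over a leap year
# on a hypothetical global 0.25-degree grid (illustrative assumptions).
lats, lons = 721, 1440          # global 0.25-degree grid
hours = 366 * 24                # hourly steps for a leap year (2020)
bytes_per_value = 2             # short-integer packing with scale/offset

size_bytes = lats * lons * hours * bytes_per_value
size_gib = size_bytes / 2**30
limit_gib = 4                   # netCDF3 classic per-variable limit

# Far above the limit, so any variable except the last one in the file
# would already break the netCDF3 constraint for a full-year request.
print(f"~{size_gib:.0f} GiB per variable (limit: {limit_gib} GiB)")
```

This also matches the observed behaviour: with a single variable the limit does not apply (the only variable is also the last one), while two or more variables fail.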
It's more efficient to request 1 month of hourly ERA5 data from the CDS at a time; if you do this, you should not encounter the netCDF issue, and you can combine the netCDF files on your local system after downloading them.
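A minimal sketch of that month-at-a-time pattern, one request per month, merged locally afterwards. The dataset name, variables, and file names are assumptions, and the actual retrieve calls would need CDS credentials:

```python
# One request per month instead of one request for the whole year.
base = {
    "product_type": "reanalysis",
    "format": "netcdf",
    "variable": ["2m_temperature", "10m_u_component_of_wind"],
    "day": [f"{d:02d}" for d in range(1, 32)],
    "time": [f"{h:02d}:00" for h in range(24)],
}

targets = []
for month in range(1, 13):
    request = dict(base, year="2020", month=f"{month:02d}")
    target = f"era5_2020-{month:02d}.nc"
    targets.append(target)
    # c.retrieve("reanalysis-era5-single-levels", request, target)
    # (with c = cdsapi.Client(); commented out here)

# The monthly files can then be combined locally, e.g. with xarray:
# xr.open_mfdataset("era5_2020-*.nc", combine="by_coords")
```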
Hope that helps,
Kevin
johannes hampp
Thanks for the info Kevin!
Indeed, downloading the variables one month at a time is also our current workaround, although in our experience this is usually slower than downloading a full year at a time.
Best,
Johannes