As the title says, this is the case when compared with downloading other ERA5 datasets from the CDS.

This is because data requests for 'reanalysis-era5-complete' retrieve data from ECMWF MARS, a tape-based archiving system, while the other ERA5 datasets on the CDS are hosted on the CDS itself, which is disk-based. Note that the volume of ERA5 is so large that only the most popular data can be copied over from ECMWF MARS to the CDS.
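
For reference, a request to 'reanalysis-era5-complete' is written with MARS keywords via the CDS API rather than the usual CDS web-form keywords. The sketch below is only illustrative; the 'date', 'param', 'levelist', 'time' and 'grid' values are assumptions chosen to match a typical model-level request (temperature and specific humidity on all 137 model levels for one day) and should be adapted to your needs:

    import cdsapi

    c = cdsapi.Client()

    # Illustrative MARS-style request to 'reanalysis-era5-complete' (tape archive).
    # GRIB param 130 = temperature, 133 = specific humidity; all values are examples only.
    c.retrieve('reanalysis-era5-complete', {
        'class':    'ea',
        'date':     '2015-01-01',
        'expver':   '1',
        'levtype':  'ml',            # model levels
        'levelist': '1/to/137',      # all 137 model levels
        'param':    '130/133',
        'stream':   'oper',
        'time':     '00/06/12/18',
        'type':     'an',
        'grid':     '0.25/0.25',     # interpolate to a regular lat-lon grid
    }, 'era5_ml_t_q_20150101.grib')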

34 Comments

  1. My data request for 'reanalysis-era5-complete' (I need temperature and humidity for 137 levels on 1 Jan 2015) has been in the Queued status for 46 hours. Is this acceptable?

  2. Hi Mikhail,

    Could you let me have your request ID or your CDS user ID so that I can have a detailed look at your request? You can contact us by sending an email to copernicus-support AT ecmwf DOT int.

    Thank you,

    Xiaobo

  3. Is it safe to assume the same is true for reanalysis-era5-single-levels? It takes on the order of days to download just a month of runoff, for example.

  4. Hi Michael,

    It should not be. In fact we have received reports from several users and our technical team is looking into the problem now.

    I'll keep you updated.

    Thank you,

    Xiaobo

  5. Just to confirm: reanalysis-era5-single-levels data is stored on the CDS disks.

  6. I'm having the same problems... For instance, downloading 1 day of data sometimes takes several hours. The same problems are happening with single levels. Is it some kind of internal problem? Is there any estimate of when it will be fixed?

  7. Michell, you should see better performance downloading single level ERA5 data. Let me know if that's not the case. Thank you.

  8. Hi Xiaobo,

    I requested data from 'reanalysis-era5-complete' recently. Normally it takes about 90 minutes when I request one month of data, but my last request seems to be seriously delayed. My request ID is e5ff686a-d3cc-4004-8e07-4f3287ec9881 and my UID is 19396. It has already been running for 25 hours according to my requests list. Can you check it for me?

    Thank you. 

  9. Hi Xiaolong,

    You can now check the status of our system by going to https://cds.climate.copernicus.eu/live/queue. If you log into the CDS, you should be able to see your requests at that URL. Let me know if this helps.

    Kind regards,

    Xiaobo

    1. Hi, it is still in progress and has already been running for more than 1 day.

      1. Hi Xiaolong,

        This depends on how busy our system is. For example, at the moment of typing, I can see 30 requests sharing the same resources as 'reanalysis-era5-complete', with 473 requests queuing.

        Whenever possible, we recommend downloading ERA5 data that is physically hosted on the CDS. This includes all the ERA5 datasets you can see in the CDS catalogue. In general, if you do not need model-level data, you should avoid requesting data from 'reanalysis-era5-complete'.

        I hope this helps.

        Xiaobo

        1. Hi Xiaobo,

          Yeah, I know it only takes a few minutes to download the single-level data. It seems all I can do is wait. Sorry, I posted the wrong Request ID.

          Thanks for your response anyway.

          Xiaolong

          1. No problem and thank you for your patience.

            1. Hi, Xiaobo

              My requests have been stuck for several hours, and on that page the reason given for queuing is 'Unable to establish'. How can I make my request run again? Thank you.

              Climate Data Store | Live (copernicus.eu)

              And can I speed up my data retrieval by adding more 'time' values to each request? Every time I set 'time' to more than one value, this error message is shown:

              Exception: the request you have submitted is not valid. Expected 1, got 137.; Request failed; Some errors reported (last error -1).


      2. BTW, I checked request ID 'e5ff686a-d3cc-4004-8e07-4f3287ec9881' and noticed it was completed.

  10. Hello.

    I am still finding that pulling hourly runoff from the CDS/Copernicus is potentially prohibitively slow: perhaps one year of data per day. Yet it varies, apparently at least in part as a function of user load/demand.

    And, it occasionally hangs up, seemingly somewhat at random.

    Please advise?

    Here is a sample script (which appears to work just fine, sometimes through a few months and sometimes through years, before I get a timeout). I follow it with a sample error I have gotten after, e.g., a timeout.


    import cdsapi

    c = cdsapi.Client()


    def is_leap_year(year):
        return (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0)


    def days_in_month(year, month):
        # Number of days in the given month, taking leap years into account.
        if month in (1, 3, 5, 7, 8, 10, 12):
            return 31
        elif month == 2:
            return 29 if is_leap_year(year) else 28
        else:
            return 30


    # One request per hour of data; year, month, day and time are zero-padded strings.
    for year in range(1979, 2020):
        for mon in range(1, 13):
            themon = str(mon).zfill(2)
            for day in range(1, days_in_month(year, mon) + 1):
                theday = str(day).zfill(2)
                for time in range(0, 24):
                    thetime = str(time).zfill(2)
                    print("Year, month, day, time:", year, themon, theday, thetime)
                    c.retrieve("reanalysis-era5-single-levels", {
                        "product_type":   "reanalysis",
                        "format":         "netcdf",
                        "variable":       "runoff",
                        "year":           str(year),
                        "month":          themon,
                        "day":            theday,
                        "time":           thetime,
                    }, "output." + str(year) + themon + theday + thetime + ".nc")


    Error:

    “requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cds.climate.copernicus.eu', port=443): Read timed out. (read timeout=60)”


    Sorry if I'm doing something wrong!  Thanks much in advance for any advice.

    Best,

    Michael

    1. This appears to be working without any issues now. Wondering if you made a change on your end? Whereas yesterday, when I submitted multiple instances (to get multiple temporal chunks simultaneously), it seemed to slow each instance down (potentially to a timeout and crash of scripts), now all instances seem to be running along at OK rates.

      Thanks for whatever you may have done on your end...and for your responsiveness either way!

  11. As it turns out, I'm still seeing arbitrary snags, timeouts, and crashes of long-term data pulls.

    Does anyone have a solution for pulling the whole hourly archive of ERA5 runoff that doesn't time out, drop files, etc.?

    Thanks very much in advance.

  12. Hi Michael,

    I'll have a look and then get back to you - unfortunately this will take a while.

    Kind regards,

    Xiaobo

  13. Hi Michael,

    I had a look at your script and I'd recommend that you make one request per month of hourly ERA5 data, for example:

    ...
    for year in range(1982, 2018):
        for mon in range(1, 13):
            # Make a request, you can set day from 1 to 31 and time from 0 to 23 as you wish. Our system is smart enough to return the proper data
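
    If you are still seeing client-side read timeouts (like the "read timeout=60" error reported above), you could also try increasing the timeout and retry settings when creating the client. This is only a sketch: the 'timeout', 'retry_max' and 'sleep_max' keyword arguments are assumptions based on recent versions of the cdsapi package, so please check the version you have installed:

    import cdsapi

    # Sketch only: these keyword arguments are assumed to exist in recent cdsapi versions.
    # timeout   - read timeout in seconds (the default is 60, matching the error above)
    # retry_max - maximum number of retries for a failed request (assumed semantics)
    # sleep_max - maximum sleep, in seconds, between retries/status polls (assumed semantics)
    c = cdsapi.Client(timeout=600, retry_max=10, sleep_max=120)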

    You are also recommended to check the status of our system by visiting https://cds.climate.copernicus.eu/live/queue.

    I hope this helps.

    Kind regards,

    Xiaobo

  14. Thank you, Xiaobo.  Any hints on setting days and times from 1-31 and 0-23 within the "retrieve" itself could help save a little time.  Thanks much, again.

  15. E.g., will something as simple as this do? Or should I include some logic within the c.retrieve method for "day", e.g., to handle different-length months, and if so, how? Additionally, can you explain why my previous script was seeing apparently arbitrary timeouts once in a while and, consequently, crashes? Is that a function of user demand on your end, resulting in retrievals of variable wall-clock duration and, therefore, occasional (and unpredictable) slower retrievals that exceed some time threshold and raise the error?

    import cdsapi

    c = cdsapi.Client()

    # One request per month of hourly data.
    for year in range(2015, 2020):
        for mon in range(1, 13):
            themon = str(mon).zfill(2)
            c.retrieve("reanalysis-era5-single-levels", {
                "product_type":   "reanalysis",
                "format":         "netcdf",
                "variable":       "runoff",
                "year":           str(year),
                "month":          themon,
                # Assume cdsapi will skip extraneous days in shorter months without a problem...
                "day":            ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11",
                                   "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22",
                                   "23", "24", "25", "26", "27", "28", "29", "30", "31"],
                "time":           ["00", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11",
                                   "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23"],
            }, "output." + str(year) + themon + ".nc")

    1. Hi Michael,

      Your day and time settings should work as expected.

      When your script actually gets executed depends on how busy our system is and on the queueing algorithm adopted in our system. I'll ask our technical team to explain more if needed.

      Kind regards,

      Xiaobo

  16. Thank you Xiaobo.

    Yes, it would be helpful to have more explanation of why and how retrievals fluctuate in required time as a function of user demand, as well as what to consider (e.g. how to design scripts) to avoid any size/time thresholds that might trigger the various timeouts and crashes.

    Also, is there a way to put logic in the "day" and "time" options of the retrieve method so that I can, e.g., use a loop instead of a list like that?
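
    For example, just to illustrate what I mean (an untested sketch), something like this to build the lists instead of typing them out:

    # Untested sketch: generate zero-padded "day" and "time" strings with comprehensions,
    # then pass them as the "day" and "time" values in the retrieve() request.
    days = ["%02d" % d for d in range(1, 32)]
    times = ["%02d" % t for t in range(24)]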

    Most importantly, though, I just want to be able to get through this data pull without so many timeouts and crashes; so far they have made it a bit too "user in the loop" to overcome.

    Thanks very much again,

    Michael

  17. Can anyone tell me why the month is running from 1 to 13 and not from 1 to 12?

  18. The months do range from 1 to 12; Python's range() excludes the stop value:

    >>> list(range(1, 13))
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  19. Thanks! I am new to Python and was surprised by how range() works.

    1. I found it surprising at first, too, Niels! I actually accidentally used 12 as the upper bound of my range when I first wrote the script above, before catching myself and remembering that Python's range() uses the upper bound as the "stop" or "until" criterion, not a "through" criterion.

  20. hey, 

    I have also run into the same problem: when I tried to retrieve data from reanalysis-era5-complete, it seemed to take forever waiting in the queue. I also have a question: can reanalysis-era5-single-levels provide the same data that I previously retrieved from MARS? The two requests are shown below.

    One is for retrieving the 10 m u/v wind components:

    retrieve,

    class=ea,

    date=2015-01-01/to/2015-12-31,

    expver=1,

    levtype=sfc, (sfc: surface, pl: pressure level, pt: potential vorticity level. Ocean data in MARS is archived with levtype=dp, wave data is archived with levtype=sfc)

    param=165.128/166.128,

    step=0,

    stream=oper, (oper: operational atmospheric model, wave: wave model)

    time=06:00:00,

    type=fc, (fc: forecast, cf: control forecast, an: analysis)

    The other is for retrieving parameter IDs 140229 and 140245 (10 metre wind speed):

    retrieve,

    class=od, (od: operational archive, ea: ERA5)

    date=2015-01-01/to/2015-12-31,

    expver=1, (1: operational data, default; 69: IFS cycle 41r2 test data)

    param=229.140/245.140,

    step=600/624/648/672,

    stream=waef, (waef: wave ensemble forecast)

    time=00:00:00,

    type=cf,

    expect=any,

    target="2015long.grib"


    We want to retrieve exactly the same data as these requests describe, but from the CDS; however, we cannot find the same variables in the CDS datasets. Are there equivalent parameters on the CDS for us to retrieve?

    Thanks!

    1. Hi,

      Reanalysis ERA5 complete data is archived not on the CDS disks but in the tape library of ECMWF's MARS archive. Please be aware that there is an additional queueing system for downloading data from ECMWF's MARS archive: expect several hours to several days for submitted requests to complete at this time. You can check the live status of your request. To retrieve MARS data efficiently (and get your data quicker!) you should retrieve all the data you need from one tape, then from the next tape, and so on. In most cases, this means retrieving all the data you need for one month, then for the next month, and so on. To find out what data is available on each tape, browse the ERA5 Catalogue and make your way down to the bottom of the archive tree (where parameters are listed). Once you have reached that level of the archive, what you see is what is stored on one single tape. See the Retrieval efficiency page for more details.

      The data in your requests can also be downloaded from the CDS web form or using the CDS API. You can retrieve the CDS API script from the web form using the 'Show API request' button. Please have a look at these articles for more details about the size limits of requests and some efficiency tips: Climate Data Store (CDS) documentation and How to download ERA5. Unfortunately, 10 m wind speed is not available on the CDS.

      Here are two examples of CDS API scripts for January 2015:

      import cdsapi
      
      c = cdsapi.Client()
      
      c.retrieve(
          'reanalysis-era5-single-levels',
          {
              'product_type': 'ensemble_members',
              'variable': 'significant_height_of_combined_wind_waves_and_swell',
              'year': '2015',
              'month': '01',
              'day': [
                  '01', '02', '03',
                  '04', '05', '06',
                  '07', '08', '09',
                  '10', '11', '12',
                  '13', '14', '15',
                  '16', '17', '18',
                  '19', '20', '21',
                  '22', '23', '24',
                  '25', '26', '27',
                  '28', '29', '30',
                  '31',
              ],
              'time': '00:00',
              'format': 'grib',
          },
          'download.grib')

      import cdsapi
      
      c = cdsapi.Client()
      
      c.retrieve(
          'reanalysis-era5-single-levels',
          {
              'product_type': 'reanalysis',
              'variable': [
                  '10m_u_component_of_wind', '10m_v_component_of_wind',
              ],
              'year': '2015',
              'month': '01',
              'day': [
                  '01', '02', '03',
                  '04', '05', '06',
                  '07', '08', '09',
                  '10', '11', '12',
                  '13', '14', '15',
                  '16', '17', '18',
                  '19', '20', '21',
                  '22', '23', '24',
                  '25', '26', '27',
                  '28', '29', '30',
                  '31',
              ],
              'time': '00:00',
              'format': 'grib',
          },
          'download.grib')

      Thanks

      Michela

  21. Hi,

    I have the same problem: it takes hours to download data.

    In my case, I want data from ERA5-Land hourly. My goal is to download 3 variables over a 10-year range, for a specific geographic area.

    Although it took a while to download the first two variables (less than 2 hours), the third one has taken more than 2 hours. There was no time in the queue (zero), but the "in progress" stage is taking too long.


    I hope you can help me or offer any advice to improve my downloads.

    1. Hi Brian,

      Yes, we are aware of this problem, and my colleagues are looking into performance issues with our storage system. Unfortunately it will take some time.

      Regards,

      Xiaobo

  22. Hi, I am trying to pull down some data from reanalysis-era5-complete. I realise this will take some time given that this data is archived in the tape library at ECMWF's MARS archive. However, when I check the status of my request at https://cds.climate.copernicus.eu/live/queue there are no requests listed for me as a user. I have successfully pulled down data from reanalysis-era5-single-levels without issue. Should I not be able to see my request in the list in the previously mentioned link?

    Many thanks, Matt

    1. Hi Matt,
      Thank you for posting to the forum. As per our Forum guidelines (https://confluence.ecmwf.int/display/CUSF/How+to+post+to+the+CUS+forums#HowtoposttotheCUSforums-Etiquette), we advise users not to post personal information, so we have modified your message re: your CDS details.

      The questions you raise are about the CDS infrastructure, and are probably better suited to a user query for expert guidance from the CDS team - could you please raise this issue via our Support Portal instead?
      Best Regards,
      Michela

      Forum Administrator