As the title says, this is the case when compared with downloading other ERA5 datasets from the CDS.

This is because data requests for 'reanalysis-era5-complete' retrieve data from ECMWF MARS, a tape-based archiving system, while the other ERA5 datasets on the CDS are hosted on the CDS itself, which is disk-based. Note that the volume of ERA5 is so large that only the most popular data can be copied over from ECMWF MARS to the CDS.
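
For reference, a request to 'reanalysis-era5-complete' is written with MARS keywords via the CDS API rather than the usual CDS web-form keywords. The sketch below is only illustrative; the 'date', 'param', 'levelist', 'time' and 'grid' values are assumptions chosen to match a typical model-level request (temperature and specific humidity on all 137 model levels for one day) and should be adapted to your needs:

    import cdsapi

    c = cdsapi.Client()

    # Illustrative MARS-style request to 'reanalysis-era5-complete' (tape archive).
    # GRIB param 130 = temperature, 133 = specific humidity; all values are examples only.
    c.retrieve('reanalysis-era5-complete', {
        'class':    'ea',
        'date':     '2015-01-01',
        'expver':   '1',
        'levtype':  'ml',            # model levels
        'levelist': '1/to/137',      # all 137 model levels
        'param':    '130/133',
        'stream':   'oper',
        'time':     '00/06/12/18',
        'type':     'an',
        'grid':     '0.25/0.25',     # interpolate to a regular lat-lon grid
    }, 'era5_ml_t_q_20150101.grib')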

34 Comments

  1. My data request for 'reanalysis-era5-complete' (I need temperature and humidity for 137 levels on 1 Jan 2015) has been in the Queued status for 46 hours. Is this acceptable?

  2. Hi Mikhail,

    Could you let me have your request ID or your CDS user ID so that I can have a detailed look at your request? You can contact us by sending an email to copernicus-support AT ecmwf DOT int.

    Thank you,

    Xiaobo

  3. Is it safe to assume the same is true for reanalysis-era5-single-levels? It takes on the order of days to download just a month of runoff, for example.

  4. Hi Michael,

    It should not be. In fact we have received reports from several users and our technical team is looking into the problem now.

    I'll keep you updated.

    Thank you,

    Xiaobo

  5. Just to confirm: reanalysis-era5-single-levels data is stored on the CDS disks.

  6. I'm having the same problems... For instance, downloading 1 day of data sometimes takes several hours. The same problems are happening with single levels. Is it some kind of internal problem? Is there any estimate of when it will be fixed?

  7. Michell, you should see better performance downloading single level ERA5 data. Let me know if that's not the case. Thank you.

  8. Hi Xiaobo,

    I requested data from 'reanalysis-era5-complete' recently. Normally it takes about 90 minutes when I request one month of data, but my last request seems to be seriously delayed. My request ID is e5ff686a-d3cc-4004-8e07-4f3287ec9881 and my UID is 19396. It has already been running for 25 hours according to my requests list. Can you check it for me?

    Thank you. 

  9. Hi Xiaolong,

    You can now check the status of our system by going to https://cds.climate.copernicus.eu/live/queue. If you log into the CDS, you should be able to see your requests at that URL. Let me know if this helps.

    Kind regards,

    Xiaobo

    1. Hi, it is still in progress and has already been running for more than 1 day.

      1. Hi Xiaolong,

        This depends on how busy our system is. For example, at the moment of typing, I can see 30 requests sharing the same resources as 'reanalysis-era5-complete', with 473 requests queuing.

        Whenever possible, we recommend downloading ERA5 data that is physically hosted on the CDS. This includes all the ERA5 datasets you can see in the CDS catalogue. In general, if you do not need model-level data, you should avoid requesting data from 'reanalysis-era5-complete'.

        I hope this helps.

        Xiaobo

        1. Hi Xiaobo,

          Yeah, I know it only takes a few minutes to download the single-level data. It seems all I can do is wait. Sorry, I posted the wrong Request ID.

          Thanks for your response anyway.

          Xiaolong

          1. No problem and thank you for your patience.

            1. Hi, Xiaobo

              My requests have been stuck for several hours, and on that page the reason given for queuing is 'Unable to establish'. How can I make my request run again? Thank you.

              Climate Data Store | Live (copernicus.eu)

              And can I speed up my data retrieval by adding more 'time' values to each request? Every time I set 'time' to more than one value, this error message is shown:

              Exception: the request you have submitted is not valid. Expected 1, got 137.; Request failed; Some errors reported (last error -1).


      2. BTW, I checked request ID 'e5ff686a-d3cc-4004-8e07-4f3287ec9881' and noticed it was completed.

  10. Hello.

    I am still finding that pulling hourly runoff from the CDS/Copernicus is potentially prohibitively slow: perhaps one year of data per day. Yet it varies, apparently at least in part as a function of user load/demand.

    And, it occasionally hangs up, seemingly somewhat at random.

    Please advise?

    Here is a sample script (which appears to work just fine, sometimes through a few months and sometimes through years, before I get a timeout). I follow it with a sample error I have gotten after, e.g., a timeout.


    import cdsapi

    c = cdsapi.Client()


    def is_leap_year(year):
        return (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0)


    def days_in_month(year, month):
        # Number of days in the given month, taking leap years into account.
        if month in (1, 3, 5, 7, 8, 10, 12):
            return 31
        elif month == 2:
            return 29 if is_leap_year(year) else 28
        else:
            return 30


    # One request per hour of data; year, month, day and time are zero-padded strings.
    for year in range(1979, 2020):
        for mon in range(1, 13):
            themon = str(mon).zfill(2)
            for day in range(1, days_in_month(year, mon) + 1):
                theday = str(day).zfill(2)
                for time in range(0, 24):
                    thetime = str(time).zfill(2)
                    print("Year, month, day, time:", year, themon, theday, thetime)
                    c.retrieve("reanalysis-era5-single-levels", {
                        "product_type":   "reanalysis",
                        "format":         "netcdf",
                        "variable":       "runoff",
                        "year":           str(year),
                        "month":          themon,
                        "day":            theday,
                        "time":           thetime,
                    }, "output." + str(year) + themon + theday + thetime + ".nc")


    Error:

    “requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cds.climate.copernicus.eu', port=443): Read timed out. (read timeout=60)”


    Sorry if I'm doing something wrong!  Thanks much in advance for any advice.

    Best,

    Michael

    1. This appears to be working without any issues now. Wondering if you made a change on your end? Whereas yesterday, when I submitted multiple instances (to get multiple temporal chunks simultaneously), it seemed to slow each instance down (potentially to a timeout and crash of scripts), now all instances seem to be running along at OK rates.

      Thanks for whatever you may have done on your end...and for your responsiveness either way!

  11. As it turns out, I'm still seeing arbitrary snags, timeouts, and crashes of long-term data pulls.

    Does anyone have a solution for pulling the whole hourly archive of ERA5 runoff that doesn't time out, drop files, etc.?

    Thanks very much in advance.

  12. Hi Michael,

    I'll have a look and then get back to you - unfortunately this will take a while.

    Kind regards,

    Xiaobo

  13. Hi Michael,

    I had a look at your script and I'd recommend that you make one request per month of hourly ERA5 data, for example:

    ...
    for year in range(1982, 2018):
        for mon in range(1, 13):
            # Make a request, you can set day from 1 to 31 and time from 0 to 23 as you wish. Our system is smart enough to return the proper data
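
    If you are still seeing client-side read timeouts (like the "read timeout=60" error reported above), you could also try increasing the timeout and retry settings when creating the client. This is only a sketch: the 'timeout', 'retry_max' and 'sleep_max' keyword arguments are assumptions based on recent versions of the cdsapi package, so please check the version you have installed:

    import cdsapi

    # Sketch only: these keyword arguments are assumed to exist in recent cdsapi versions.
    # timeout   - read timeout in seconds (the default is 60, matching the error above)
    # retry_max - maximum number of retries for a failed request (assumed semantics)
    # sleep_max - maximum sleep, in seconds, between retries/status polls (assumed semantics)
    c = cdsapi.Client(timeout=600, retry_max=10, sleep_max=120)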

    You are also recommended to check the status of our system by visiting https://cds.climate.copernicus.eu/live/queue.

    I hope this helps.

    Kind regards,

    Xiaobo

  14. Thank you, Xiaobo.  Any hints on setting days and times from 1-31 and 0-23 within the "retrieve" itself could help save a little time.  Thanks much, again.

  15. E.g., will something as simple as this do? Or should I include some logic within the c.retrieve method for "day", e.g., to handle different-length months, and if so, how? Additionally, can you explain why my previous script was seeing apparently arbitrary timeouts once in a while and, consequently, crashes? Is that a function of user demand on your end, resulting in retrievals of variable wall-clock duration and, therefore, occasional (and unpredictable) slower retrievals that exceed some time threshold and raise the error?

    import cdsapi

    c = cdsapi.Client()

    # One request per month of hourly data.
    for year in range(2015, 2020):
        for mon in range(1, 13):
            themon = str(mon).zfill(2)
            c.retrieve("reanalysis-era5-single-levels", {
                "product_type":   "reanalysis",
                "format":         "netcdf",
                "variable":       "runoff",
                "year":           str(year),
                "month":          themon,
                # Assume cdsapi will skip extraneous days in shorter months without a problem...
                "day":            ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11",
                                   "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22",
                                   "23", "24", "25", "26", "27", "28", "29", "30", "31"],
                "time":           ["00", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11",
                                   "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23"],
            }, "output." + str(year) + themon + ".nc")

    1. Hi Michael,

      Your day and time settings should work as expected.

      When your script actually gets executed depends on how busy our system is and on the queueing algorithm adopted in our system. I'll ask our technical team to explain more if needed.

      Kind regards,

      Xiaobo

  16. Thank you Xiaobo.

    Yes, it would be helpful to have more explanation of why and how retrievals fluctuate in required time as a function of user demand, as well as what to consider (e.g. how to design scripts) to avoid any size/time thresholds that might trigger the various timeouts and crashes.

    Also, is there a way to put logic in the "day" and "time" options of the retrieve method so that I can, e.g., use a loop instead of a list like that?
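
    For example, just to illustrate what I mean (an untested sketch), something like this to build the lists instead of typing them out:

    # Untested sketch: generate zero-padded "day" and "time" strings with comprehensions,
    # then pass them as the "day" and "time" values in the retrieve() request.
    days = ["%02d" % d for d in range(1, 32)]
    times = ["%02d" % t for t in range(24)]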

    Most importantly, though, I just want to be able to get through this data pull without so many timeouts and crashes; so far they have made it a bit too "user in the loop" to overcome.

    Thanks very much again,

    Michael

  17. Can anyone tell me why the month is running from 1 to 13 and not from 1 to 12?

  18. The months do range from 1 to 12; Python's range() excludes the stop value:

    >>> list(range(1, 13))
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  19. Thanks! I am new to Python and was surprised by how range() works.

    1. I found it surprising at first, too, Niels! I actually accidentally used 12 as the upper bound of my range when I first wrote the script above, before catching myself and remembering that Python's range() uses the upper bound as the "stop" or "until" criterion, not a "through" criterion.

  20. hey, 

    I have also run into the same problem: when I tried to retrieve data from reanalysis-era5-complete, it seemed to take forever waiting in the queue. I also have a question: can reanalysis-era5-single-levels provide the same data that I previously retrieved from MARS? The two requests are shown below.

    One is for retrieving the 10 m u/v wind components:

    retrieve,

    class=ea,

    date=2015-01-01/to/2015-12-31,

    expver=1,

    levtype=sfc, (sfc: surface, pl: pressure level, pt: potential vorticity level. Ocean data in MARS is archived with levtype=dp, wave data is archived with levtype=sfc)

    param=165.128/166.128,

    step=0,

    stream=oper, (oper: operational atmospheric model, wave: wave model)

    time=06:00:00,

    type=fc, (fc: forecast, cf: control forecast, an: analysis)

    The other is for retrieving parameter IDs 140229 and 140245 (10 metre wind speed):

    retrieve,

    class=od, (od: operational archive, ea: ERA5)

    date=2015-01-01/to/2015-12-31,

    expver=1, (1: operational data, default; 69: IFS cycle 41r2 test data)

    param=229.140/245.140,

    step=600/624/648/672,

    stream=waef, (waef: wave ensemble forecast)

    time=00:00:00,

    type=cf,

    expect=any,

    target="2015long.grib"


    We want to retrieve exactly the same data as these requests describe, but from the CDS; however, we cannot find the same variables in the CDS datasets. Are there equivalent parameters on the CDS for us to retrieve?

    Thanks!

    1. Hi,

      Reanalysis ERA5 complete data is archived not on the CDS disks but in the tape library of ECMWF's MARS archive. Please be aware that there is an additional queueing system for downloading data from ECMWF's MARS archive: expect several hours to several days for submitted requests to complete at this time. You can check the live status of your request. To retrieve MARS data efficiently (and get your data quicker!) you should retrieve all the data you need from one tape, then from the next tape, and so on. In most cases, this means retrieving all the data you need for one month, then for the next month, and so on. To find out what data is available on each tape, browse the ERA5 Catalogue and make your way down to the bottom of the archive tree (where parameters are listed). Once you have reached that level of the archive, what you see is what is stored on one single tape. See the Retrieval efficiency page for more details.

      The data in your requests can also be downloaded from the CDS web form or using the CDS API. You can retrieve the CDS API script from the web form using the 'Show API request' button. Please have a look at these articles for more details about the size limits of requests and some efficiency tips: Climate Data Store (CDS) documentation and How to download ERA5. Unfortunately, 10 m wind speed is not available on the CDS.

      Here are two examples of CDS API scripts for January 2015:

      import cdsapi
      
      c = cdsapi.Client()
      
      c.retrieve(
          'reanalysis-era5-single-levels',
          {
              'product_type': 'ensemble_members',
              'variable': 'significant_height_of_combined_wind_waves_and_swell',
              'year': '2015',
              'month': '01',
              'day': [
                  '01', '02', '03',
                  '04', '05', '06',
                  '07', '08', '09',
                  '10', '11', '12',
                  '13', '14', '15',
                  '16', '17', '18',
                  '19', '20', '21',
                  '22', '23', '24',
                  '25', '26', '27',
                  '28', '29', '30',
                  '31',
              ],
              'time': '00:00',
              'format': 'grib',
          },
          'download.grib')

      import cdsapi
      
      c = cdsapi.Client()
      
      c.retrieve(
          'reanalysis-era5-single-levels',
          {
              'product_type': 'reanalysis',
              'variable': [
                  '10m_u_component_of_wind', '10m_v_component_of_wind',
              ],
              'year': '2015',
              'month': '01',
              'day': [
                  '01', '02', '03',
                  '04', '05', '06',
                  '07', '08', '09',
                  '10', '11', '12',
                  '13', '14', '15',
                  '16', '17', '18',
                  '19', '20', '21',
                  '22', '23', '24',
                  '25', '26', '27',
                  '28', '29', '30',
                  '31',
              ],
              'time': '00:00',
              'format': 'grib',
          },
          'download.grib')

      Thanks

      Michela

  21. Hi,

    I have the same problem: it takes hours to download data.

    In my case, I want data from ERA5-Land hourly. My goal is to download 3 variables over a 10-year range, for a specific geographic area.

    Although it took a while to download the first two variables (less than 2 hours), the third one has taken more than 2 hours. There was no time in the queue (zero), but the "in progress" stage is taking too long.


    I hope you can help me or offer any advice to improve my downloads.

    1. Hi Brian,

      Yes, we are aware of this problem, and my colleagues are looking into performance issues with our storage system. Unfortunately it will take some time.

      Regards,

      Xiaobo

  22. Hi, I am trying to pull down some data from reanalysis-era5-complete. I realise this will take some time given that this data is archived in the tape library at ECMWF's MARS archive. However, when I check the status of my request at https://cds.climate.copernicus.eu/live/queue there are no requests listed for me as a user. I have successfully pulled down data from reanalysis-era5-single-levels without issue. Should I not be able to see my request in the list in the previously mentioned link?

    Many thanks, Matt

    1. Hi Matt,
      Thank you for posting to the forum. As per our Forum guidelines (https://confluence.ecmwf.int/display/CUSF/How+to+post+to+the+CUS+forums#HowtoposttotheCUSforums-Etiquette), we advise users not to post personal information, so we have modified your message re: your CDS details.

      The questions you raise are about the CDS infrastructure, and are probably better suited to a user query for expert guidance from the CDS team - could you please raise this issue via our Support Portal instead?
      Best Regards,
      Michela

      Forum Administrator