I need to access long (multi-year) time series from single grid points. This is very slow because the data is stored in a spatial format. It is probably quite expensive for the CDS platform as well, since each such request causes unnecessarily heavy load. Are there any plans for faster access to long time series for single points?


17 Comments

  1. +1 for this question. I am in exactly the same situation. The download time seems to be independent of the size of the area being requested: even when requesting a very small area, I find that a request for 5 years of data takes 1 hour to process before downloading.


    I guess changing how this works on the CDS platform is quite a big task. But does anyone have a suggestion for the quickest way to download a long time series for a small area? I have been downloading in blocks of 5 years at a time (which seems to be the maximum for a single request), but maybe it would be more efficient to use smaller blocks of 1 year, or even 1 day?

  2. I think the fundamental problem is how the data is stored. Here is a post that goes through the technical details: https://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters

    When I wanted fast access to time series from gridded data such as ERA5, I downloaded all the data for the region of interest and then "re-chunked" it so that it is stored along the time dimension (see the sketch at the end of this comment). This procedure takes a long time and requires a huge amount of local storage, but it gives really fast access to time-series data for any post-analysis.

    Providing fast access to time series would probably require storing the data twice (at least for the more popular variables), but it should result in much less load and cost for the CDS platform.
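
    For anyone who wants to try this, here is a minimal sketch of the re-chunking step using xarray and zarr; the file names, chunk sizes and coordinates are just placeholders to adapt:

    import xarray as xr

    # Open all downloaded ERA5 files for the region of interest as one lazy dataset.
    # "era5_*.nc" is a placeholder pattern for the files you downloaded from the CDS.
    ds = xr.open_mfdataset("era5_*.nc", combine="by_coords", chunks={"time": 744})

    # Re-chunk so that each chunk holds the full time axis for a small spatial tile.
    # This is what makes later single-point time-series reads fast.
    ds = ds.chunk({"time": -1, "latitude": 10, "longitude": 10})

    # Drop any chunking/compression encoding inherited from the source files so it
    # does not conflict with the new chunk layout when writing.
    for var in ds.variables:
        ds[var].encoding.clear()

    # Write the re-chunked copy to a zarr store; this is the slow, storage-hungry step.
    ds.to_zarr("era5_rechunked.zarr", mode="w")

    # Afterwards, extracting a long time series for one grid point is quick:
    point = xr.open_zarr("era5_rechunked.zarr").sel(
        latitude=51.5, longitude=5.4, method="nearest"
    )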

  3. Thanks, that makes sense. The comments here (How to Download Faster) also suggest grouping downloads into large regions where possible, as that is more efficient.

    They also suggest downloading hourly/daily data in requests of one month at a time, although I haven't tested whether this is faster than using a longer period per request, e.g. 5 years.
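
    For reference, a month-by-month download loop with the cdsapi client looks roughly like the following; the dataset name, variable, years and area are just examples to adapt, and I have not benchmarked this exact script:

    import cdsapi

    c = cdsapi.Client()

    # One request per month tends to queue and process faster than one huge request.
    # Variable, area and years below are placeholders - adjust to your own needs.
    for year in range(2015, 2020):
        for month in range(1, 13):
            c.retrieve(
                "reanalysis-era5-single-levels",
                {
                    "product_type": "reanalysis",
                    "variable": ["2m_temperature"],
                    "year": str(year),
                    "month": f"{month:02d}",
                    "day": [f"{d:02d}" for d in range(1, 32)],
                    "time": [f"{h:02d}:00" for h in range(24)],
                    "area": [60, 0, 50, 10],  # N, W, S, E
                    "format": "netcdf",
                },
                f"era5_{year}_{month:02d}.nc",
            )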

  4. This is essentially what I had to do as well: download the ERA5 data and re-chunk it to make time-series extraction faster. I don't know if it's appropriate to mention it here, but you can take a look at a demo app at https://warmingcities.herokuapp.com/, an app similar to the CDS Toolbox but substantially faster (~10 seconds to extract 3-4 location-based variables over 40 years).

    If you haven't sorted this out yet, and if you can let me know the lat/lon, years and parameters you need, I am happy to check whether I have them, and we can probably figure out a way to share them as CSV files somehow. I work in the building energy sector and have been working mostly with the ERA5 dataset, so I mostly have surface parameters related to temperature, wind and solar/thermal radiation.

    1. Hi,

      I am conducting probabilistic yield forecasting for rooftop PV systems, but I have found it extremely time-consuming to download the ensemble forecasts of ssrd for my target PV site. It takes 1.5 hours to download the 50 ensemble members of hourly ssrd forecasts for the target site for a single day (around 350 kB), and I have to download the ensemble forecasts for several years. Could you please give me some suggestions? My request is shown below:


      from ecmwfapi import ECMWFService

      # my_api and my_email hold my ECMWF API key and registered e-mail address.
      server = ECMWFService(
          "mars",
          url="https://api.ecmwf.int/v1",
          key=my_api,
          email=my_email,
      )

      server.execute(
          {
              "class": "od",
              "date": "20190101/to/20190102",
              "expver": "1",
              "levtype": "sfc",
              "number": "1/to/50",
              "param": "169.128",
              "area": "51.7/5.2/51.2/5.7",
              "step": "0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23",
              "grid": "0.25/0.25",
              "stream": "enfo",
              "time": "0000",
              "type": "pf",
          },
          "2019_irr_step023.grib",
      )

      1. In my experience, only 1.5 hours per API call is quite good (wink). Is there a reason why you want all 50 ensemble members from each 0z run? I am also downloading ECMWF IFS archives, and it takes multiple weeks to download just 1 year of data (only the control run from 0z and 12z, but including solar radiation at the full native 9 km resolution). You can contact me directly to get access to a preliminary version. A public version will follow in 2-3 months.

        1. Hi Patrick,
          The issue is that when I download the 50-member ensemble ssrd forecast with steps 0-23 h for the full (unrestricted) area from the MARS Catalogue web page, it takes me around 10 minutes to retrieve an 80 MB GRIB file. But when I use the above-mentioned Python script to download the same parameters for just a customized small area, it takes nearly 2 hours to retrieve a 400 kB GRIB file. So I guess a lot of the time is spent on the regridding step that interpolates the original data to my target site?

          Do you know how I can retrieve the native data for a customized area without post-processing? I need the ensemble members to feed into my solar system model to generate probabilistic forecasts of solar generation. Thank you very much for the ideas and kind help!

          Best regards 

          Bin

          1. Hi Bin, Ayana,

            Just to clarify that this forum is intended for CDS API questions and discussions; the API you are talking about is the ECMWF web API https://confluence.ecmwf.int/x/3YtdAQ,

            and questions about this are probably best raised via the ECMWF Support Portal:

            https://confluence.ecmwf.int/site/support

            Thanks,

            Kevin

      2. Hi Bin, 

        I think I'm having the exact same issue; the only difference is that I need more variables, which is even more challenging.

        Could you share how you solved this? 

  5. Thanks Joseph Yang, that app looks really useful too. So do you have a copy of the data, re-chunked for time series, that your app accesses in order to get these speeds? That's a nice idea.

    In the end I followed the suggestions from the post I linked to previously and downloaded the data in requests of 1-month intervals, for a very large area that covers all the possible points I want to extract. It needed quite a bit of storage, but it was reasonably quick compared to downloading the points individually or requesting long time intervals.
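
    In case it helps anyone else, the point-extraction step from those monthly files is roughly the following; the paths, site names and coordinates are placeholders:

    import xarray as xr

    # Open all the monthly NetCDF files for the large area as one lazy dataset.
    ds = xr.open_mfdataset("era5_????_??.nc", combine="by_coords")

    # Pull out the nearest grid point to each site of interest and save it.
    sites = {"site_a": (51.5, 5.4), "site_b": (59.3, 18.1)}  # lat, lon placeholders
    for name, (lat, lon) in sites.items():
        ts = ds.sel(latitude=lat, longitude=lon, method="nearest")
        ts.to_dataframe().to_csv(f"{name}_timeseries.csv")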

    Thanks for your help.

  6. The app looks interesting, Joseph Yang. Is it storing hourly data globally for 40 years in some cloud service? That must be terabytes of data?

    A few years ago I made this app https://rokka.shinyapps.io/shinyweatherdata/ that converts gridded (re)analysis data to time series for use with common building energy simulation tools. It has been on my to-do list to update it with ERA5 data for a while, I just have not had the time for it. And I will probably restrict it to Europe, as it gets too expensive to store data worldwide and I do not have time to apply for grants.

    Anyway, if ECMWF/Copernicus want to attract more apps and industrial applications, they should really look at ways of also providing time series. It is a waste of resources if every application needs to build its own dataset optimized for time series (but these forums are probably not the right place to raise that concern).

  7. Appreciate the feedback on the app - I always thought it's silly that one can't readily look up how climate change has affected where they live. lukas lundström - that's a great app that you created. I've also been meaning to do the same with the ERA5 data to generate EPW files, and I chose the ERA5 parameters to download based on the parameters needed for building energy simulation.

    Yes, I have a copy of the ERA5 dataset re-processed for faster time-series access, with about 30 surface parameters. That turned out to be about 40 TB in total that I ended up downloading, although with some compression this came down to about 10 TB after quite a bit of fidgeting around. Doing this all in the cloud would have been quite expensive for me, so I have a small server set up and the app just talks to my server via an API that I set up - I have been meaning to open up the API as a service, but I am still at the testing stage. (smile)
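
    As a rough illustration of the compression side, xarray lets you switch on lossless compression when writing NetCDF; the settings below are only illustrative, not exactly what I ended up using:

    import xarray as xr

    ds = xr.open_mfdataset("era5_raw_*.nc", combine="by_coords")

    # Apply lossless zlib compression to every variable when writing out.
    # Compression level 4 is an illustrative choice, not a tuned value.
    encoding = {var: {"zlib": True, "complevel": 4} for var in ds.data_vars}
    ds.to_netcdf("era5_compressed.nc", encoding=encoding)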

  8. Nice work Joseph Yang. Do you care to share technical details about the format and setup? 10 TB sounds OK for 40 years and 30 parameters. Is that worldwide coverage, including oceans? You are welcome to DM me on lukas.rokka {at} gmail.com as well.

    When I created my app there were no fancy tools like xarray available, so I ended up storing each grid point as a separate compressed CSV file containing one year of time-series data for the relevant parameters. That is not a very storage-efficient format, but the benefit is that you can use cheap storage like S3 Glacier or S3 Infrequent Access, whereas I think serving NetCDF files requires a more specialized server.
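
    With today's tools the idea would boil down to something like this simplified sketch with xarray (not my original implementation; the file name and year are placeholders):

    import xarray as xr

    ds = xr.open_dataset("reanalysis_2018_europe.nc")  # placeholder file name

    # Write one gzip-compressed CSV per grid point, each holding a year of data.
    # Plain files like these can live on cheap object storage such as S3.
    for lat in ds.latitude.values:
        for lon in ds.longitude.values:
            point = ds.sel(latitude=lat, longitude=lon)
            point.to_dataframe().to_csv(
                f"grid_{lat:.2f}_{lon:.2f}_2018.csv.gz", compression="gzip"
            )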

    Is that API access something you plan as a commercial product?

  9. Accessing NetCDF files remotely can be done with xarray, but I think OPeNDAP servers such as https://www.opendap.org/software/hyrax/1.16 or https://www.unidata.ucar.edu/software/tds/current/ need to be set up. I played around with these options but ended up with a setup quite similar to https://pangeo.io/architecture.html and made it available via a REST API. The coverage is worldwide, including oceans.
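
    As a small illustration of the remote-access part, xarray can open a dataset straight from an OPeNDAP endpoint and only transfer the slices you select; the URL and variable name below are placeholders:

    import xarray as xr

    # Open a remote dataset lazily over OPeNDAP; only the requested slices are transferred.
    url = "https://example.org/opendap/era5/2m_temperature.nc"  # placeholder endpoint
    ds = xr.open_dataset(url)

    # A single-point selection then only pulls that point's time series over the network.
    ts = ds["t2m"].sel(latitude=51.5, longitude=5.4, method="nearest")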

    I do plan on releasing the API as a commercial product, as what led me down this path initially was the difficulty I encountered accessing the time-series historical weather data needed for building energy simulations. I can create an API key for you to play around with and will send it to your email address with some instructions. Any feedback would be greatly appreciated!

  10. I encountered the same vexing problem of slow responses when accessing a time series for a single grid point. In the end I settled on the Data Rods service that NASA Hydrology provides. It is still too slow for some UX requirements, but much faster than other ways of accessing retrospective weather data.

    https://disc.gsfc.nasa.gov/information/tools?title=Hydrology%20Data%20Rods


    Maybe there should be datasets available that each comprise fewer variables, in favour of speed. Different research groups and individuals often only need a subset of all the variables.

  11. I always thought it's silly that one can't readily look up how climate change has affected where they live.

    So did I, but I phrased it from a slightly different angle: many people relocate around the globe and may want to see calculated human thermal comfort based on retrospective weather conditions. I thought earlier that this kind of information could be offered through a commercial channel, so I published a desktop application in the Windows Store. It is only version 1.0, but I have ideas and plans on my roadmap, as Windows 10 has been bringing a lot of evolution to the desktop environment. I can see a sizeable amount of strong work and results in this conversation, and if another dialogue opens about moulding all these achievements into one or more appealing commercial packages, maybe we could join forces and build a product powered by the different parties.

  12. Hi, I faced the same problem accessing ERA5. I downloaded a large selection of weather variables and stored them in a format optimised for time-series access.

    The API is available publicly for non-commercial use. If you want to save yourself a couple of weeks of download time, you can try it here: https://open-meteo.com/en/docs/historical-weather-api
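
    A minimal single-point request looks roughly like this (see the documentation page above for the exact endpoint, variables and options):

    import requests

    # Retrieve several decades of hourly 2 m temperature for a single location.
    response = requests.get(
        "https://archive-api.open-meteo.com/v1/archive",
        params={
            "latitude": 51.5,
            "longitude": 5.4,
            "start_date": "1980-01-01",
            "end_date": "2020-12-31",
            "hourly": "temperature_2m",
        },
    )
    data = response.json()
    hours = data["hourly"]["time"]
    temps = data["hourly"]["temperature_2m"]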