An ADS dataset has an associated download form, which is a collection of graphical widgets in which the user can select the data they want to retrieve: the dates, variables, steps, etc. Each of the parameters can be thought of as representing a dimension. If there are N request parameters then a request is an N-dimensional hypercube.

However, not all parameter combinations correspond to available data. The data that is actually available can be thought of as a series of hypercubes within the full hypercube volume. The ADS needs to know these hypercubes in order to prevent the user from making an invalid selection. It is the dataset provider's responsibility to provide these hypercubes in a JSON-formatted text file.

The hypercubes should take the form of a list of structures in which the keys are the parameter names and the values are lists of valid values for that parameter, e.g.

[
{"date": ["2003-01-01", "2003-01-02", ...],
 "species": ["o3", "co", ...],
 "level": [...],
 "step": [...],
 ...},
{"date": ["2004-01-01", "2004-01-02", ...],
 "species": ["no2", "co", "o3", ...],
 "level": [...],
 "step": [...],
 ...},
...
]
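To illustrate how a flat list like this can be used, here is a minimal Python sketch that tests whether one combination of values falls inside any of the hypercubes. The function name is_available and the sample values are hypothetical, and it assumes a selection with exactly one value per parameter; it is not how the ADS itself implements the check.

```python
def is_available(selection, hypercubes):
    """Return True if a single-valued selection lies inside any hypercube.

    `selection` maps each parameter name to one chosen value;
    `hypercubes` is a flat list of structures as described above.
    """
    return any(
        # Every key of the hypercube must contain the selected value.
        all(selection.get(key) in values for key, values in cube.items())
        for cube in hypercubes
    )


# Illustrative values only, not taken from a real dataset.
hypercubes = [
    {"date": ["2003-01-01", "2003-01-02"], "species": ["o3", "co"]},
    {"date": ["2004-01-01", "2004-01-02"], "species": ["no2", "co", "o3"]},
]

print(is_available({"date": "2004-01-01", "species": "no2"}, hypercubes))  # True
print(is_available({"date": "2003-01-01", "species": "no2"}, hypercubes))  # False
```

The second call fails because "no2" only appears in the hypercube covering the 2004 dates, even though both the date and the species are individually valid values.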

In the above example the date lists could be quite long. To keep the files readable you can instead represent a list of consecutive dates as a single value of the form "yyyy-mm-dd/yyyy-mm-dd" and represent today's date as "current".
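As a sketch of what the compact date notation stands for, the following Python function expands such values back into individual dates. The name expand_dates and its today argument are hypothetical, and it assumes that ranges include both end dates and that "current" is resolved at the moment of evaluation; the text above does not state either point explicitly.

```python
from datetime import date, timedelta


def expand_dates(values, today=None):
    """Expand "start/end" ranges and "current" into individual ISO dates."""
    today = today or date.today()

    def parse(text):
        # "current" stands for today's date; anything else is yyyy-mm-dd.
        return today if text == "current" else date.fromisoformat(text)

    expanded = []
    for value in values:
        start_text, _, end_text = value.partition("/")
        start = parse(start_text)
        # A value without "/" is a single date.
        end = parse(end_text) if end_text else start
        day = start
        while day <= end:  # assumed inclusive of the end date
            expanded.append(day.isoformat())
            day += timedelta(days=1)
    return expanded


print(expand_dates(["2003-12-30/2004-01-02"]))
# ['2003-12-30', '2003-12-31', '2004-01-01', '2004-01-02']
```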

However, many structures may share value-arrays for some keys, so further compression can be achieved by representing them as a hierarchical data structure in which one hypercube can have child hypercubes (identified with a "_kids" key), all of which will share their parent's key-value pairs.

So in the above example, if the levels and steps were the same for both hypercubes, they could be represented as ...

[
{"level": [...],
 "step": [...],
 "_kids": [
   {"date": ["2003-01-01/2003-12-31"],
    "species": ["o3", "co", ...],
    ...},
   {"date": ["2004-01-01/current"],
    "species": ["no2", "co", "o3", ...],
    ...},
   ...]},
...
]
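The relationship between the hierarchical and the flat form can be sketched in Python as a recursive expansion. The function name flatten and the sample values are hypothetical. It assumes that only hypercubes without a "_kids" key describe available data, and that a child's value would take precedence if a key appeared in both parent and child; neither case is specified above.

```python
def flatten(cubes, inherited=None):
    """Expand nested "_kids" hypercubes into a flat list of hypercubes."""
    inherited = inherited or {}
    flat = []
    for cube in cubes:
        # The cube's own key-value pairs, layered over those of its parents.
        own = {key: values for key, values in cube.items() if key != "_kids"}
        merged = {**inherited, **own}
        if "_kids" in cube:
            # Children share all of the merged parent key-value pairs.
            flat.extend(flatten(cube["_kids"], merged))
        else:
            flat.append(merged)
    return flat


# Illustrative values only; the level and step lists are made up.
nested = [
    {"level": ["0", "500"],
     "step": ["0", "3"],
     "_kids": [
         {"date": ["2003-01-01/2003-12-31"], "species": ["o3", "co"]},
         {"date": ["2004-01-01/current"], "species": ["no2", "co", "o3"]},
     ]},
]

print(len(flatten(nested)))  # 2
```

Each of the two resulting hypercubes carries the shared "level" and "step" lists alongside its own "date" and "species" lists.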

The following example is currently being used for the NRT regional forecasts and analyses as a test in the development version of the ADS. Note that it's not complete - for example it doesn't reflect that pollen is not available at certain times of year or that there is no surface-only file for NH3, NO, NMVOC and PANS - but it illustrates the basic idea. Also note that the "type" (forecast/analysis) is not represented here because it is currently inferred from the step...

{"date": ["2015-10-01/current"],
 "model": ["CHIMERE", "EMEP", "ENSEMBLE", "EURAD", "LOTOSEUROS", "MATCH",
           "MOCAGE", "SILAM"],
 "_kids": [
   {"step": ["-24H-1H"],
    "levels": ["ALLLEVELS", "SURFACE"],
    "variable": ["CO", "NH3", "NMVOC", "NO", "NO2", "O3", "PANS", "PM10",
                 "PM25", "SO2"]},
   {"step": ["0H24H", "25H48H", "49H72H", "73H96H"],
    "_kids": [
      {"levels": ["SURFACE"],
       "variable": ["CO", "NH3", "NMVOC", "NO", "NO2", "O3", "PANS", "PM10",
                    "PM25", "SO2", "BIRCHPOLLEN", "GRASSPOLLEN", 
                    "OLIVEPOLLEN"]},
      {"levels": ["ALLLEVELS"],
       "variable": ["CO", "NH3", "NMVOC", "NO", "NO2", "O3", "PANS", "PM10",
                    "PM25", "SO2"]}
      ]
    }
  ]
}