Introduction
The volumes of data involved in multi-model EPS archives are very large, and it is currently impossible to perform research on long time-series without retrieving all the original fields from tape. There is therefore a strong need for access to such time-series at selected point locations. Typical application locations could include airports, wind farms, and official monitoring (observation) stations.
There are several issues to agree on and resolve with the other project partners before building such a dataset:
Decide on a list of relevant geographical locations. This list can realistically contain between 10,000 and 100,000 locations. In order to select them, a short survey of potential usage will have to be carried out.
Decide on a point selection method: the selected locations will not coincide with the model grids. There are several ways to address this problem, such as interpolation, nearest-neighbour selection, or an alternative downscaling method; a minimal sketch of the first two is given after this list. The most relevant method may even depend on the parameter being considered (e.g. temperature or precipitation). The methodology will have to be discussed with the potential users, and a decision will have to be made and documented.
Only a subset of all the TIGGE parameters will be selected (typically surface temperature, surface winds, precipitation, and surface air pressure), based on the expected usage.
A data format will have to be chosen. There does not seem to be any agreed standard for long time-series of multi-model EPS point data. There are nevertheless several formats and conventions for point time-series, in particular based on NetCDF. A survey of such standards and conventions will be carried out and one will be chosen. The selected format will have to allow incremental modification of the dataset; a possible NetCDF layout is also sketched after this list.
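As an illustration of the first two candidate point-selection methods, the following sketch compares nearest-neighbour selection and bilinear interpolation on a regular latitude/longitude grid. The grid spacing, the random stand-in field and the example coordinates are assumptions for illustration only, not the actual TIGGE grids or data.

```python
import numpy as np

# Assumed regular 0.5 degree grid, latitudes running north to south.
# Both the grid and the random field below are placeholders, not TIGGE data.
grid_lats = np.arange(90.0, -90.1, -0.5)
grid_lons = np.arange(0.0, 360.0, 0.5)
field = np.random.rand(grid_lats.size, grid_lons.size)

def nearest_neighbour(field, lats, lons, plat, plon):
    """Value of the grid point closest to (plat, plon)."""
    i = np.abs(lats - plat).argmin()
    j = np.abs(lons - (plon % 360.0)).argmin()
    return field[i, j]

def bilinear(field, lats, lons, plat, plon, dlat=0.5, dlon=0.5):
    """Bilinear interpolation from the four surrounding grid points."""
    plon = plon % 360.0
    i0 = min(int((lats[0] - plat) // dlat), lats.size - 2)
    j0 = int((plon - lons[0]) // dlon)
    i1, j1 = i0 + 1, (j0 + 1) % lons.size   # wrap around the 0/360 meridian
    wy = (lats[i0] - plat) / dlat
    wx = (plon - lons[j0]) / dlon
    return ((1 - wx) * (1 - wy) * field[i0, j0] + wx * (1 - wy) * field[i0, j1] +
            (1 - wx) * wy * field[i1, j0] + wx * wy * field[i1, j1])

# Hypothetical point location (an airport-like site near 51.47N, 0.45W).
print(nearest_neighbour(field, grid_lats, grid_lons, 51.47, -0.45))
print(bilinear(field, grid_lats, grid_lons, 51.47, -0.45))
```

For a parameter such as precipitation, a simple nearest-neighbour or interpolated value may not be appropriate, which is why the choice of method has to be agreed per parameter with the potential users.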
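Similarly, the following sketch shows one possible NetCDF layout for the point dataset, with an unlimited time dimension so that the file can be extended incrementally. All file names, dimension sizes, variable names and attributes are illustrative assumptions; the actual convention will only be fixed after the survey described above.

```python
from netCDF4 import Dataset

# Create a small point time-series file. The "time" dimension is unlimited
# so that new forecast times can be appended as data is received.
with Dataset("tigge_points.nc", "w") as nc:
    nc.createDimension("station", 3)      # selected point locations (assumed)
    nc.createDimension("member", 51)      # assumed ensemble size
    nc.createDimension("time", None)      # unlimited: allows incremental growth

    lat = nc.createVariable("lat", "f4", ("station",))
    lon = nc.createVariable("lon", "f4", ("station",))
    time = nc.createVariable("time", "f8", ("time",))
    time.units = "hours since 2006-01-01 00:00:00"

    t2m = nc.createVariable("t2m", "f4", ("time", "member", "station"))
    t2m.units = "K"
    t2m.long_name = "2 metre temperature at selected points"

    lat[:] = [51.47, 52.52, 40.08]        # example coordinates only
    lon[:] = [-0.45, 13.39, 116.58]
```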
Once these decisions are in place, two tasks must be performed:
A series of scripts must be written to extract the values for the selected points as the TIGGE data (global and LAM) is received in near-real-time, updating the point dataset accordingly; an illustrative update step is sketched after this list.
A “back-archive” of point values must be built for the existing TIGGE data. The TIGGE global archive contains data since 2006, representing more than half a petabyte. Re-processing this dataset in order to extract the selected data points will be very expensive and should be carefully planned. It is clear that this should only be done once; therefore, the decisions that have to be taken for this activity (list of geographical locations, data format, parameters, extraction method, etc.) will have to be confirmed before starting the process.
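The sketch below illustrates what the near-real-time update step could look like once a newly received field has been decoded (for example with ecCodes or pygrib) into one array per ensemble member: the values at the selected points are extracted and appended as one new time slice to the point file from the earlier sketch. File names, variable names and the point-selection function are assumptions carried over from the previous sketches, not the final design.

```python
import numpy as np
from datetime import datetime
from netCDF4 import Dataset, date2num

def nearest_value(field, lats, lons, plat, plon):
    """Nearest grid-point value; a stand-in for the agreed selection method."""
    i = np.abs(lats - plat).argmin()
    j = np.abs(lons - (plon % 360.0)).argmin()
    return field[i, j]

def append_time_slice(nc_path, valid_time, member_fields, grid_lats, grid_lons):
    """Append one forecast time for all members and stations to the file."""
    with Dataset(nc_path, "a") as nc:
        plats = nc.variables["lat"][:]
        plons = nc.variables["lon"][:]
        t = len(nc.dimensions["time"])                 # next free time index
        nc.variables["time"][t] = date2num(valid_time, nc.variables["time"].units)
        for m, field in enumerate(member_fields):
            nc.variables["t2m"][t, m, :] = [
                nearest_value(field, grid_lats, grid_lons, plat, plon)
                for plat, plon in zip(plats, plons)
            ]

# Example call with random stand-in fields on the assumed 0.5 degree grid.
glats = np.arange(90.0, -90.1, -0.5)
glons = np.arange(0.0, 360.0, 0.5)
members = [np.random.rand(glats.size, glons.size) for _ in range(51)]
append_time_slice("tigge_points.nc", datetime(2008, 1, 1, 12), members, glats, glons)
```

The same extraction routine could in principle be reused for the back-archive, run once over the archived fields in date order, which is one reason the choices of locations, parameters and format must be confirmed before that one-off reprocessing starts.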