Clustering - ENS medium-range
Weather scenarios
Clustering of ensemble members emphasises large-scale developments, so the 500hPa and 1000hPa geopotential forecast fields are used for the daily weather scenarios. The area considered covers Europe and its immediate surroundings, including the northeast Atlantic.
Fig8.1.3.1-1: Area considered for clustering purposes.
The norm used to measure differences between ensemble members is the root mean square (RMS), evaluated over this area.
Clustering is performed over four predefined lead-time windows:
- 3-4 days ahead.
- 5-7 days ahead.
- 8-10 days ahead.
- 11-15 days ahead.
Clustering in this way, rather than on individual forecast days, has the advantage that temporal continuity and synoptic consistency are more likely to be retained. The clustering is flow-dependent and is not based on pre-defined regimes. Since all members are regarded as equally likely, the proportion of members in each cluster can be used as an estimate of its probability.
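The four lead-time windows listed above can be expressed as forecast-hour ranges. The sketch below assumes a simple 24 hours per day conversion, which matches the T+120 to T+168 and T+264 to T+360 ranges quoted in the figure captions for the 5-7 day and 11-15 day windows; the ranges for the other two windows are inferred in the same way. The names WINDOWS_DAYS, window_hours and window_for_step are hypothetical and used only for illustration.

```python
# Lead-time windows in days, as listed in the text.
WINDOWS_DAYS = [(3, 4), (5, 7), (8, 10), (11, 15)]


def window_hours(first_day, last_day):
    """Convert a window in days to (start, end) forecast hours.

    Assumption: a window covering days d1..d2 spans T+24*d1 to T+24*d2.
    """
    return first_day * 24, last_day * 24


def window_for_step(step_hours):
    """Return the (first_day, last_day) window containing a forecast step.

    Returns None if the step falls between or outside the windows.
    """
    for first_day, last_day in WINDOWS_DAYS:
        start, end = window_hours(first_day, last_day)
        if start <= step_hours <= end:
            return (first_day, last_day)
    return None


print([window_hours(a, b) for a, b in WINDOWS_DAYS])
# [(72, 96), (120, 168), (192, 240), (264, 360)]
```

Under this assumption, a step such as T+108 belongs to no window, because the windows are not contiguous.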
To ensure synoptic consistency, each individual ensemble member must belong to the same cluster throughout the lead-time window. For two members to be assigned to the same cluster, they must display similar synoptic 500hPa development over the whole time window. However, weak gradients in 500hPa forecasts can lead to synoptically rather different members being assigned to the same cluster because of the RMS norm. The clustering scheme is designed to create no more than six clusters.
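The effect of weak gradients on an RMS norm can be illustrated with a minimal sketch. It assumes that the distance between two members is the unweighted RMS difference of their 500hPa fields over all time steps and grid points of the window; the operational scheme may apply area weighting or other processing not described here. The function name window_rms_distance and the toy fields are hypothetical.

```python
import numpy as np


def window_rms_distance(member_a, member_b):
    """Unweighted RMS difference between two members.

    Both arrays have shape (n_times, n_lat, n_lon) and cover the whole
    lead-time window, so one number summarises the entire window.
    """
    diff = np.asarray(member_a, dtype=float) - np.asarray(member_b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy west-east pattern on a (3 times, 4 lat, 5 lon) grid, values -1 to 1.
pattern = np.tile(np.linspace(-1.0, 1.0, 5), (3, 4, 1))

# Weak gradients with opposite orientation: synoptically different flow.
weak_a = 5500.0 + 10.0 * pattern
weak_b = 5500.0 - 10.0 * pattern

# Strong gradients with the same orientation: synoptically similar flow.
strong_a = 5500.0 + 200.0 * pattern
strong_b = 5500.0 + 150.0 * pattern

d_weak = window_rms_distance(weak_a, weak_b)
d_strong = window_rms_distance(strong_a, strong_b)
print(d_weak < d_strong)  # True
```

In this toy case the two weak-gradient members have reversed gradients yet are closer in RMS terms than the two strong-gradient members that share the same orientation, which is how synoptically different members can end up in the same cluster.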
A cluster is represented not by the mean of its members but by its most representative member (MRM or cluster scenario). This is selected by a pattern-matching algorithm that minimises the root mean square difference between the “centre of gravity” of the cluster and each member. The most representative member is chosen to symbolise the cluster, but it should not be used as a deterministic forecast, nor should it be seen as a substitute for the cluster average (cluster averages are not currently available as web charts but are available for download from MARS).
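The selection of a most representative member can be sketched as follows. This is an illustration only: it assumes that the “centre of gravity” is the arithmetic mean of the cluster members' fields and that the RMS difference is unweighted, neither of which is confirmed in the text. The function name and the toy member numbers are hypothetical.

```python
import numpy as np


def most_representative_member(fields, member_ids):
    """Pick the member with the smallest RMS difference from the cluster centre.

    fields: array of shape (n_members, n_times, n_lat, n_lon) holding the
            fields of the members of ONE cluster.
    member_ids: ensemble member numbers, in the same order as fields.
    Returns (member_id, rms_per_member).
    """
    fields = np.asarray(fields, dtype=float)
    centre = fields.mean(axis=0)  # assumed "centre of gravity"
    rms = np.sqrt(np.mean((fields - centre) ** 2, axis=(1, 2, 3)))
    return member_ids[int(np.argmin(rms))], rms


# Three toy members with uniform fields (values in metres, illustrative only).
toy_fields = np.stack([np.full((2, 3, 4), v) for v in (5400.0, 5500.0, 5520.0)])
toy_ids = [4, 17, 30]

mrm, rms = most_representative_member(toy_fields, toy_ids)
print(mrm)  # 17
```

The selected member is a real ensemble member, so its fields remain physically consistent, whereas the cluster mean is generally smoother than any individual member.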
The number of cluster scenarios is related to the characteristics of the ensemble forecast distribution.
- If the ensemble members fall into a few, well separated groups of “similar forecasts” (a “multi-modal distribution”), the most representative members (cluster scenarios) will represent the range of possible weather conditions (see Fig8.1.3.1-2).
- If the spread of all the ensemble members is broad and represents a continuum which does not divide up logically into groups, then the cluster algorithm cannot partition the ensemble into significantly different clusters and the ensemble median is presented as the most representative member (cluster scenario) of the single "cluster".
- If the spread of all the ensemble members is small, more than one cluster can still be created provided that a partition is possible.
Note:
- Large spread in the ensemble as a whole does not automatically lead to more clusters.
- A large number of clusters does not necessarily imply a large spread in the ensemble as a whole.
The clusters are intended to give an overview of the ensemble forecast information and should not be over-interpreted. The differences between two clusters will be mainly related to genuine differences in the 500hPa flow pattern, in particular if the differences are large scale.
The 1000hPa clusters correspond to the clusters of the 500hPa fields; clustering is not done on the 1000hPa fields. Each 1000hPa cluster therefore has the same population of members and the same most representative member as the corresponding 500hPa cluster. Major differences might be due to the relation between the flow at 500hPa and that at 1000hPa: fairly similar 500hPa patterns might be linked to quite different 1000hPa patterns.
Fig8.1.3.1-2: The most representative 500hPa members selected to describe the clustering of the forecast DT 00UTC 12 March 2017, T+120 to T+168 hours. Here there are 3 clusters (one per row). The most representative member or cluster scenario is the member of the cluster which has the minimum RMS difference from the “centre of gravity” of the cluster members. On 500hPa cluster scenario charts, shading denotes the 500hPa height anomaly of the ensemble member height field (as contoured) from the long term climatological average.
In Fig8.1.3.1-2 there are three clusters:
- Cluster1 (Top) with 22 members of which member 30 is most representative.
- Cluster2 (Middle) with 21 members of which member 25 is most representative.
- Cluster3 (Bottom) with 8 members of which member 17 is most representative.
Differences can be seen in the depth and timing of the upper trough near Scotland at T+120 and the building of the following upper ridge towards southwest Britain. However, overall the differences between the three clusters do not look to be particularly large on this occasion.
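Because all members are regarded as equally likely, the cluster populations in Fig8.1.3.1-2 can be turned into approximate scenario probabilities by dividing each population by the total number of members. A short worked example, using only the populations quoted above (variable names are illustrative):

```python
# Cluster populations quoted for Fig8.1.3.1-2.
populations = {1: 22, 2: 21, 3: 8}

total = sum(populations.values())  # 51 members in all
probabilities = {cluster: n / total for cluster, n in populations.items()}

for cluster, p in probabilities.items():
    print(f"Cluster {cluster}: {p:.0%}")
# Cluster 1: 43%
# Cluster 2: 41%
# Cluster 3: 16%
```

These values are estimates of the likelihood of each scenario as represented by the ensemble, not calibrated probabilities.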
The web site includes cluster products equivalent to Fig8.1.3.1-2 for each of the four predefined lead-time windows. For additional information, the 1000hPa geopotential fields are also provided for each ensemble scenario to show the corresponding near-surface evolution. Users should note that for these the clustering has been made on the 500hPa fields, not the 1000hPa fields. Whilst the user should not treat the most representative members as deterministic solutions, it can nonetheless be helpful to examine the details of the evolution in such members, to see how a particular scenario can plausibly arise and evolve. One good way to do this is to use the cyclone database products presented by the extra-tropical cyclone diagrams, specifically the animations, at 12 hour intervals, of synoptic patterns for individual members.
Weather Regimes
After day10, it is desirable to place the daily clustering in the context of the large-scale flow and to allow the investigation of regime changes. Use of weather regimes indicates differences between most representative members (MRMs or Cluster scenarios) in terms of the large-scale flow and provides information about the possible transitions between regimes during the forecast.
So, after clustering by weather scenario, each most representative member is then attributed to one of four large scale climatological weather regimes. These have been evaluated over an area covering Europe and the north Atlantic, considerably larger than that used for the weather scenario clustering (see above and Fig8.1.3.1-1).
Fig8.1.3.1-3: The Euro-Atlantic area considered for the computation of four weather regimes (NAO+, NAO-, BL, AR) derived from reanalysis of 500hPa geopotential height.
Large scale climatological weather regimes have been computed from 29 years of reanalysis of 500hPa height data (ERA-Interim and ERA-40) within the weather regime analysis area. The reanalysis results were clustered, using the same clustering algorithm as for the ensemble weather scenarios, to produce four fixed climatological regimes. These are:
- Positive North Atlantic Oscillation (regime 1 or +NAO; conventional colour Blue).
- Euro-Atlantic blocking (regime 2 or BL; conventional colour Red).
- Negative North Atlantic Oscillation (regime 3 or -NAO; conventional colour Green).
- Pronounced ridge over the Atlantic (regime 4 or AR; conventional colour Violet).
Fig8.1.3.1-4: The large scale climatological regimes for 500hPa heights computed for the cold season (October to April).
Fig8.1.3.1-5: The large scale climatological regimes for 500hPa heights computed for the warm season (May to September).
The climatological regimes have been computed from 29 years of reanalysis of 500hPa height data (ERA-Interim and ERA-40) using the same clustering algorithm as for the ensemble scenarios. Geopotential heights are shown in units of tenths of a metre; colours show anomalies from the mean 500hPa height derived over the reanalysis period.
Each most representative member is attributed to one of four large scale climatological weather regimes by a pattern-matching algorithm which assigns it to the closest large scale climatological weather regime (by minimising root mean square differences). This attribution is indicated by the colour of the frame surrounding each cluster scenario (Blue:+NAO; Green:–NAO; Red:BL; Purple:AR). The climatological weather regime refers only to the displayed most representative member.
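The attribution step can be sketched as a nearest-pattern search. The example assumes that the most representative member and the four regime patterns are expressed as 500hPa height anomalies on the same grid and that the RMS difference is unweighted; the text does not specify these details. The function name attribute_regime and the 2x2 toy patterns are hypothetical and bear no relation to the real regime patterns.

```python
import numpy as np


def attribute_regime(anomaly, regime_patterns):
    """Return the regime whose pattern is closest in RMS terms.

    anomaly: 2-D anomaly field of the most representative member.
    regime_patterns: dict mapping regime name to a 2-D pattern on the same grid.
    Returns (regime_name, rms_by_regime).
    """
    anomaly = np.asarray(anomaly, dtype=float)
    rms = {
        name: float(np.sqrt(np.mean((anomaly - np.asarray(p, dtype=float)) ** 2)))
        for name, p in regime_patterns.items()
    }
    return min(rms, key=rms.get), rms


# Toy 2x2 "regime patterns" (illustrative values only).
toy_patterns = {
    "+NAO": [[-50, -50], [50, 50]],
    "BL": [[-50, 50], [-50, 50]],
    "-NAO": [[50, 50], [-50, -50]],
    "AR": [[50, -50], [50, -50]],
}
toy_anomaly = [[-40, 60], [-30, 70]]

regime, rms_by_regime = attribute_regime(toy_anomaly, toy_patterns)
print(regime)  # BL
```

A nearest-pattern rule always returns one of the four regimes, even when the member resembles none of them closely, so the RMS value itself gives a rough indication of how good the match is.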
Fig8.1.3.1-6: As Fig8.1.3.1-2 but referring to the forecast DT 00UTC 05 March 2017, T+264 to T+360hr. The colour of the frame surrounding each most representative member indicates the large scale climatological weather regime to which it has been attributed. On 500hPa plots such as these, shading denotes anomalies relative to climatology (as in Fig8.1.3.1-2).
Flow dependent skill
It has been found:
- BL leads to the least accurate forecasts: from Day3 onwards, the blocking frequencies are systematically underestimated.
- AR also leads to reduced accuracy in the forecasts: it tends to be too persistent, so transitions to BL are missed.
- Transitions to BL are not well predicted in general, and appear particularly difficult when initially the cross-Atlantic westerly jet is in the southern location (NAO-) or the northern location (AR).
- Persistence of BL tends to be underestimated.
- Maintenance of, and/or transitions to, an enhanced zonal flow (NAO+) tend to be overestimated.
- The ensemble spread is a useful indicator of the forecast error.
- -NAO has higher skill than the other regimes: the spread of the forecasts initiated in -NAO is significantly smaller than that of the forecasts initiated in the other regimes.
Additional Sources of Information
(Note: In older material there may be references to issues that have subsequently been addressed)
- Read more on ENS clustering and cluster products.
- Read more on the clustering process and the weather clustering regimes.
- Read more on climatological clustering.