Jumpiness

Customers dislike jumpy forecasts.  The role of the forecaster is to deduce the most likely forecast and to assign a general probability to it, taking into account all possible outcomes.  The forecaster should avoid sudden changes in their forecasts, as the subsequent NWP model forecast may or may not revert to previous outcomes.  Adding detail, particularly at longer forecast lead-times, is inappropriate, but nevertheless the risk of severe or exceptional weather should be captured, even if its probability is low.

Perturbations, positive and negative, spread the ensemble forecasts either side of the ensemble control (CTRL) early in the forecast, and any jumps in the ensemble control are likely to be reflected in the ensemble too.  At very short lead-times, before perturbations have had time to amplify, the ensemble mean (EM) will be very similar to the ensemble control.  Later in the forecast non-linearity becomes more important, and the ensemble members become less similar to the ensemble control.  Thus the ensemble mean forecast is, on average, a less jumpy and more reliable forecast than the ensemble control.
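
The averaging effect behind this can be illustrated with a toy simulation (a sketch with invented numbers, not ECMWF data; all names are illustrative): successive runs draw members around a slowly varying signal, and the run-to-run change of the ensemble mean comes out much smaller than that of a single member treated as the control.

```python
import random

random.seed(1)

N_RUNS, N_MEMBERS = 20, 50

# Toy model: each forecast run produces N_MEMBERS member values scattered
# around a slowly varying signal; the scatter stands in for perturbation growth.
signal = [10 + 0.2 * r for r in range(N_RUNS)]
runs = [[signal[r] + random.gauss(0, 2.0) for _ in range(N_MEMBERS)]
        for r in range(N_RUNS)]

ens_mean = [sum(run) / N_MEMBERS for run in runs]
control = [run[0] for run in runs]  # treat member 0 as the "control"

def jumpiness(series):
    """Mean absolute run-to-run change."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)

print(f"control jumpiness:       {jumpiness(control):.2f}")
print(f"ensemble-mean jumpiness: {jumpiness(ens_mean):.2f}")
```

Averaging over 50 members shrinks the random part of each run-to-run change by roughly a factor of seven (√50), which is why the ensemble mean appears far less jumpy than any individual member.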

Just because the most recent forecast is, on average, better than the previous one, it does not mean that it is always better.  A more recent forecast can be worse than a previous one, and with increasing forecast range it becomes increasingly likely that the 12 or 24 hour older forecast is the better one.  If the most recent NWP model output differs significantly from previous results, the forecaster can use the techniques outlined below to avoid sudden changes in the forecasts given to the customer.  It can be worthwhile trying to assess the cause of the difference, but in general each ensemble solution should be viewed as one possible solution that is a member of a greater ensemble of the latest and recent solutions.

Ensemble of Data Assimilations (EDA), Ocean coupling from Day0, and future enhancements to stochastic physics and land surface perturbations are designed to improve the quality of the ensemble and should continue to reduce, though not eradicate, jumpiness.

Jumps, Trends and Flip-flops

Some terms have evolved to describe the run-to-run changes that may be seen within a series of forecast output:

  • Forecast Jump:  When, for a given date, forecast results from sequential NWP model runs show a distinct and sudden change in forecast values (e.g. temperature forecasts valid at the same time for a given location might be 17°C, 16°C, 15°C, 22°C). 
  • Forecast Flip-flop:  When forecast results from sequential NWP model runs alternate between higher and lower values (e.g. 17°C, 13°C, 16°C, 12°C, 18°C). 
  • Forecast Trend:  When forecast results from sequential NWP model runs move, uniformly or otherwise, towards a lower or higher value (e.g. 18°C, 16°C, 16°C, 15°C, 14°C).
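
These definitions are easy to make operational.  The sketch below (threshold and function names are illustrative choices, not part of any ECMWF tool) classifies a sequence of forecast values against the three terms:

```python
def is_jump(values, delta=3.0):
    """Jump: the latest run differs from its predecessor by more than delta."""
    return len(values) >= 2 and abs(values[-1] - values[-2]) > delta

def is_flip_flop(values):
    """Flip-flop: successive run-to-run changes alternate in sign."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return len(diffs) >= 2 and all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))

def is_trend(values):
    """Trend: successive runs move consistently down or up (flat steps allowed)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return len(diffs) >= 2 and (
        (all(d <= 0 for d in diffs) and any(d < 0 for d in diffs))
        or (all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)))

print(is_jump([17, 16, 15, 22]))           # True - the 15 -> 22 change exceeds delta
print(is_flip_flop([17, 13, 16, 12, 18]))  # True
print(is_trend([18, 16, 16, 15, 14]))      # True
```

The three example sequences are the ones used in the definitions above; δ corresponds to the jump threshold in Fig7.2-1.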


 Fig7.2-1:  A representation of forecast temperatures at a certain location for a certain date produced by a series of forecast runs.  A jump may be considered as a change greater than some threshold δ.  A flip-flop may be considered as a sequence of results, each alternately higher and lower than its predecessor.  A trend may be considered as a sequence of results that rise or fall, uniformly or not.


In a sequence of three forecasts, there are only two ways the final forecast can behave relative to the others - it can continue a trend, or it can produce a flip-flop (assuming we discount the possibility of sequential forecasts being identical).  Both occur about 50% of the time, so a forecaster should not be surprised when either occurs, nor take any special action according to which it is.
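
The 50/50 split follows from symmetry: if successive forecast-to-forecast changes are equally likely to be up or down and independent of each other, the sign of the latest change matches the previous one (a trend) about half the time.  A quick Monte Carlo check of this argument (illustrative only):

```python
import random

random.seed(0)

trials = 100_000
# Each trial: two independent forecast-to-forecast changes, equally likely
# up or down.  Same sign -> the third forecast continues a trend;
# opposite sign -> it produces a flip-flop.
trends = sum(
    1 for _ in range(trials)
    if random.choice([-1, 1]) == random.choice([-1, 1])
)
print(f"trend fraction: {trends / trials:.3f}")  # close to 0.5; the rest are flip-flops
```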

A sudden jump in forecast values after a recent spell of fairly similar results should be viewed with caution.  Similarly, a steady or unsteady change in the forecast (e.g. a progressive speeding up of a weather system or a fall in forecast temperatures) cannot be assumed to continue.  The latest forecast value may indeed be true, since it has the benefit of later data, but could also be the result of incorrect or amplifying detail taking effect during the execution of the forecast.  

Sometimes forecasts show significant and repetitive changes in predictions for a given location.  Often this is associated with the precise positioning of a trough or ridge in the vicinity of the location of interest (e.g. if in the Northern Hemisphere the axis lies to the east, then a northerly airflow brings colder temperatures and an associated type of weather; if to the west, then a southerly airflow brings warmer temperatures and a different weather type).  


Fig7.2-2: An example of a forecast exhibiting jumpiness in the form of a major flip-flop.  On the diagram the y-axis shows forecast 2m temperatures, the x-axis the data time of the ensemble forecast.  The plotted values are the forecast temperatures at Paris verifying at 00Z 8 Dec 2016.  Forecast ensemble results are shown by box and whisker plots (described in the Meteogram section), forecast ensemble mean values by black dots (red dots show values from the HRES).  Initially the Day15 to Day11 forecasts were around 5°C or 6°C, although with a broad range of up to ±8 to 10°C.  From 12UTC 27 Nov (Day10½) the forecast temperatures jumped to much colder values around −2°C with a relatively small spread of ±3 or 4°C.  From 12UTC 30 Nov (Day7½) the forecast temperature rose suddenly back to around +6°C with a broader spread of ±8 to 10°C.  From 12UTC 3 Dec (Day4½) the forecast temperatures reverted to around +4°C with a range of ±2 or 3°C.  It should be remembered that in general each ensemble solution should be viewed as one possible solution that is a member of a greater ensemble of the latest and recent solutions, although the later solutions do have the benefit of the most up-to-date data.


Fig7.2-3: An example of a forecast showing a major flip, with a major difference in the forecast depth and extension of the upper trough over Eastern Europe.  The charts show 500hPa forecasts at 9km resolution, VT 00UTC 25 Dec 2012 - upper chart: T+144 DT 00UTC 19 Dec 2012 with cold air forecast across Greece and SE Europe; lower chart: T+120 DT 00UTC 20 Dec 2012 with a warm airmass over Greece and SE Europe.


Fig7.2-4: Standard ECMWF Mean and Spread charts for 850hPa temperature, verifying at 00UTC 25 Dec 2012, T+120 from ensemble control (CTRL) data time 00UTC 20 Dec 2012 (same case as in the lower panel of Fig7.2-3).

Left panel: The 850hPa temperature forecast ensemble mean (isotherms) and normalised standard deviation (shading shows the normalised spread).  The normalised spread denotes how the spread in the latest ensemble at a location compares with the ensemble spread at that location over recent (the last 30 days of) forecasts.  This chart, therefore, highlights uncertainty (green – relatively low, purple – relatively high).  So at Belgrade (blue circle) there is relatively high spread (uncertainty), and consequently one has less confidence in the forecast than might otherwise be expected for T+120 in that area.  Conversely the green area to the south of Ireland denotes less spread than seen in recent ensemble forecasts.

Right panel: The 850hPa temperature forecast (at 9km resolution) and the actual ensemble standard deviation (at 18km resolution).  This shows that in absolute terms (as well as relative terms) there is a wide spread of ensemble solutions over eastern Europe (standard deviation between 4.5°C and 8.0°C).


Viewing Fig7.2-3 alongside Fig7.2-4, one can find evidence for why the large jump in the forecast from a single ensemble member in the vicinity of Belgrade should not have come as such a surprise.  Uncertainty in absolute and relative terms was still very high after the jump.  So in broad terms the forecaster would be justified in following the jump, but at the same time should assign a large error bar to any issued forecasts.


Jumpiness and Forecast Skill

It is important to recognise that intuitive interpretation of jumpiness – flip-flopping and trends – is appealing but not reliable.  In particular, one should not assume that a forecast is more reliable because it has not changed substantially from the previous run.  A similar forecast should be regarded as fortuitous rather than definitive.  Rather, inspection of the ensemble forecasts will give an insight into the probability of the outcome.

The ensemble forecasts must differ from one run to the next in order for the forecast output to gradually improve.  There must always be variations in evolution due to the incorporation of data in the analysis scheme and through field interactions during that evolution.  Ideally with an ensemble these “forecast jumps” should be fairly regular, with no “flip-flopping”.

Uncertainty increases with forecast range; jumpiness increases in any individual ensemble member though, in relative terms, it decreases in the ensemble mean.  In summary:

  • at very short range, the ensemble mean is almost identical to the ensemble control, and jumpiness is about equally low for both,
  • at short lead-times, large jumpiness does not mean there is a large error in the latest forecast,  
  • at longer lead-times, larger jumpiness implies greater error, but the correlation is no more than 0.5.

Analysis of jumpiness suggests:

  • The proportion of previous forecasts that are "better" than the latest ones increases with lead-time:
    • at short lead-times a small but significant proportion appear better (~15% at Day2),
    • at longer lead-times a larger proportion appear better (~40% at Day6) (Fig7.2-5).
  • There is only a very small correlation between forecast jumpiness and the quality of the latest forecast (Fig7.2-6).
  • Beyond about Day3 the ensemble mean, by using results from all ensemble members, provides more consistent forecasts than the ensemble control.  This benefit gradually increases with forecast range.  
  • The frequency of a flip (single jump) is very similar for both the ensemble mean and ensemble control.
  • Flip-flopping occurs clearly less frequently in the ensemble mean than in the ensemble control.

Persson and Strauss (1995) and Zsótér et al. (2009) found:

  • the connection between forecast inconsistency (flip-flopping etc.) and forecast error is weak,
  • the average error of the ensemble mean relates quite strongly to the absolute spread in the ensemble,
  • on average, larger spread implies larger errors (this does not apply to the ensemble median or ensemble control, even if they happen to lie mid-range within the ensemble).

Fig7.2-5: The likelihood that a forecast made 12hr or 24hr previously is “better” (in terms of RMSE) than the latest forecast.  The parameter is MSLP for Northern Europe and the period October 2009 - September 2010.  The result is almost identical if ACC is used as the verification measure.  The diagram suggests that at longer lead-times (about Day6 or greater) about 40% of earlier forecasts are better than the latest, though at short lead-times relatively few earlier forecasts are better (nevertheless 15%, or about 1 in 6, appear better at T+48).


Fig7.2-6:  The correlation of 24 hour forecast jumpiness and forecast error for 2m temperature against forecast lead-time for Heathrow at 12 UTC, October 2006 - March 2007.   At short lead-times the relationship between jumpiness and error is low, but increases with forecast range and asymptotically approaches 0.50 correlation.  Note that even for 0.5 the variance explained is still only 25%.

Unreliability of Trends

Trends in the development of individual synoptic systems over successive forecasts do not provide any indication of their future development.  If over its last few runs the NWP model has systematically changed the position and/or intensity of a synoptic feature, it does not mean that the behaviour of the next forecast can be deduced by simple extrapolation of previous forecasts.  However, the order in which the jumpiness occurs can provide additional insight.  Table 7.2-1 shows occasions when rainfall occurred and was forecast on at least one of three consecutive forecasts verifying at the same time.  According to Table 7.2-1, the likelihood that precipitation occurs seems to be broadly similar when the last two forecasts are consistent (R, R, -) and when the last three are “flip-flopping” (R, -, R).  The last two forecasts are, on average, more skilful than the first forecast - but they are also, on average, more correlated.  The earliest forecast might lack skill, but this is compensated by it being less correlated with the most recent forecast.  Agreement between two less correlated forecasts carries, on average, more weight than agreement between two more correlated forecasts.


Last Three Forecasts      Simplified       Observed Frequency     Number of
all verifying at          Theoretical      of Rain Occurring      Forecasts
the same time             Probability
T+108  T+96  T+84

  -      -     -              0%                  6%                 598
  R      -     -             33%                 15%                  66
  -      R     -             33%                 22%                  46
  -      -     R             33%                 36%                  59
  R      R     -             67%                 30%                  43
  R      -     R             67%                 44%                  27
  -      R     R             67%                 47%                  43
  R      R     R            100%                 74%                 157

Table 7.2-1: The percentage of cases when >2mm/24hr was observed when up to three consecutive ECMWF runs (T+84hr, T+96hr and T+108hr) had forecast >2mm/24hr, for Volkel, Netherlands, October 2007 - September 2010.  R indicates a run that forecast such rain; - indicates a run that did not.  Similar results are found for other west and north European locations and for other NWP medium-range models.
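
The “Simplified Theoretical Probability” column in Table 7.2-1 simply treats each of the three runs as an equally weighted vote, i.e. the fraction of runs forecasting rain.  A sketch reproducing the column (the function name is illustrative):

```python
def simplified_theoretical_probability(pattern):
    """Fraction of runs forecasting rain ('R'), as a rounded percentage.

    pattern is a 3-character string over {'R', '-'}, oldest run first,
    matching the T+108 / T+96 / T+84 columns of Table 7.2-1.
    """
    return round(100 * pattern.count("R") / len(pattern))

for pattern in ("---", "R--", "R-R", "RRR"):
    print(pattern, simplified_theoretical_probability(pattern))
# '---' -> 0, 'R--' -> 33, 'R-R' -> 67, 'RRR' -> 100
```

Comparing this column with the observed frequencies shows that the simple vote over-weights isolated old forecasts of rain and under-weights recent ones.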

Weakness of an Intuitive Approach towards likely outcomes

It is without doubt difficult to choose the more likely outcome when a series of forecasts shows large variations, trends and/or flip-flops.  A simple exercise in recent ECMWF training courses illustrates the problem.  Students were asked to interpret the expected temperature from a series of previous sequential NWP model forecasts verifying on the same day, and accordingly to provide a single deterministic forecast for that day based on that information.  The students used several, largely intuitive, “forecasting techniques” (see Table 7.2-2), but in the end none of them can be deemed particularly efficient (though any one of them could possibly have captured the correct result in a given situation).  The spread of the students' forecasts gives some idea of the confidence inherent in the pattern of forecast information provided:

  • Spread was low with the jumpy forecast case since the oscillations remained fairly steady throughout, and the next forecast could be higher or lower without changing the range of the oscillations much.
  • Spread was high with the trend forecast case illustrating the point that the next forecast may well be higher than the one before which would destroy the trend.  Some students preferred to maintain a previous trend.

All forecasts should be accepted as equally valid at their time of issue, and drastic action such as ignoring forecast values is wrong unless there is a very good reason to suspect the indicated evolution is incorrect (e.g. some corrupting data at initialisation, or local effects changing fine detail in the forecast).

Whilst this was not within the scope of the exercise described, it is important to understand that a key aim of forecasting is to capture the probability of a forecast event (eg a temperature, a wind strength, a wave height) rather than the precise value itself.

Technique used by the Students                               Efficiency
Use weighted average of forecasts (more weight to latest)    Moderate to high
Use average of recent few forecasts                          Moderate
Use average of all available forecasts                       Moderate
Use most recent forecast plus a slight trend                 Poor
Use trend                                                    Poor
Accept most recent forecast                                  Moderate to Poor
Last forecast was not likely, use the previous forecast      Very Poor

Table 7.2-2: Efficiency of intuitive techniques to assess outcomes given a series of forecast values from a series of sequential deterministic forecasts.
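
The highest-rated technique in Table 7.2-2 - a weighted average with more weight on the latest runs - can be sketched as follows.  The exponential weighting scheme and decay factor below are illustrative assumptions; the exercise did not prescribe a particular weighting:

```python
def weighted_average(forecasts, decay=0.7):
    """Weighted average of successive forecasts (oldest first), each older
    run carrying `decay` times the weight of the run after it."""
    n = len(forecasts)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # latest run gets weight 1
    return sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)

# The jumpy example from the definitions above: 17, 16, 15, then a jump to 22.
print(round(weighted_average([17.0, 16.0, 15.0, 22.0]), 1))  # -> 18.2
```

The weighting follows the jump only part of the way, which matches the guidance above on damping noteworthy jumps while staying within the range of recent solutions.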


Fig7.2-7: The graphs show sample schematic forecasts of 12UTC temperature over four successive NWP model runs: Jumpy (top) and Trend (bottom).  The histograms show the forecasts made by the students using their own techniques.  Spread was low in the jumpy forecast case since the oscillations remained fairly steady throughout, and the next forecast could be higher or lower without changing the range of the oscillations much.  Spread was high in the trend forecast case, illustrating the point that the next forecast may well be higher than the one before, breaking the trend, or lower than the one before, continuing the trend.

Dealing with Jumpiness

The forecaster can try to minimise the effect of these variations by not taking the latest forecasts as necessarily being the best (although on average they are).  Techniques which may be of use in cases of jumpiness are to:

  • adjust a forecast value (e.g. temperature, rainfall, etc.) slightly lower or higher to follow the latest indications (e.g. warmer/cooler, wetter/drier, etc.), but nevertheless remain within the range of ensemble solutions from the latest and previous runs.  Reducing the change suggested by a noteworthy jump in the forecast can be the most appropriate course of action - but it does run the risk that the forecast from the next run will be even further away from the earlier solutions (i.e. the forecaster could be trying to catch up with the NWP model forecasts, which illustrates one of the ways in which accuracy will be reduced).  On the other hand, it should be remembered that following a trend is also unreliable ~50% of the time.
  • check whether the ensemble mean and probabilities are fairly consistent with previous runs.   If not, consider creating a lagged ensemble of the last two or three ensemble forecasts to give two or three times the number of members.  This will smooth out sudden changes in evolution while preserving the breadth of possible forecast extremes and probability information from the latest run.  A grand ensemble of ECMWF forecast results may be considered to compare latest forecast results with those of other state-of-the-art NWP models. 

  • follow the ensemble mean rather than the ensemble control.  This can be more informative, especially at longer lead-times (say ≥ ~ 4 days).   However, note that strong gradients are always weakened in the ensemble mean and fine scale features (e.g. sting jets) will not be visible.
  • inspect the Cumulative Density Function (CDF) of ensemble forecasts.  This can give a useful indication of the ensemble forecast values during the jumpiness.  At longer lead-times forecast CDFs may be similar to the M-climate.  But, with time, CDFs from successive runs should show less lateral variation and tend to become steeper, implying higher confidence.
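
Constructing the lagged ensemble suggested above amounts to pooling the members of the last few runs, optionally down-weighting older runs when computing probabilities.  A minimal sketch (the weights and data are illustrative assumptions, not an ECMWF product):

```python
def lagged_ensemble(runs, weights=None):
    """Pool the members of successive ensemble runs (oldest first) into one
    larger ensemble; per-run weights let older runs count less."""
    if weights is None:
        weights = [1.0] * len(runs)
    members, member_weights = [], []
    for run, wt in zip(runs, weights):
        members.extend(run)
        member_weights.extend([wt] * len(run))
    return members, member_weights

def probability(members, member_weights, event):
    """Weighted fraction of pooled members satisfying `event`."""
    hits = sum(wt for m, wt in zip(members, member_weights) if event(m))
    return hits / sum(member_weights)

# three successive (toy) 5-member runs of 2m temperature, oldest first
runs = [[4, 5, 6, 5, 7], [3, 5, 4, 6, 5], [6, 7, 5, 8, 6]]
members, wts = lagged_ensemble(runs, weights=[0.5, 0.75, 1.0])
print(round(probability(members, wts, lambda t: t >= 6), 2))  # -> 0.51
```

Pooling preserves the extremes of every contributing run while smoothing the run-to-run changes in the derived probabilities, which is exactly the trade-off described above.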

Fig7.2-8: An example of Cumulative Density Functions (CDFs) produced by a sequence of ensemble forecasts for precipitation at Zaga in Slovenia, verifying for the 24hr from 00UTC 27 to 00UTC 28 April 2017.  All show a very high Extreme Forecast Index (EFI).  Note the four earlier CDFs (blues) showed a moderate slope indicating a spread of forecast precipitation intensities, and then jumped to steeper slopes (purple and red) with a lessening of the spread of precipitation intensities.  Here the forecast showed a steady trend towards heavier precipitation, with a jump to very heavy precipitation.  A forecaster would have been unwise at the time of the T+60 to 84hr forecast (rightmost dashed blue line) to think that this significantly wetter forecast was too much of a jump from the trend to be believed.

Special considerations - Jumpiness at short lead-times

In ‘finely balanced’ situations (those with dynamical sensitivity), the ensemble spread can be quite high even at quite short lead-times (about one or two days); slight differences and jumpiness among ensemble members or the control can have a large impact on the NWP model evolution (e.g. the precise phasing of upper and lower levels needed for explosive cyclogenesis; high precipitation intensities can turn rain into (surprise) snow through cooling by melting).  Severe weather situations are often associated with these sorts of uncertainties, and a probabilistic approach rather than a definitive forecast is generally more effective and useful.

Customer considerations

To minimise error when measured over an extended period, one should always follow the latest forecast.  The reason for not doing this is to avoid negative customer perceptions that can arise when jumpy forecasts are issued.   It is important that the forecaster understands the requirements of the customer (e.g. what their thresholds are for taking weather-related precautions), but the forecaster does not have the responsibility to make such decisions for customers - it is for the customer to decide what action to take.   Customers have to make decisions based upon the forecasts that are issued, and jumpy forecasts can cause sudden or frequent changes in customer actions - in some cases the precautions that they take are expensive or cannot be easily reversed.  So it is important to maintain the confidence of the customer and their belief that forecasters are contributing positively to a best estimate of future weather.  It is usually equally important for the customer to know about the uncertainty in a forecast as about the actual forecast value (e.g. what else could happen, or what is the worst possibility).  By making full use of the ensemble results the forecaster can give a more effective service.  Probability forecasts convey more information than simple deterministic statements.  However, weather forecasters may, paradoxically, sometimes aid their end-users more by not issuing a very uncertain forecast.

By not following all jumpiness the forecaster is involved in a trade-off whereby, over many cases and in net terms, accuracy will be reduced but customer perceptions and actions will be improved.  This is psychology, not statistics.  But, most importantly, the integration of psychology can help to reduce customer displeasure and mistrust of forecasters' output, and so increase the chance that the customer will in the end take the right action - this, after all, is the bottom line.  However, over an extended period, minimisation of jumpiness by forecasters may also improve forecaster output as verified against an NWP model.  Without full understanding of this trade-off, it could be detrimental to the perception of forecaster performance.

Additional Sources of Information

(Note: In older material there may be references to issues that have subsequently been addressed)

Nil currently.