This page documents the performance of the ERIC forecast methodology obtained with EFAS v4.0.

Evaluation Time Period: 1st January 2019 - 31st December 2019

Observation Datasets:

FloodList.com

  • A total of 176 observations where the flood type was recorded as 'flash flood' were extracted from the flood event database which is populated with information from FloodList.com (see figure below)
    • This flood type was chosen in order to exclude riverine floods
  • /home/mo/mocb/Flood_Events/Flash_Floods_2019/floodlist_events_2019-01-01_to_2019-12-31_FlashFloods_Europe.csv

European Severe Weather Database (www.eswd.eu)

  • Reports of 'heavy rain' were extracted from the online database
  • They were filtered to only include reports with quality level flags of QC1 or QC2
  • Further filtering was applied by only retaining observations which mentioned flooding in the EVENT_DESC column (i.e. event description). Where needed, media links were followed to view the original reports.
    • It should be noted that riverine floods are not recorded in this dataset, only flash flooding
  • A further filter was applied to retain only reports where at least 4 other reports were recorded in the same EFAS administration region on the same day
  • After this a total of 1044 observations were retained (see figure below)
  • /home/mo/mocb/Flood_Events/Flash_Floods_2019/eswd_heavyRain_floods3.csv
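The three ESWD filters above can be sketched as follows. This is a minimal illustration, not the script actually used: each report is assumed to be a dict with the hypothetical keys "qc_level", "date" and "region_id" (only EVENT_DESC is a column named in the text), and a simple keyword match stands in for the manual checking of media links. "At least 4 other reports" is read as a group of at least 5 reports per region and day.

```python
from collections import Counter

def filter_eswd(reports, min_others=4):
    """Sketch of the ESWD filtering steps (hypothetical record layout)."""
    # 1. Keep only quality-controlled reports (QC1 or QC2).
    qc_ok = [r for r in reports if r["qc_level"] in {"QC1", "QC2"}]
    # 2. Keep only reports whose event description mentions flooding.
    flooded = [r for r in qc_ok if "flood" in r["EVENT_DESC"].lower()]
    # 3. Count reports per (day, region); a report survives if at least
    #    `min_others` other reports share its day and region, i.e. the
    #    group size is >= min_others + 1.
    counts = Counter((r["date"], r["region_id"]) for r in flooded)
    return [r for r in flooded
            if counts[(r["date"], r["region_id"])] >= min_others + 1]

reports = [{"qc_level": "QC1", "EVENT_DESC": "Street flooding",
            "date": "2019-06-10", "region_id": 7} for _ in range(5)]
reports.append({"qc_level": "QC2", "EVENT_DESC": "Flooded cellar",
                "date": "2019-06-10", "region_id": 8})
print(len(filter_eswd(reports)))  # the lone report in region 8 is dropped
```

The QC and keyword filters are applied before counting, so reports that fail them do not help another report reach the 5-per-region-day threshold.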

UK Flood Forecast Centre:

  • 500 reports of flooding in England and Wales for 2019 were provided by the Flood Forecast Centre (FFC)
  • This includes surface water and riverine flooding. Attempts were made to remove large-scale riverine events from the database, but this was not possible
    • Many UK rivers are small, with catchments under 2000 km², i.e. at the scale of flash floods
    • However the evaluation could be re-run without this dataset to determine the effect
    • The FloodList.com data already has observations in the UK so perhaps removing this data won't be too detrimental
  • Reports were only given at the county level, therefore a lookup table (/home/mo/mocb/Flood_Events/FFC_Observations/FFC_countyName_regionName_lookup.csv) was created to translate these onto the EFAS Administrative Regions layer (/perm/mo/mocb/efas_shapefiles/RegionBorders.shp)
  • /home/mo/mocb/Flood_Events/FFC_Observations/FGSVExportObservations_2019_flashFloods.csv
  • Initial investigations found that the evaluation results improved when this dataset was omitted. This could be because the dataset included many riverine floods, which were not covered by the flash flood forecasts. The dataset has therefore been omitted from the analysis. Future work should aim to separate the flash flood events from the riverine events.

Austrian BNMT:

  • These data were originally provided to the JRC
  • Flash flood reports were extracted from the reports which mentioned the following:
    • Überschwemmung+Hagel = Flood and Hail
    • Überschwemmung+Hagel+Mure = Flood, Hail and Mudslide
    • Niederschlag-Konvektiv = Convective precipitation
    • Niederschlag-Konvektiv-Mure = Convective precipitation and Mudslide
    • Muren = Mudslide
  • These events were reported at the county level, therefore a lookup table was created to translate them onto the EFAS Administration Regions shapefile (see above) /home/mo/mocb/Flood_Events/Austria_Database/Austria_regions_lookup.csv
  • The data were processed using this script: /home/mo/mocb/Flood_Events/Austria_Database/process_austria_data.py
  • The final output file gave 56 reports of flash flooding: /home/mo/mocb/Flood_Events/Austria_Database/austria_flashFloods_processed_2019.csv
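The Austrian processing amounts to a keyword filter followed by a county-to-region lookup. The sketch below is an assumption about how such a step could look, not a reproduction of process_austria_data.py: the record keys "event_type", "county" and "date" are hypothetical, the event-type field is assumed to hold exactly one of the labels listed above, and counties missing from the lookup are simply skipped.

```python
# Event-type labels treated as flash flooding (German originals from the list above).
FLASH_FLOOD_TYPES = {
    "Überschwemmung+Hagel",
    "Überschwemmung+Hagel+Mure",
    "Niederschlag-Konvektiv",
    "Niederschlag-Konvektiv-Mure",
    "Muren",
}

def austria_to_regions(reports, county_to_region):
    """Return (date, region OBJECTID) pairs for flash flood reports (sketch)."""
    out = []
    for r in reports:
        if r["event_type"] not in FLASH_FLOOD_TYPES:
            continue  # not a flash flood type
        region = county_to_region.get(r["county"])
        if region is None:
            continue  # county not in the lookup table (assumed: skip it)
        out.append((r["date"], region))
    return out

lookup = {"county_A": 101, "county_B": 102}  # hypothetical lookup rows
reports = [{"event_type": "Muren", "county": "county_A", "date": "2019-07-01"}]
print(austria_to_regions(reports, lookup))
```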

Combining the Observations onto EFAS Administrative Regions

  • The above datasets were combined into one spreadsheet, which records, for each day of the evaluation period, the OBJECTID attributes of the EFAS Administrative Regions layer (/perm/mo/mocb/efas_shapefiles/RegionBorders.shp) where flash flooding was observed.
  • The 2655 original reports were summarised into 545 instances of flash flooding across 268 administration regions. On some days there were multiple point observations of flash flooding in the same administration region; these are treated as a single report in this evaluation.
  • The processing was done in /home/mo/mocb/Flood_Events/Flash_Floods_2019/combine_floodData.py
  • The output file is: /home/mo/mocb/Flood_Events/Flash_Floods_2019/all_flashFloods_processed_2019.csv

Annual Number of Flash Flood Reports per Administration Region

The greatest numbers of observations are in Poland, Austria, northern Italy and central England, with the highest counts in western Austria and southern Poland. Reports have also been received from eastern Spain, southern France, central Germany, southern Italy, parts of Greece and the Balkans. There are significant gaps in France, Spain, Scandinavia, Romania and Bulgaria; these are likely due to a lack of reporters contributing to the datasets used in this evaluation.


Seasonal Number of Flash Flood Reports per Administration Region

Winter shows very few reports of flash flooding, limited to a few areas in France, Spain, the UK and Crete. Spring sees flash flood observations in Poland, Austria and central Germany. Summer shows a large number of observations in Austria, as well as in Poland, northern Italy, Spain, the Balkans and parts of northern England and Scotland. In autumn the UK, Italy and Spain show the strongest signal; these areas all experienced anomalously high precipitation during this season (https://climate.copernicus.eu/index.php/ESOTC/2019/european-wet-and-dry-conditions).

Figure panels: Annual, Winter, Spring, Summer, Autumn

The above graph shows that the majority of flash floods occurred between June and November.

Evaluation Results

ERIC-EFAS v4.0 forecast skill was evaluated using reforecasts forced with 6-hourly precipitation forecasts from COSMO-LEPS and 6-hourly temperature and soil moisture data from the EFAS long-term run, following the methodology described in ERIC Flash Flood Forecast Skill.

The results show that the Hanssen-Kuipers score is very low for all return periods, exceedance probabilities and lead times. The highest scores occur at low exceedance probabilities and short lead times; the best score (0.09) is for the 10% exceedance probability of the 2 year return period at 0-24 h lead time. For this return period and lead time the Hanssen-Kuipers score declines gradually with increasing exceedance probability, whereas the other lead times show a steeper decline. The same pattern is also seen in the results for the 5 and 20 year return periods.
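For reference, the Hanssen-Kuipers score (also known as the Peirce skill score or true skill statistic) is the hit rate minus the false alarm rate (probability of false detection), computed from the 2×2 contingency table. The sketch below uses illustrative counts, not counts from this evaluation. With rare events the correct negatives dominate, so the false alarm rate is small and the score is close to the hit rate; a low score therefore mainly reflects a low hit rate.

```python
def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    """Hanssen-Kuipers score = POD - POFD, ranging from -1 to 1 (0 = no skill)."""
    pod = hits / (hits + misses)                                 # hit rate
    pofd = false_alarms / (false_alarms + correct_negatives)     # false alarm rate
    return pod - pofd

# Illustrative counts only: 10 hits, 90 misses, 50 false alarms, 9850 correct negatives.
print(round(hanssen_kuipers(10, 90, 50, 9850), 3))  # 0.095
```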

Roebber plots were created for all lead times for a 20% exceedance probability of the 2 year return period and a 10% exceedance probability of the 5 year return period. The 10% exceedance probability of the 2 year return period produced a higher score, but this threshold was judged unrealistic for operational purposes and was therefore not analysed here.

In both cases, results are clustered in the bottom left of the diagram for all lead times, most likely because the hits are overwhelmed by the numbers of false alarms and misses. The frequency bias for the first 3 lead times is just below 1, which means that the numbers of false alarms and misses are almost equal, with misses slightly outnumbering false alarms.
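Each point on a Roebber performance diagram is derived from hits, misses and false alarms only. The sketch below computes the plotted quantities; the counts are illustrative. Because frequency bias = (hits + false alarms) / (hits + misses), a bias just below 1 means false alarms are slightly fewer than misses, and when hits are small relative to both, the probability of detection and the success ratio are both small, placing the point in the bottom left.

```python
def performance_diagram_point(hits, misses, false_alarms):
    """Quantities plotted on a Roebber performance diagram (sketch)."""
    pod = hits / (hits + misses)                      # y-axis: probability of detection
    sr = hits / (hits + false_alarms)                 # x-axis: success ratio (1 - false alarm ratio)
    csi = hits / (hits + misses + false_alarms)       # critical success index (curved contours)
    bias = (hits + false_alarms) / (hits + misses)    # frequency bias (straight dashed lines)
    return {"pod": pod, "sr": sr, "csi": csi, "bias": bias}

# Illustrative counts: few hits, misses slightly exceeding false alarms.
print(performance_diagram_point(10, 100, 95))
```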

To understand these results, the total numbers of hits, misses and false alarms were investigated for the same return periods and exceedance probabilities used in the Roebber plots. In both sets of results the number of hits is much smaller than the numbers of false alarms and misses. For the 2 year return period the numbers of misses and false alarms are approximately equal for the first two lead times. For the 5 year return period the number of false alarms is lower than the number of misses; relative to the 2 year return period, the decrease in false alarms is much greater than the increase in misses. This suggests that the 5 year return period may be the optimal threshold for issuing flash flood notifications, as it reduces the number of false alarms.
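Since observations and forecasts are both aggregated onto administration regions, the contingency counts for one lead time can be tallied day by day from sets of region IDs. This is a sketch under that assumption; the dictionaries and the function name are hypothetical. All evaluation days must be passed explicitly so that days with neither a forecast nor an observation still contribute correct negatives.

```python
def contingency_counts(days, forecast, observed, all_regions):
    """Tally hits, misses, false alarms and correct negatives (sketch).

    days: every day of the evaluation period
    forecast, observed: {day: set of region IDs} for one lead time
    all_regions: set of all administration region IDs
    """
    hits = misses = false_alarms = correct_negatives = 0
    for day in days:
        f = forecast.get(day, set())
        o = observed.get(day, set())
        hits += len(f & o)                              # forecast and observed
        misses += len(o - f)                            # observed but not forecast
        false_alarms += len(f - o)                      # forecast but not observed
        correct_negatives += len(all_regions - f - o)   # neither
    return hits, misses, false_alarms, correct_negatives

print(contingency_counts(["d1", "d2"], {"d1": {1, 2}}, {"d1": {2, 3}}, {1, 2, 3}))
```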

It should be noted that the number of false alarms here is not equivalent to the number of EFAS flash flood notifications that would be issued in vain. In this evaluation, if a flash flood forecast 48 hours ahead of a given day was a false alarm and the same event persisted in the forecasts issued 36, 24 and 12 hours ahead, it would be counted as a false alarm in each of those forecasts. In reality only one flash flood notification would have been issued to EFAS partners, so they would perceive only one false alarm.
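The difference between false alarms as scored and false alarms as perceived by partners can be sketched by counting forecast runs versus distinct events. This assumes, for illustration only, that a partner receives one notification per (valid day, region) pair; the record layout is hypothetical.

```python
def false_alarm_counts(false_alarms):
    """Compare scored and perceived false alarms (sketch).

    false_alarms: iterable of (lead_time_h, valid_day, region) tuples, one per
    forecast run that produced a false alarm.
    Returns (count as scored in the evaluation, count of distinct events).
    """
    fa = list(false_alarms)
    scored = len(fa)
    perceived = len({(day, region) for _, day, region in fa})
    return scored, perceived

# One persistent false alarm seen at 48, 36, 24 and 12 hours lead time.
print(false_alarm_counts([(48, "2019-07-01", 5), (36, "2019-07-01", 5),
                          (24, "2019-07-01", 5), (12, "2019-07-01", 5)]))
```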

Below are plots of the hit rate and false alarm rate for the 10% exceedance probability threshold of the 5 year return period, evaluated for a lead time of 0-24 hours in each administration region, for the whole evaluation year and for each season. Some administration regions had no observed events in some seasons; in these cases the hit rate is undefined and the region is shown with no shading. Darker blue shades of the hit rate indicate higher skill (i.e. more hits as a proportion of hits plus misses), whereas darker shades of the false alarm rate indicate lower skill (i.e. false alarms make up a greater proportion of false alarms plus correct negatives).
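The per-region rates described above can be sketched as follows. The false alarm rate here is false alarms / (false alarms + correct negatives), also called the probability of false detection, which differs from the false alarm ratio. Returning NaN when a region has no observed events is an assumed convention that corresponds to the unshaded regions in the maps.

```python
import math

def region_rates(hits, misses, false_alarms, correct_negatives):
    """Hit rate and false alarm rate for one region (sketch)."""
    observed_events = hits + misses
    # Undefined (NaN) when nothing was observed: such regions are left unshaded.
    hit_rate = hits / observed_events if observed_events else math.nan
    non_events = false_alarms + correct_negatives
    false_alarm_rate = false_alarms / non_events if non_events else math.nan
    return hit_rate, false_alarm_rate

print(region_rates(3, 1, 9, 81))
print(region_rates(0, 0, 2, 88))  # no observed events: hit rate is NaN
```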

Annually the highest hit rates are located along the Mediterranean coasts of France and Spain, with some higher values also seen in the Balkans and parts of Greece. The annual false alarm rates are greatest in northern Italy, Sardinia, the Balkans and Crete. During winter there are few observed flash floods, so the hit rate is not calculated for most regions and they are left blank, although darker shades are visible for four regions in southern France, Spain and Crete. The winter false alarm rate was high throughout regions of Italy, southern France, Spain, Austria and the Balkans. In spring hit rates are low in Austria, Germany and Poland, but false alarm rates are often also low in these regions. Spring false alarm rates are highest in the Balkans and south-east Spain. In summer there are low hit rates and low false alarm rates throughout Europe; this could be due to the COSMO-LEPS forecasts underestimating the extreme precipitation that drives flash floods during this season. In autumn the highest hit rates occur in south-east Spain, southern France and eastern England, with lower values in northern and southern Italy and central England. Autumn false alarm rates are high in northern Italy, southern Austria and Montenegro, but lower throughout the rest of Europe.

Figure: Hit Rate and False Alarm Rate for the 10% exceedance probability of the 5 year return period, lead time 0-24 h (panels: Annual, Winter, Spring, Summer, Autumn)

Comparison against ERIC EFAS v3.4

The results from this evaluation were compared against those from the evaluation of ERIC EFAS v3.4, which used the previous LISFLOOD calibration at 24 hour timesteps. A fully like-for-like comparison between these versions is not possible because the evaluation methodology has changed: this evaluation was aggregated onto administration regions, whereas the previous evaluation was performed at the locations of clustered point observations. The EFAS v3.4 evaluation applied a 20 km spatial buffer to each flash flood forecast location, and any observation lying within this buffer meant the forecast was classed as a hit. However, this 20 km buffer is still smaller than many administration regions, so the larger search area in the EFAS v4.0 evaluation could produce more hits. Furthermore, this study used a 1 year evaluation period, whereas the ERIC EFAS v3.4 evaluation covered only a 3 month period.
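The EFAS v3.4 hit criterion (an observation within 20 km of a forecast location) can be sketched with a great-circle distance. Using the haversine formula on a spherical Earth is an assumption for illustration; the v3.4 evaluation may have used projected buffers instead. In the v4.0 evaluation the equivalent test is simply whether the observation falls in the same administration region as the forecast.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_hit_buffer(forecast_point, observations, buffer_km=20.0):
    """True if any observation (lat, lon) lies within buffer_km of the forecast point."""
    lat, lon = forecast_point
    return any(haversine_km(lat, lon, olat, olon) <= buffer_km
               for olat, olon in observations)

print(is_hit_buffer((45.0, 10.0), [(45.1, 10.0)]))  # about 11 km away
```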

In both evaluations the optimum skill score was achieved using a 10% exceedance probability of the 5 year return period. Using this threshold, the hit rate was computed at every lead time for both evaluations over their respective evaluation periods. The results, shown below, indicate that ERIC EFAS v4.0 gives an improvement in hit rate over ERIC EFAS v3.4, with the largest difference at shorter lead times (0-72 hours). It should be noted that in both cases the hit rate values are low.

Conclusions

The findings from this evaluation of the ERIC-EFAS v4.0 reforecast from 1st January to 31st December 2019 suggest that it struggles to capture the majority of the observed flash flood events. The system misses most of the observations and simultaneously produces a large number of false alarms. This means that the solution is not simply to reduce the number of forecasts issued by ERIC. For example, it has been proposed to do this by applying a persistence criterion, such as requiring a reporting point to be present in 2 or more forecasts before issuing a flash flood notification. This was investigated during the evaluation of EFAS v3.4, and the results showed that it did not improve forecast skill, because the numbers of hits and false alarms were reduced at the same rate. Applying a persistence criterion to the evaluation of EFAS v4.0 would be expected to give similar results.
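A persistence criterion of the kind mentioned above can be sketched as a filter over successive forecast runs. Here it is assumed, for illustration, that "present in 2 or more forecasts" means present in the current run and the immediately preceding run(s); the data layout and function name are hypothetical.

```python
def apply_persistence(runs, min_runs=2):
    """Persistence filter over consecutive forecast runs (sketch).

    runs: list of sets of reporting-point IDs flagged by each forecast run,
          oldest first.
    Returns, per run, the points that would trigger a notification because
    they were flagged in this run and the previous (min_runs - 1) runs.
    """
    notified = []
    for i in range(len(runs)):
        if i + 1 < min_runs:
            notified.append(set())  # not enough earlier runs to check persistence
            continue
        window = runs[i - min_runs + 1: i + 1]
        notified.append(set.intersection(*window))
    return notified

print(apply_persistence([{1, 2}, {2, 3}, {2, 3, 4}]))
```

Such a filter removes hits and false alarms alike whenever a point appears in only one run, which is consistent with the v3.4 finding that both were reduced at the same rate.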

The issue may be the nature of what is being observed compared with what is being predicted. The observations contain point-scale reports of localised flash flooding, which mostly occurred between June and November 2019. Flash flood events during this period may be triggered by extreme rainfall linked to localised meteorological conditions that are not well captured by the relatively coarse (~7 km spatial resolution) COSMO-LEPS model. For example, during the summer months many events in Poland and the UK could be triggered by localised convection that is not resolved in COSMO-LEPS. Meanwhile, the summer flash flood events in Austria and northern Italy could be related to orographic enhancement of rainfall, which is not captured by the coarse orography of COSMO-LEPS. The autumn events in the UK, Italy and Spain could be related to meso- or large-scale synoptic conditions, which should be better captured by COSMO-LEPS. Further work could break down the skill scores by season to assess this variation. Overall, this means that using a coarse-scale numerical weather prediction model to capture localised flash flood events will always result in low skill scores. Instead, the observations should represent events at a scale similar to that of the forecast model.

An alternative could be to create proxy observations from an analysis product. With the release of EFAS v4.0, 6-hourly rainfall observations are available; these could be used to create an ERIC analysis product that identifies 1 km model cells where the 2, 5 or 20 year return period is exceeded. These would serve as the proxy observations against which the ERIC reforecast using COSMO-LEPS could be compared. The problem with this approach is that it could abstract the ERIC product away from the hazard it is trying to forecast. For example, evaluating the ERIC reforecast against an ERIC analysis could lead to the selection of a high exceedance probability threshold, which would greatly reduce the number of flash flood notifications issued. However, EFAS partners will still want ERIC flash flood notifications to capture localised flash flood events, so they would still experience a large number of missed events. The long-term solution could therefore be to focus effort on improving the ERIC forecasts rather than the evaluation procedure.
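The proxy-observation idea could be sketched as a threshold comparison on the 1 km grid followed by aggregation to administration regions, so that the proxies sit on the same units as the current evaluation. This is a hypothetical illustration: the analysed variable, the threshold grids and the region-ID grid are placeholders given as nested lists, and a region is assumed to count as an observed event if any of its cells exceeds the threshold.

```python
def proxy_observations(analysis, thresholds, region_ids):
    """Regions with at least one grid cell exceeding each return-period threshold (sketch).

    analysis, region_ids: 2-D lists on the same (hypothetical) 1 km grid
    thresholds: {return period in years: 2-D list of threshold values}
    """
    result = {}
    for rp, thr in thresholds.items():
        flagged = set()
        for a_row, t_row, r_row in zip(analysis, thr, region_ids):
            for value, threshold, region in zip(a_row, t_row, r_row):
                if value > threshold:  # return-period threshold exceeded in this cell
                    flagged.add(region)
        result[rp] = flagged
    return result

analysis = [[5, 1], [9, 2]]
thresholds = {2: [[4, 4], [4, 4]], 5: [[8, 8], [8, 8]]}
regions = [[1, 1], [2, 3]]
print(proxy_observations(analysis, thresholds, regions))
```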

For the purposes of EFAS v4.0 it is recommended to keep the flash flood notification criterion unchanged, i.e. a 10% exceedance probability of the 5 year return period up to 60 hours lead time. This criterion was shown to still produce a reasonable Hanssen-Kuipers score, and it helps to reduce the number of false alarms.
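The recommended criterion can be expressed as a small check on ensemble output. In this sketch the exceedance probability is taken as the fraction of ensemble members exceeding the 5 year return period threshold at a reporting point; the per-member values, the threshold and the dictionary keyed by lead time in hours are hypothetical inputs, not the operational ERIC data structures.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose value exceeds the threshold."""
    return sum(m > threshold for m in members) / len(members)

def triggers_notification(members_by_lead, threshold_5yr,
                          min_prob=0.10, max_lead_h=60):
    """True if, at any lead time up to max_lead_h hours, at least min_prob
    of the members exceed the 5 year return period threshold (sketch)."""
    for lead_h, members in members_by_lead.items():
        if lead_h <= max_lead_h and exceedance_probability(members, threshold_5yr) >= min_prob:
            return True
    return False

members = [0.0] * 18 + [10.0, 12.0]  # 2 of 20 members exceed the threshold
print(triggers_notification({48: members}, threshold_5yr=5.0))
print(triggers_notification({72: members}, threshold_5yr=5.0))  # beyond 60 h
```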