Selection of stations

GloFAS verification activities are based on GloFAS diagnostic points with additional quality criteria applied. For example, GloFAS v3.1 hydrological performance assessment used the following criteria to select the 1532 stations for the analysis:

  • At least 4 years of observations within the 1979-2019 reference period (regardless of gaps).
  • Limited impact of lakes, reservoirs and human influence on river discharge.
  • Exclusion of stations with poor observation quality (typically observations showing linear interpolation between values, outliers, truncated peaks, duplicated values, etc.).

Verification period

The verification focused either on the whole period with available river discharge observations, as in the GloFAS hydrological model performance layer in the map viewer (GloFAS hydrological model performance web product), or specifically on the flood season, to evaluate only the period of the year when floods are likely to happen. This shorter period was applied for the general GloFAS v3.1 hydrological model performance analysis (GloFAS v3.1 hydrological performance and GloFAS v3.1 hydrological performance comparison with GloFAS v2.1).

In the verification focused on the main flood season, the period was determined for each catchment separately. It was centred on the maximum of the observed daily mean climatology (daily climate mean observed discharge; a time series of 365 values from 1 Jan to 31 Dec; leap days were not distinguished). The daily climate mean values were computed by applying a ±10-day window around each calendar day. This way, each year contributed a maximum of 21 observed (OBS) discharge values to the climate sample of a given day (for the 1979-2019 period). Depending on the length of the observational record available, the sample size used to compute the climate mean therefore varied from at least 84 values (for the minimum 4 years) to over 400 if most of the 41 years in 1979-2019 had data.
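The windowed climatology described above can be sketched as follows. This is a minimal illustration, not the GloFAS implementation: the function name `daily_climate_mean`, the input layout (one 365-value list per year, with `None` for missing days) and the choice to wrap the window around within the same year at the start and end of the year are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

DAYS_IN_YEAR = 365  # leap days are not distinguished


def daily_climate_mean(
    years: Sequence[Sequence[Optional[float]]],
    half_window: int = 10,
) -> Tuple[List[Optional[float]], List[int]]:
    """Return (climate mean, sample size) for each of the 365 calendar days.

    `years` holds one 365-value series per year; missing days are None.
    """
    means: List[Optional[float]] = []
    counts: List[int] = []
    for day in range(DAYS_IN_YEAR):
        sample = []
        for series in years:
            for offset in range(-half_window, half_window + 1):
                # Assumption: near 1 Jan / 31 Dec the window wraps around
                # within the same year's series (a simplification).
                value = series[(day + offset) % DAYS_IN_YEAR]
                if value is not None:
                    sample.append(value)
        counts.append(len(sample))
        means.append(sum(sample) / len(sample) if sample else None)
    return means, counts


# Four complete years of synthetic data: 4 years x 21 values per day.
four_years = [[float(d % 30) for d in range(DAYS_IN_YEAR)] for _ in range(4)]
means, counts = daily_climate_mean(four_years)
print(counts[0])  # 84
```

With four complete years the sample size per calendar day is 4 x 21 = 84, which matches the minimum quoted above; gaps in the record simply reduce the count for the affected days.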

It is worth noting that this method only determines the verification period around the most prominent flood period. If there are different flood periods, or the flood season has multiple maxima, then it only focuses on the highest of them.

The verification period was then defined by the dates when the daily climate mean OBS discharge decreased to 70% of the maximum value on either side of the climate peak. This period was then extended by 21 days (3 weeks) at both ends to account for some of the variability across the years. The length of the period varied from about 60 days for catchments with a very short and well-defined flood season, to the whole year in special catchments, where floods are almost equally likely in any part of the year and the daily climate mean time series has no peak that stands out from the rest of the year by 30%.
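As a rough sketch of this rule, the snippet below walks outwards from the climate peak until the climatology first drops to 70% of the maximum, then adds 21 days on each side. The function names, the circular treatment of the year and the handling of the "whole year" case are assumptions for illustration, not the operational GloFAS code.

```python
from typing import Optional, Sequence, Tuple


def steps_to_threshold(
    clim: Sequence[float], peak: int, limit: float, direction: int
) -> Optional[int]:
    """Days from the peak to the first day at or below `limit` (None if never)."""
    n = len(clim)
    for step in range(1, n):
        if clim[(peak + direction * step) % n] <= limit:
            return step
    return None


def verification_period(
    clim: Sequence[float], fraction: float = 0.7, extension: int = 21
) -> Tuple[int, int, int]:
    """Return (start day, end day, length in days), with 0-based day indices."""
    n = len(clim)
    peak = max(range(n), key=lambda d: clim[d])
    limit = fraction * clim[peak]
    back = steps_to_threshold(clim, peak, limit, -1)
    forward = steps_to_threshold(clim, peak, limit, +1)
    # Assumption: if the climatology never drops to the threshold, or the
    # extended period would cover the year, use the whole year.
    if back is None or forward is None:
        return 0, n - 1, n
    length = back + forward + 1 + 2 * extension
    if length >= n:
        return 0, n - 1, n
    start = (peak - back - extension) % n
    end = (peak + forward + extension) % n
    return start, end, length


# Synthetic climatology: sharp peak on day index 150 over a low baseline.
peaked = [max(10.0, 100.0 - 4.0 * abs(d - 150)) for d in range(365)]
print(verification_period(peaked))  # (121, 179, 59)
```

In this synthetic case the climatology falls below 70% of the peak 8 days on either side, so the period is 8 + 8 + 1 + 2 x 21 = 59 days, close to the "about 60 days" quoted for well-defined flood seasons. A flat climatology returns the whole year.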

Figure 1 shows the month of the maximum observed discharge in the daily climatology. The tropical areas, together with southern Asia, tend to have peak discharge in the later part of the year, while southern Africa, northern Australia, western Europe and the southern side of Amazonia peak during the first months of the year. In the higher latitudes of the Northern Hemisphere (Russia, Scandinavia and Canada), on the other hand, the highest discharge usually occurs in May-June.

The related verification period length shows quite large geographical variability (Figure 2). The shortest, most sharply defined flood seasons often occur in the higher latitudes and in areas of higher orography, where the snowmelt season is confined to a shorter part of the year, usually around May-June. Similarly, a more concentrated tropical rainy season can also lead to a shorter verification period around the highest flood, as in India, south Africa and much of Australia.


Figure 1. Month of the maximum discharge value in the observed daily climate mean.


Figure 2. Verification period length defined for the highest maximum discharge value in the observed daily climate mean.

Performance scores

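GloFAS hydrological performance verification is done against river discharge observations available to the GloFAS team. The hydrological model performance analysis was based on the modified Kling–Gupta efficiency metric (KGE'; ideal value is 1):

$$\mathrm{KGE}' = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2}, \qquad \beta = \frac{\mu_s}{\mu_o}, \qquad \gamma = \frac{\sigma_s / \mu_s}{\sigma_o / \mu_o}$$

where r is the Pearson correlation between the simulated and observed time series, and μ and σ are the mean and standard deviation of the simulated (s) and observed (o) river discharge.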

The three component scores of the KGE' were also used:

  • Pearson correlation (r) in KGE' highlights temporal errors through the strength of the linear relationship between simulation and observation time series. It ranges from -1 to 1, with 1 as ideal value.
  • Bias ratio (β) represents the bias errors, ranging from -Inf to +Inf, with 1 as ideal value. The relative bias pbias (ideal value = 0) is defined as β-1 (its absolute value is abspbias).
  • Variability ratio (γ) shows the variability-related errors in the simulation. It ranges from -Inf to +Inf, with 1 as optimal value. The relative variability var (ideal value = 0) is defined as γ-1 (its absolute value is absvar).

In the GloFAS hydrological model performance web product, the KGE' and the three KGE' components were used.

However, in the general GloFAS v3.1 model evaluation (GloFAS v3.1 hydrological performance and GloFAS v3.1 hydrological performance comparison with GloFAS v2.1), besides the correlation (pcorr), β-1 (pbias) and γ-1 (var) were considered instead of the bias and variability ratios, so that the optimal value is 0 instead of 1. In addition, the absolute values of pbias (abspbias) and var (absvar) were also used.
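For reference, the scores named in this section can be computed along the following lines. This is a minimal sketch using the standard definition of the modified Kling–Gupta efficiency (β as the ratio of means, γ as the ratio of coefficients of variation); the function name `kge_prime` and the returned dictionary layout are assumptions, and the code expects paired, gap-free simulated and observed values.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, Sequence


def kge_prime(sim: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Modified KGE and its components for paired sim/obs discharge values."""
    mu_s, mu_o = fmean(sim), fmean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    # Pearson correlation from the population covariance.
    cov = fmean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    r = cov / (sd_s * sd_o)
    beta = mu_s / mu_o                     # bias ratio, ideal value 1
    gamma = (sd_s / mu_s) / (sd_o / mu_o)  # variability ratio, ideal value 1
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    pbias = beta - 1.0                     # relative bias, ideal value 0
    var = gamma - 1.0                      # relative variability, ideal value 0
    return {
        "kge": kge,
        "pcorr": r,
        "pbias": pbias,
        "var": var,
        "abspbias": abs(pbias),
        "absvar": abs(var),
    }


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2.0 * q for q in observed]  # perfect timing and shape, 100% too high
print(kge_prime(doubled, observed))
```

In the doubled example the correlation and the variability ratio are both ideal, so the whole penalty comes from the bias term: pbias is about 1 and KGE' is about 0.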

Finally, a specific index was also used for measuring timing errors (timing, in days; ideal value is 0), which shows the time delay between the simulated and observed river discharge time series (its absolute value, abstiming, was also used). Timing is the time lag (or shift) L that maximises the cross-correlation function Rxy(L) between the simulated (x) and observed (y) time series, i.e. the correlation obtained when the two series are shifted by L days relative to each other. A positive/negative timing error indicates delayed/advanced simulated river discharge. For example, a timing error of +5 means the simulation needs to be shifted 5 days backwards (brought earlier) to reach the highest correlation, i.e. the simulation is generally 5 days late in predicting the ups and downs in the flow time series. Although this is not directly equivalent to measuring the timing error of the highest flood peaks, it is closely related to it and can be used as a simple estimate.
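The lag search can be illustrated with a short sketch. The names `pearson` and `timing_error`, and in particular the search range `max_lag`, are assumptions for this example (the text does not state the range of lags examined); the code also assumes equal-length, gap-free series with non-zero variance.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def timing_error(sim: Sequence[float], obs: Sequence[float], max_lag: int = 30) -> int:
    """Lag in days that maximises the sim/obs correlation.

    Positive values mean the simulation is delayed relative to observations.
    `max_lag` is a hypothetical search range, not a documented GloFAS setting.
    """
    n = len(sim)
    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Pair sim[t + lag] with obs[t]: sim is brought `lag` days earlier.
            x, y = sim[lag:], obs[: n - lag]
        else:
            x, y = sim[: n + lag], obs[-lag:]
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Synthetic flood pulse; the simulation reproduces it 5 days late.
base = [math.exp(-(((t - 100) / 10.0) ** 2)) for t in range(205)]
observed = base[5:]
simulated = base[:200]
timing = timing_error(simulated, observed)
print(timing, abs(timing))  # 5 5
```

Here the simulated pulse peaks 5 days after the observed one, so the best match is found after bringing the simulation 5 days earlier, giving a timing of +5 and an abstiming of 5.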