Introduction
The simple checks below can be implemented to verify that file content is as expected. They assume that the set of fields in every expected file stays the same as long as there is no change or problem in data production (dispatching, etc.).
- the number of all fields must be as expected
- the actual full field list must be the same as expected
The example below shows how to create a reference field list from given files (GRIB files in this case) and compare it to an actual field list.
The same approach works for any file type, provided that a suitable tool for creating the field list exists or is written.
Workflow
- create a reference field list
  - get full sample data and check thoroughly that it contains all expected fields
  - if it does, the field list created as described below can be stored as the valid reference for future use
  - if the data changes (e.g. fields are added or removed after a model upgrade), a new valid reference must be created
    - get full sample data and check thoroughly that it contains all expected fields
- create an actual field list
  - as a first quick check, e.g. right after all data has been received, compare the number of fields with the number of reference fields
  - the full reference check then compares the complete field list with the reference one
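The two checks in the workflow above can be sketched in Python as follows. This is only an illustration: the function name `compare_field_lists` and the in-memory lists are assumptions, not part of the original tooling. The field names are taken from the reference list example in this page.

```python
def compare_field_lists(actual, reference):
    """Compare two field lists and return (count_ok, added, removed)."""
    # Quick check: the number of fields must match the reference.
    count_ok = len(actual) == len(reference)
    # Full check: fields that are present in only one of the two lists.
    added = sorted(set(actual) - set(reference))
    removed = sorted(set(reference) - set(actual))
    return count_ok, added, removed


prefix = "lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_"
reference = [prefix + "10u", prefix + "10v", prefix + "swh"]
actual = [prefix + "10u", prefix + "pp1d", prefix + "swh"]

count_ok, added, removed = compare_field_lists(actual, reference)
print("count matches:", count_ok)
print("added:", added)
print("removed:", removed)
```

In this sample the counts match although one field was replaced by another, which is why the quick count check alone is not sufficient and the full comparison is still needed.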
Examples
get_field_list.py usage
An example of creating the reference or actual field list with the Python script get_field_list.py (the ecCodes Python API is a prerequisite).
- this is a version of get_field_list.py modified for the needs of LC-WFV data sets
- each data set requires its own set of GRIB keys, which must unambiguously identify every expected field
- it is rather straightforward to modify the script for other data sets
- if the script is run without the -c option, the actual date of each field is parsed (not usable for the reference check, as the date is the only GRIB key expected to change)
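A minimal sketch of how such a script might build one identifier per GRIB message is given below. It is not the actual get_field_list.py: the key selection in `ID_KEYS`, the helper names, and the masking of the date with zeros when a constant date is requested are all assumptions made for illustration. The ecCodes calls used are `codes_grib_new_from_file`, `codes_get` and `codes_release`.

```python
# Hypothetical key selection; each data set needs keys that identify every field uniquely.
ID_KEYS = ["dataDate", "dataTime", "typeOfLevel", "level", "step", "shortName"]


def build_field_id(values, constant_date=False):
    """Join the key values into one identifier, optionally masking the date."""
    parts = []
    for key in ID_KEYS:
        value = str(values[key])
        if constant_date and key == "dataDate":
            # Assumption: mask the date so that identifiers are stable between runs.
            value = "0" * len(value)
        parts.append(value)
    return "_".join(parts)


def list_fields(path, constant_date=False):
    """Yield one identifier per GRIB message found in the file at path."""
    import eccodes  # imported here so build_field_id can be used without ecCodes

    with open(path, "rb") as grib_file:
        while True:
            gid = eccodes.codes_grib_new_from_file(grib_file)
            if gid is None:  # no more messages in the file
                break
            try:
                values = {key: eccodes.codes_get(gid, key) for key in ID_KEYS}
            finally:
                eccodes.codes_release(gid)
            yield build_field_id(values, constant_date)
```

Keeping the identifier construction in a separate function makes it easy to adapt the key list to another data set without touching the file-reading loop.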
    #!/bin/ksh
    set -ex

    # $reflist is a link to the reference field list
    # $DTS_ALLOW_NEW_REFERENCE is "true" if a new reference is required/expected

    # get actual field list for comparison to the reference
    python $DTS_BIN/get_field_list.py -c lw.grib2 > list.tmp
    awk '{print $1}' list.tmp | sort > list

    # check if anything changed: lines only in the actual list are "added",
    # lines only in the reference list are "removed"
    diff --changed-group-format='%%<' --unchanged-group-format='' list $reflist > diff.added.tmp || true
    diff --changed-group-format='%%>' --unchanged-group-format='' list $reflist > diff.removed.tmp || true
    sort diff.added.tmp > diff.added
    sort diff.removed.tmp > diff.removed

    if [[ -s diff.added || -s diff.removed ]] ; then
      # some differences found..
      if [[ "${DTS_ALLOW_NEW_REFERENCE}" = "true" ]] ; then
        cp -f list $reflist
        echo "A new partial reference field list created! ($reflist)"
      else
        echo "Differences comparing to the actual reference field list found!"
        # non-zero exit status marks the check as failed
        exit 1
      fi
    else
      smslabel info "The actual reference is valid ($reflist)"
    fi
Reference field list example
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_10u
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_10v
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_pp1d
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_swh
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_10u
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_10v
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_pp1d
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_swh
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step11_10u
    ...
    ...
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step96_pp1d
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step96_swh
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_10u
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_10v
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_pp1d
    lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_swh
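Because the list is sorted lexically, step10 comes before step9, which makes gaps hard to spot by eye when a new reference is checked. The sketch below groups entries by parameter and sorts the steps numerically. It assumes, based only on the entries shown above, that every identifier ends in `_step<N>_<parameter>`. The function name `steps_by_parameter` is hypothetical.

```python
from collections import defaultdict


def steps_by_parameter(field_ids):
    """Map each parameter name to the numerically sorted list of its steps."""
    steps = defaultdict(set)
    for field_id in field_ids:
        tokens = field_id.split("_")
        parameter, step_token = tokens[-1], tokens[-2]
        if not step_token.startswith("step"):
            raise ValueError(f"unexpected identifier layout: {field_id}")
        steps[parameter].add(int(step_token[len("step"):]))
    return {parameter: sorted(values) for parameter, values in steps.items()}


sample_ids = [
    "lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_10u",
    "lw_sabm_000000001800_xxxx_fc_sl_level0000_step10_10u",
    "lw_sabm_000000001800_xxxx_fc_sl_level0000_step9_10u",
    "lw_sabm_000000001800_xxxx_fc_sl_level0000_step0_swh",
]
for parameter, steps in steps_by_parameter(sample_ids).items():
    print(parameter, steps)
```

Comparing the resulting step lists across parameters is one way to support the thorough check of the sample data before a list is accepted as the valid reference.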