Description
For the TIGGE, S2S and UERRA projects, data must be encoded in an exact, WMO-compliant GRIB2 format to allow easy processing and intercomparison. To check that the encoding is as requested, one can use the tigge_check tool, which is part of the ecCodes package.
tigge_check can also do some basic quality control by checking each parameter against its allowed value range, where one has been defined (-v option). A newer, more maintainable tool called grib_check.py performs a similar basic quality check. More information about both tools is in Data quality checking tools (the Python source code is available there).
A further tool, grib_enc_check.py, is used to check the data encoding for the more recent LC-WFV (Lead Centre for Wave Forecast Verification) project. Its encoding checks are not as comprehensive as those in tigge_check; for example, geometry checks are missing completely.
Examples of tigge_check usage
tigge_check options
tigge_check [options] grib_file grib_file ...
    -l: check local area model fields
    -v: check value ranges
    -w: warnings are treated as errors
    -g: write good gribs
    -b: write bad gribs
    -z: return 0 to calling shell
    -s: check s2s fields
    -r: check s2s reforecast fields
    -u: check uerra fields
Checking UERRA data
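A minimal sketch of such a check, based only on the option list above (-u: check uerra fields). The wrapper name check_uerra, the TIGGE_CHECK override variable and the file name uerra_sample.grib2 are hypothetical, and the sketch assumes that tigge_check returns a non-zero exit status when it finds errors, as the existence of the -z option suggests.

```shell
# Sketch: check one UERRA GRIB2 file and report the outcome.
# TIGGE_CHECK is a hypothetical override for the location of the tool.
check_uerra() {
    # -u: check uerra fields; -v or -w from the option list can be added
    # for value-range checks or to treat warnings as errors
    if "${TIGGE_CHECK:-tigge_check}" -u "$1"; then
        echo "OK: $1"
    else
        echo "FAILED: $1"
        return 1
    fi
}

# Usage (uerra_sample.grib2 is a placeholder file name):
#   check_uerra uerra_sample.grib2
```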
Checking S2S reforecast data
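A hedged sketch of a reforecast check. The option list describes -r as "check s2s reforecast fields" but does not say whether -s must be given as well, so this sketch passes both; that is an assumption, not documented behaviour. The wrapper name check_s2s_reforecast, the TIGGE_CHECK override variable and the file names are hypothetical.

```shell
# Sketch: check one or more S2S reforecast files in a single call.
# TIGGE_CHECK is a hypothetical override for the location of the tool.
check_s2s_reforecast() {
    # -s: check s2s fields, -r: check s2s reforecast fields
    # (passing both is an assumption; options come before the file names)
    "${TIGGE_CHECK:-tigge_check}" -s -r "$@"
}

# Usage with placeholder file names:
#   check_s2s_reforecast s2s_rf_1.grib2 s2s_rf_2.grib2 || echo "encoding problems found"
```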
Examples of grib_enc_check.py usage
grib_enc_check.py options
# BIN=/home/ma/emos/def/lcwfv/bin
python $BIN/grib_enc_check.py
usage: grib_enc_check.py [-h] [-v VERBOSITY] [-d DEFS] [inp_file [inp_file ...]]

positional arguments:
  inp_file              enter input file name(s)

optional arguments:
  -h, --help            show this help message and exit
  -v VERBOSITY, --verbosity VERBOSITY
                        increase output verbosity [0-2]
  -d DEFS, --defs DEFS  path to definition files
Checking LC-WFV data
$BIN/grib_enc_check.py lw.grib2
field 223(Mean wave direction) key: dataRepresentationTemplateNumber expected: <0..2> encoded: 40
field 224(10 metre U wind component) key: dataRepresentationTemplateNumber expected: <0..2> encoded: 40
Number of error(s) found: 2
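When the check runs inside a script, the error count can be taken from the summary line of the output above. The following is only a sketch: the helper name enc_error_count is hypothetical, and it assumes the summary always has the form "Number of error(s) found: N" on a line of its own. The source does not show whether that line is also printed for a file without errors, so an empty result should be read as "unknown".

```shell
# Sketch: extract the error count from grib_enc_check.py output read on stdin.
# Relies on the summary line "Number of error(s) found: N" shown above.
enc_error_count() {
    sed -n 's/^Number of error(s) found: *\([0-9][0-9]*\).*$/\1/p' | tail -n 1
}

# Usage, with BIN set as in the options section:
#   n=$(python $BIN/grib_enc_check.py lw.grib2 | enc_error_count)
#   echo "lw.grib2: errors reported: ${n:-unknown}"
```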
Performance tip to speed up checking big files
A newer tool, codes_split_file (available in ecCodes 2.6.0 and later), is useful for parallelising decoding or checking tasks such as tigge_check.
NAME    codes_split_file
        The output files are named input_1, input_2 etc. This is much faster than grib_copy/bufr_copy.
If one has a very large input file with thousands of messages, then instead of running one process that checks each message sequentially, one can split the file into 8 chunks and run the check in parallel on the 8 output files.
set -e
# Assume you have 8 cores
codes_split_file 8 my_big.grib
# Now you will have my_big.grib_01, my_big.grib_02, ... my_big.grib_08
for f in my_big.grib_*; do
    # Run check in the background. Now multiple processes are running in parallel
    tigge_check "$f" &
done
# With the 'wait' command you can force the execution of the script to pause until
# all background jobs have finished executing before continuing the execution
# of your script
wait
# Now clean up the split files
rm -f my_big.grib_*
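A plain wait without arguments does not report whether any of the background checks failed. If the result of each chunk matters, one possible variant waits for each process ID in turn and counts the failures. This is a sketch only: the function name check_chunks and the CHECK_CMD override variable are hypothetical, and it again assumes that tigge_check exits with a non-zero status when it finds errors.

```shell
# Sketch: check every chunk in parallel and report how many checks failed.
# CHECK_CMD is a hypothetical override; by default tigge_check is run.
check_chunks() {
    pids=""
    failures=0
    for f in "$@"; do
        # Start one background check per chunk and remember its process ID
        ${CHECK_CMD:-tigge_check} "$f" &
        pids="$pids $!"
    done
    # 'wait PID' returns the exit status of that particular background job
    for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
    done
    echo "chunks with errors: $failures" >&2
    [ "$failures" -eq 0 ]
}

# Usage (after: codes_split_file 8 my_big.grib):
#   check_chunks my_big.grib_* || echo "at least one chunk failed the check"
```

Keeping the split files until the failing chunks have been inspected makes it easier to locate the offending messages.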