How do I create BUFR from a CSV - ecCodes BUFR FAQ

Here we show an example of encoding meteorological observations from an ASCII file. We will encode SYNOP data provided in CSV format using BUFR template 307092, which is for sub-hourly observations.

1. Choosing the template

To encode meteorological observations in BUFR format, the first thing to do is decide which BUFR template to use. This can be done by consulting WMO BUFR Table D, which lists the common sequences. For example, let’s assume we want to BUFR-encode sub-hourly data from an automatic weather station. As the screenshot of Table D below shows, the suitable template is 3 07 092, the BUFR template for surface observations from an n-minute period.
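Table D writes the descriptor as 3 07 092, while the scripts on this page pass the same descriptor as the six-digit integer 307092. The sketch below splits such an integer into its F, X and Y parts and names the kind of descriptor it is. The helper name split_descriptor is made up for this illustration and is not part of ecCodes.

```python
# Hypothetical helper (not part of ecCodes): split a six-digit BUFR
# descriptor FXXYYY into its F, X and Y components.
KINDS = {
    0: "element descriptor (Table B)",
    1: "replication descriptor",
    2: "operator descriptor (Table C)",
    3: "sequence descriptor (Table D)",
}

def split_descriptor(descriptor):
    f, rest = divmod(int(descriptor), 100000)
    x, y = divmod(rest, 1000)
    if f not in KINDS:
        raise ValueError("F must be 0..3, got {}".format(f))
    return f, x, y, KINDS[f]

f, x, y, kind = split_descriptor(307092)
print("{} {:02d} {:03d} -> {}".format(f, x, y, kind))
```

Leading zeros are lost when a descriptor is written as an integer, so 0 07 004 becomes 7004 and still splits into F=0, X=7, Y=4.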

2. Investigating the template

The next thing to do is to investigate the template to better understand which parameters it can encode. For this we need to refer to the WMO tables (which can be accessed here), namely:

Table B Classification of elements
Table C Data description operators
Table D List of common sequences

Table D defines sequence descriptors, which are aliases for a sequence of other descriptors. For example, one of the elements within the sequence 3 07 092 is Pressure, with Table B reference 0 07 004. As the Table B screenshot below shows, Table B provides detailed information on this element, such as unit, scale, reference value and data width.
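To see how unit, scale, reference value and data width work together, the sketch below applies the usual BUFR packing rule (stored integer = value * 10^scale - reference value) to a pressure in Pa. The Table B figures used for 0 07 004 (scale -1, reference value 0, data width 14 bits) are an assumption quoted from memory and should be checked against the table version you use. pack_value is an illustrative helper, not an ecCodes function; ecCodes performs this packing for you.

```python
def pack_value(value, scale, reference, width):
    """Return the integer that BUFR would store for a physical value."""
    raw = round(value * 10 ** scale) - reference
    # The all-ones bit pattern is reserved for "missing", so the largest
    # storable integer is 2**width - 2.
    if not 0 <= raw <= 2 ** width - 2:
        raise ValueError("value {} does not fit in {} bits".format(value, width))
    return raw

# Assumed Table B entry for 0 07 004 (Pressure, unit Pa).
SCALE, REFERENCE, WIDTH = -1, 0, 14

print(pack_value(100860, SCALE, REFERENCE, WIDTH))  # pressure given in Pa
```

With these assumed figures the pressure is stored to the nearest 10 Pa, and a value given in the wrong unit can either fall outside the allowed range or be silently encoded as a wrong but valid number.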

This investigation is important for working out what value to encode for a given key. The ecCodes Python interface lets us encode values in the following form:

codes_set(ibufr, 'key', value) (or codes_set_array(ibufr, 'key', values) for array of values)

where:

ibufr: id of the message loaded in memory
key: name of the variable key or shortName from Table B
value: value to be encoded

Now let us use the ecCodes Python interface and the command-line BUFR tools to investigate the chosen template in more detail. First, we can create a BUFR file with the template of our choice without encoding any data.

Content of create_bufr.py

#!/usr/bin/env python3

# (C) Copyright 1996- ECMWF.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.

from eccodes import *

ibufr = codes_bufr_new_from_samples('BUFR4')       # Creates a new valid message id from a BUFR sample
codes_set(ibufr, 'edition', 4)                     # BUFR edition number
codes_set(ibufr, 'masterTableNumber', 0)           # BUFR master table. Zero: standard WMO FM 94 BUFR tables
codes_set(ibufr, 'masterTablesVersionNumber', 31)  # Version number of master table used

ivalues = 307092                                   # Template to be used (a single sequence descriptor)
codes_set(ibufr, 'unexpandedDescriptors', ivalues) # Key name to encode the sequence number is unexpandedDescriptors

fout = open('TM307092.bufr', 'wb')                 # Open output file
codes_write(ibufr, fout)                           # Write the message to output file
codes_release(ibufr)                               # Release the BUFR message from memory
fout.close()                                       # Close the file

When the above code is run (i.e., ./create_bufr.py), a BUFR file named TM307092.bufr should be created.

If the sample BUFR file cannot be found, check the path used for samples by running the following command (example output shown):

user@host:~> codes_info

ecCodes Version 2.17.1

Default definition files path is used: /usr/local/apps/eccodes/2.17.1/GNU/7.3.0/share/eccodes/definitions
Definition files path can be changed by setting ECCODES_DEFINITION_PATH environment variable

Default SAMPLES path is used: /usr/local/apps/eccodes/2.17.1/GNU/7.3.0/share/eccodes/samples
SAMPLES path can be changed by setting ECCODES_SAMPLES_PATH environment variable

Now we can use command line BUFR tools to see the content of TM307092.bufr:

user@host:~> bufr_dump -p TM307092.bufr > TM307092.plain

The above command dumps the content of the BUFR file in plain (key=value) format. Please refer to the bufr_dump documentation for more information and for the other options of the tool. The output will be:

delayedDescriptorReplicationFactor= {
      1, 1, 1, 1}
shortDelayedDescriptorReplicationFactor= {
      1, 1, 1, 1, 1, 1, 1, 1, 1}
edition=4
masterTableNumber=0
bufrHeaderCentre=98
bufrHeaderSubCentre=0
updateSequenceNumber=0
dataCategory=1
internationalDataSubCategory=255
dataSubCategory=110
masterTablesVersionNumber=31
localTablesVersionNumber=0
typicalYear=2012
typicalMonth=10
typicalDay=31
typicalHour=0
typicalMinute=2
typicalSecond=0
numberOfSubsets=1
observedData=1
compressedData=0
unexpandedDescriptors=307092
wigosIdentifierSeries=MISSING
wigosIssuerOfIdentifier=MISSING
wigosIssueNumber=MISSING
wigosLocalIdentifierCharacter=MISSING
blockNumber=MISSING
stationNumber=MISSING
longStationName=MISSING
year=MISSING
month=MISSING
day=MISSING
hour=MISSING
minute=MISSING
latitude=MISSING
longitude=MISSING
heightOfStationGroundAboveMeanSeaLevel=MISSING
observationSequenceNumber=MISSING
heightOfBarometerAboveMeanSeaLevel=MISSING
nonCoordinatePressure=MISSING
nonCoordinatePressure->associatedField = 262143
nonCoordinatePressure->associatedField->associatedFieldSignificance = MISSING
pressureReducedToMeanSeaLevel=MISSING
pressureReducedToMeanSeaLevel->associatedField = 262143
pressureReducedToMeanSeaLevel->associatedField->associatedFieldSignificance = MISSING
pressure=MISSING
pressure->associatedField = 262143
pressure->associatedField->associatedFieldSignificance = MISSING
nonCoordinateGeopotentialHeight=MISSING
nonCoordinateGeopotentialHeight->associatedField = 262143
nonCoordinateGeopotentialHeight->associatedField->associatedFieldSignificance = MISSING
#1#heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform=MISSING
#1#surfaceQualifierForTemperatureData=MISSING
airTemperature=MISSING
airTemperature->associatedField = 262143
airTemperature->associatedField->associatedFieldSignificance = MISSING
dewpointTemperature=MISSING
dewpointTemperature->associatedField = 262143
dewpointTemperature->associatedField->associatedFieldSignificance = MISSING
#1#relativeHumidity=MISSING
#1#relativeHumidity->associatedField = 262143
#1#relativeHumidity->associatedField->associatedFieldSignificance = MISSING
#2#relativeHumidity=MISSING
#2#relativeHumidity->associatedField = 262143
#2#relativeHumidity->associatedField->associatedFieldSignificance = MISSING
#2#heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform=MISSING
#2#surfaceQualifierForTemperatureData=MISSING
#1#depthBelowLandSurface=MISSING
soilTemperature=MISSING
soilTemperature->associatedField = 262143
soilTemperature->associatedField->associatedFieldSignificance = MISSING
soilMoisture=MISSING
soilMoisture->associatedField = 262143
soilMoisture->associatedField->associatedFieldSignificance = MISSING
#2#depthBelowLandSurface=MISSING
attributeOfFollowingValue=MISSING
horizontalVisibility=MISSING
horizontalVisibility->associatedField = 262143
horizontalVisibility->associatedField->associatedFieldSignificance = MISSING
cloudCoverTotal=MISSING
cloudCoverTotal->associatedField = 262143
cloudCoverTotal->associatedField->associatedFieldSignificance = MISSING
#1#verticalSignificanceSurfaceObservations=MISSING
cloudAmount=MISSING
cloudAmount->associatedField = 262143
cloudAmount->associatedField->associatedFieldSignificance = MISSING
heightOfBaseOfCloud=MISSING
heightOfBaseOfCloud->associatedField = 262143
heightOfBaseOfCloud->associatedField->associatedFieldSignificance = MISSING
#2#verticalSignificanceSurfaceObservations=MISSING
stateOfGround=MISSING
stateOfGround->associatedField = 262143
stateOfGround->associatedField->associatedFieldSignificance = MISSING
totalSnowDepth=MISSING
totalSnowDepth->associatedField = 262143
totalSnowDepth->associatedField->associatedFieldSignificance = MISSING
#1#timePeriod=MISSING
presentWeather=MISSING
presentWeather->associatedField = 262143
presentWeather->associatedField->associatedFieldSignificance = MISSING
#2#timePeriod=MISSING
totalPrecipitationOrTotalWaterEquivalent=MISSING
totalPrecipitationOrTotalWaterEquivalent->associatedField = 262143
totalPrecipitationOrTotalWaterEquivalent->associatedField->associatedFieldSignificance = MISSING
#3#heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform=MISSING
#1#timeSignificance=MISSING
#3#timePeriod=MISSING
windDirection=MISSING
windDirection->associatedField = 262143
windDirection->associatedField->associatedFieldSignificance = MISSING
windSpeed=MISSING
windSpeed->associatedField = 262143
windSpeed->associatedField->associatedFieldSignificance = MISSING
#2#timeSignificance=MISSING
maximumWindGustDirection=MISSING
maximumWindGustDirection->associatedField = 262143
maximumWindGustDirection->associatedField->associatedFieldSignificance = MISSING
maximumWindGustSpeed=MISSING
maximumWindGustSpeed->associatedField = 262143
maximumWindGustSpeed->associatedField->associatedFieldSignificance = MISSING
#4#heightOfSensorAboveLocalGroundOrDeckOfMarinePlatform=MISSING
#4#timePeriod=MISSING
totalSunshine=MISSING
totalSunshine->associatedField = 262143
totalSunshine->associatedField->associatedFieldSignificance = MISSING
#5#timePeriod=MISSING
#1#longWaveRadiationIntegratedOverPeriodSpecified=MISSING
#1#longWaveRadiationIntegratedOverPeriodSpecified->associatedField = 262143
#1#longWaveRadiationIntegratedOverPeriodSpecified->associatedField->associatedFieldSignificance = MISSING
#2#longWaveRadiationIntegratedOverPeriodSpecified=MISSING
#2#longWaveRadiationIntegratedOverPeriodSpecified->associatedField = 262143
#2#longWaveRadiationIntegratedOverPeriodSpecified->associatedField->associatedFieldSignificance = MISSING
shortWaveRadiationIntegratedOverPeriodSpecified=MISSING
shortWaveRadiationIntegratedOverPeriodSpecified->associatedField = 262143
shortWaveRadiationIntegratedOverPeriodSpecified->associatedField->associatedFieldSignificance = MISSING
globalSolarRadiationIntegratedOverPeriodSpecified=MISSING
globalSolarRadiationIntegratedOverPeriodSpecified->associatedField = 262143
globalSolarRadiationIntegratedOverPeriodSpecified->associatedField->associatedFieldSignificance = MISSING
diffuseSolarRadiationIntegratedOverPeriodSpecified=MISSING
diffuseSolarRadiationIntegratedOverPeriodSpecified->associatedField = 262143
diffuseSolarRadiationIntegratedOverPeriodSpecified->associatedField->associatedFieldSignificance = MISSING
directSolarRadiationIntegratedOverPeriodSpecified=MISSING
directSolarRadiationIntegratedOverPeriodSpecified->associatedField = 262143
directSolarRadiationIntegratedOverPeriodSpecified->associatedField->associatedFieldSignificance = MISSING
#6#timePeriod=MISSING
#1#spectrographicWavelength=MISSING
#1#spectrographicWidth=MISSING
#1#globalUvIrradiation=MISSING
#1#globalUvIrradiation->associatedField = 262143
#1#globalUvIrradiation->associatedField->associatedFieldSignificance = MISSING
#2#spectrographicWavelength=MISSING
#2#spectrographicWidth=MISSING
#2#globalUvIrradiation=MISSING
#2#globalUvIrradiation->associatedField = 262143
#2#globalUvIrradiation->associatedField->associatedFieldSignificance = MISSING
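Several keys in a dump like this occur more than once and are told apart by a rank prefix, as in #2#timePeriod. As a rough aid, the sketch below counts the highest rank of each key in plain-dump text, skipping the key->attribute lines. The function name key_ranks and the short SAMPLE text are made up for the example; in practice you would pass in the contents of TM307092.plain.

```python
import re

# Matches "key=..." or "#n#key=...", but not "key->attribute = ..." lines.
KEY_LINE = re.compile(r"^(?:#(\d+)#)?(\w+)\s*=")

def key_ranks(plain_text):
    """Map each key name to the highest rank seen in a plain dump."""
    ranks = {}
    for line in plain_text.splitlines():
        match = KEY_LINE.match(line.strip())
        if match is None:
            continue  # attribute line or continuation of an array value
        rank = int(match.group(1) or 1)
        name = match.group(2)
        ranks[name] = max(rank, ranks.get(name, 0))
    return ranks

SAMPLE = """\
#1#timePeriod=MISSING
presentWeather=MISSING
presentWeather->associatedField = 262143
#2#timePeriod=MISSING
"""

for name, rank in key_ranks(SAMPLE).items():
    print(name, rank)
```

Any key reported with a rank above 1 needs the #n# prefix when it is set, so that the value goes to the intended occurrence.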

3. Preparing the CSV file

TM307092.plain tells us which parameters (keys) we can encode with template 307092. Now we create a file containing our data in CSV format. In this step we need to consider both the content of TM307092.plain and the associated WMO tables so that the key-value pairs are provided appropriately (units, scale, etc.). Note that although a CSV file is used as the example here, you may supply your data to the Python program in any format, as long as the program can read it.
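Table B expresses elements in SI units, so the pressure keys expect Pa and the temperature keys expect K. If a station reports in hPa and degrees Celsius instead, a conversion along these lines could be applied before the values are written to the input file or encoded. The function names are illustrative only.

```python
def hpa_to_pa(pressure_hpa):
    """Convert hectopascals to pascals."""
    return pressure_hpa * 100.0

def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvin."""
    return temp_c + 273.15

# Round only for display; floating-point arithmetic may add tiny errors.
print(round(hpa_to_pa(1008.6), 1), round(celsius_to_kelvin(28.1), 2))
```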

Content of datain.csv

year|month|day|hour|minute|blockNum|stationNum|stationName|lat|lon|height|pressure|presMeanSeaLev|temp|relHum|totalPrep|windDir|windSpeed
2020|09|14|14|00|17|100|NameOfStation|41.55|28.47|300|100860|102730|301.25|43|0.1|62|1.7
2020|09|14|14|00|17|101|NameOfStation1|40.55|27.47|401|100850|102700|304.25|50|0.0|70|2

The SYNOP data above was created purely as an example for this tutorial and, as you may notice, provides values for only a limited number of parameters. When creating your own input files, include as much information as possible from your station's measurements, and adapt csv2bufr.py below so that it reads from the correct columns and writes to the correct keys.
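One way to keep the column-to-key mapping easy to change is to hold it in a single table and look columns up by header name rather than by position. The sketch below does this with csv.DictReader for the pipe-delimited layout shown above. COLUMN_MAP and row_to_pairs are hypothetical names, only a few columns are mapped, and the resulting pairs would still have to be passed to codes_set.

```python
import csv
import io

# CSV column name -> (BUFR key, converter); extend to match your own file.
COLUMN_MAP = {
    "blockNum": ("blockNumber", int),
    "stationNum": ("stationNumber", int),
    "lat": ("latitude", float),
    "lon": ("longitude", float),
    "temp": ("airTemperature", float),
}

def row_to_pairs(row):
    """Turn one CSV row (a dict) into a list of (BUFR key, value) pairs."""
    return [(key, convert(row[column].strip()))
            for column, (key, convert) in COLUMN_MAP.items()]

SAMPLE = (
    "blockNum|stationNum|lat|lon|temp\n"
    "17|100|41.55|28.47|301.25\n"
)

for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="|"):
    for key, value in row_to_pairs(row):
        print(key, value)
```

Looking columns up by name means that adding or reordering columns in the input file does not silently shift values to the wrong keys.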

4. Encoding the BUFR data

Content of csv2bufr.py

#!/usr/bin/env python3

# (C) Copyright 1996- ECMWF.
#
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
#
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.

from eccodes import *
import argparse
import csv
import sys

# read command line to get input filename
def read_cmdline():
    p = argparse.ArgumentParser()
    p.add_argument("--i", required=True, help="input ASCII filename")
    args = p.parse_args()
    return args

# read data from CSV file into a list
def csv_read(filename):
    data = []
    try:
        with open(filename) as csvfile:
            reader = csv.reader(csvfile, delimiter='|')
            for row in reader:
                data.append(row)
    except IOError as error:
        print(error)
        sys.exit(1)
    # the 'with' block has already closed the file; skip the header row
    return data[1:]

# Encode the data from CSV into BUFR 
def message_encoding(FileName, fout):
    # reads the CSV file into a python list
    dataIn = csv_read(FileName)

    # loops over the rows of the csv file (one BUFR message for each row)
    for row in dataIn:
        bid = codes_bufr_new_from_samples('BUFR4')
        for ele in range(len(row)):
            row[ele] = row[ele].strip()
        try:
            bufr_encode(bid, row)
            codes_write(bid, fout)
        except CodesInternalError as ec:
            print(ec)
        codes_release(bid)

def bufr_encode(ibufr, row):
    # set header keys and values
    codes_set(ibufr, 'edition', 4)
    codes_set(ibufr, 'masterTableNumber', 0)
    codes_set(ibufr, 'bufrHeaderCentre', 98)               # 98: centre is ecmf
    codes_set(ibufr, 'bufrHeaderSubCentre', 0)
    codes_set(ibufr, 'updateSequenceNumber', 0)
    codes_set(ibufr, 'dataCategory', 0)                    # 0: Surface data - land
    codes_set(ibufr, 'internationalDataSubCategory', 7)    # 7: n-min obs from AWS stations
    codes_set(ibufr, 'dataSubCategory', 7)
    codes_set(ibufr, 'masterTablesVersionNumber', 31)
    codes_set(ibufr, 'localTablesVersionNumber', 0)
    codes_set(ibufr, 'observedData', 1)
    codes_set(ibufr, 'compressedData', 0)
    codes_set(ibufr, 'typicalYear', int(row[0]))
    codes_set(ibufr, 'typicalMonth', int(row[1]))
    codes_set(ibufr, 'typicalDay', int(row[2]))
    codes_set(ibufr, 'typicalHour', int(row[3]))
    codes_set(ibufr, 'typicalMinute', int(row[4]))
    codes_set(ibufr, 'typicalSecond', 0)
 
    ivalues = 307092  # a single sequence descriptor
    codes_set(ibufr, 'unexpandedDescriptors', ivalues)
 
    # set data keys and values
    codes_set(ibufr, 'year', int(row[0]))
    codes_set(ibufr, 'month', int(row[1]))
    codes_set(ibufr, 'day', int(row[2]))
    codes_set(ibufr, 'hour', int(row[3]))
    codes_set(ibufr, 'minute', int(row[4]))
    codes_set(ibufr, 'blockNumber', int(row[5]))
    codes_set(ibufr, 'stationNumber', int(row[6]))
    codes_set(ibufr, 'longStationName',row[7].strip())
    codes_set(ibufr, 'latitude', float(row[8]))
    codes_set(ibufr, 'longitude', float(row[9]))
    codes_set(ibufr, 'heightOfStationGroundAboveMeanSeaLevel', float(row[10]))
    codes_set(ibufr, 'pressure', float(row[11]))
    codes_set(ibufr, 'pressureReducedToMeanSeaLevel', float(row[12]))
    codes_set(ibufr, 'airTemperature', float(row[13]))
    codes_set(ibufr, '#1#relativeHumidity', float(row[14]))
    codes_set(ibufr, '#2#timePeriod', -10)                                           # -10: Period of precipitation observation is 10 minutes
    codes_set(ibufr, 'totalPrecipitationOrTotalWaterEquivalent', float(row[15]))
    codes_set(ibufr, '#1#timeSignificance', 2)                                       # 2: Time averaged
    codes_set(ibufr, '#3#timePeriod', -10)                                           # -10: Period of wind observations is 10 minutes
    codes_set(ibufr, 'windDirection', float(row[16]))
    codes_set(ibufr, 'windSpeed', float(row[17]))

    codes_set(ibufr, 'pack', 1)  # Required to encode the keys back in the data section

def main():
    cmdLine = read_cmdline()
    inputFilename = cmdLine.i
    print(inputFilename)
    outFilename = inputFilename.rsplit('.', 1)[0] + '.bufr'  # replace only the final extension
    fout = open(outFilename, "wb")
    message_encoding(inputFilename, fout)
    fout.close()
    print(" output file {0}".format(outFilename))
     
if __name__ == '__main__':
    main()

Please pay attention to the comments within the code and adapt it as needed. When the above code is run, datain.bufr should be created:

user@host:~> ./csv2bufr.py --i datain.csv
datain.csv
 output file datain.bufr

datain.bufr will contain 2 BUFR messages, since the input file has 2 data rows and csv2bufr.py creates one BUFR message for each row:

user@host:~> bufr_ls datain.bufr
centre       masterTablesVersionNumber  localTablesVersionNumber   typicalDate     typicalTime    numberOfSubsets
ecmf         31                         0                          20200914        140000         1
ecmf         31                         0                          20200914        140000         1
2 of 2 messages in datain.bufr
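If the ecCodes command-line tools are not at hand, a rough message count can also be obtained in plain Python. The sketch assumes BUFR edition 2 or later, where octets 1-4 of each message hold the characters BUFR and octets 5-7 hold the total message length. count_bufr_messages and fake_message are made-up names, and the function does no validation beyond the length field.

```python
def count_bufr_messages(data):
    """Count BUFR messages in a bytes object by hopping from header to header."""
    count = 0
    pos = data.find(b"BUFR")
    while pos != -1 and pos + 8 <= len(data):
        length = int.from_bytes(data[pos + 4:pos + 7], "big")
        if length < 8:
            break  # corrupt length field; stop rather than loop forever
        count += 1
        pos = data.find(b"BUFR", pos + length)
    return count

def fake_message(total_length):
    """Build a dummy message with a valid Section 0 and end marker only."""
    header = b"BUFR" + total_length.to_bytes(3, "big") + b"\x04"
    return header + b"\x00" * (total_length - 12) + b"7777"

print(count_bufr_messages(fake_message(20) + fake_message(32)))
```

For a real file, the bytes would come from reading datain.bufr in binary mode, for example with open('datain.bufr', 'rb').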
