The odb import tool can import data in the "wide" format (as produced by the sql tool run with the -f wide option):

Printing file content in a delimited text format
$ ./odb sql select \* -i 2000010106.1.0.odb -f wide -o 2000010106.1.0.csv
$ head -n 1 2000010106.1.0.csv
expver@desc:string      andate@desc:integer     antime@desc:integer     seqno@hdr:integer       obstype@hdr:integer     obschar@hdr:Bitfield[codetype:9;instype:10;retrtype:6;geoarea:6]   subtype@hdr:integer     date@hdr:integer        time@hdr:integer        rdbflag@hdr:Bitfield[lat_humon:1;lat_qcsub:1;lat_override:1;lat_flag:2;lat_hqc_flag:1;lon_humon:1;lon_qcsub:1;lon_override:1;lon_flag:2;lon_hqc_flag:1;date_humon:1;date_qcsub:1;date_override:1;date_flag:2;date_hqc_flag:1;time_humon:1;time_qcsub:1;time_override:1;time_flag:2;time_hqc_flag:1;stalt_humon:1;stalt_qcsub:1;stalt_override:1;stalt_flag:2;stalt_hqc_flag:1]   status@hdr:Bitfield[active:1;passive:1;rejected:1;blacklisted:1;monthly:1;constant:1;experimental:1;whitelist:1] ...


The header of the text format is a list of column descriptions, each in the format <column-name>:<type>

The type can be:

  • REAL
  • DOUBLE
  • INTEGER
  • STRING
  • BITFIELD

In the last case, BITFIELD, a list of the fields and their sizes in bits follows in square brackets, for example:

Example of a bitfield definition
 rdbflag@hdr:Bitfield[lat_humon:1;lat_qcsub:1;lat_override:1;lat_flag:2;lat_hqc_flag:1;lon_humon:1;lon_qcsub:1;lon_override:1;lon_flag:2;lon_hqc_flag:1;date_humon:1;date_qcsub:1;date_override:1;date_flag:2;date_hqc_flag:1;time_humon:1;time_qcsub:1;time_override:1;time_flag:2;time_hqc_flag:1;stalt_humon:1;stalt_qcsub:1;stalt_override:1;stalt_flag:2;stalt_hqc_flag:1]
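
The packed integer value of a bitfield column can be unpacked by shifting and masking. Below is a minimal Python sketch, assuming the first field in the definition occupies the least significant bits of the packed value; the field list is taken from the status@hdr column in the header above:

Decoding a bitfield value (Python sketch)
fields = [('active', 1), ('passive', 1), ('rejected', 1), ('blacklisted', 1),
          ('monthly', 1), ('constant', 1), ('experimental', 1), ('whitelist', 1)]

def decode(value, fields):
    # Extract each named field, walking from the least significant bits up.
    out, shift = {}, 0
    for name, width in fields:
        out[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return out

print(decode(5, fields))  # active=1, passive=0, rejected=1, all others 0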

So, importing CSV text data (TAB-delimited, similar to the file produced with the odb sql tool in the example above) into ODB can be done as follows:

Importing a CSV file to ODB format
$ ./odb import -d TAB 2000010106.1.0.csv 2000010106.1.0.imported.odb

The delimiter can be changed with the -d option; by default it is ','.
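
As a minimal illustration (the file name, column names and values below are made up), a small file in this format can be imported with the default comma delimiter:

Importing a minimal comma-delimited file (hypothetical data)
$ cat minimal.csv
lat@hdr:REAL,lon@hdr:REAL,obsvalue@body:DOUBLE,statid@hdr:STRING
12.5,-1.25,287.4,ABCD1234
48.0,2.5,290.1,WXYZ0001
$ ./odb import minimal.csv minimal.odb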

Regarding the data in the CSV file, note that there is currently a limitation: strings can be at most 8 characters long.

Converting from other binary formats, e.g. NetCDF, to ODB via an intermediate ASCII representation should be avoided due to loss of precision (unless the data is printed with full precision).
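
If an intermediate text step is unavoidable, doubles should be written with enough digits to survive the round trip; 17 significant decimal digits are sufficient for an IEEE-754 double. A minimal Python sketch (not part of the odb tools):

Printing a double with full precision (Python sketch)
x = 0.1 + 0.2
s = '%.17g' % x          # 17 significant digits, e.g. '0.30000000000000004'
assert float(s) == x     # the value survives the text round trip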

6 Comments

  1. Hi, what is the maximum number of fields that can be imported? I have to convert a PP file with 148 fields per line (plus 3 date fields at the start), and ODB's having none of it.

    A version with only eight fields (plus 3) imported fine.

    So, given that I'll have to convert the single file into multiples with a portion of the fields in each, what is the maximum safe field count please?

    I'd rather not use trial and error!

    Cheers

    Harry

  2. I am not aware of any fixed limit on the number of fields. Can you please provide the file so I can try to reproduce the problem?

  3. Thanks Peter - I'll check the file itself first. I think the text editor I was using might have corrupted it. It's good to know that there's no limit - one less thing in the equation!

    Have a good weekend.

    Cheers, Harry

  4. I tried with a fresh copy, using a better editor (gedit) to add the INTEGER and REAL tags in the header line. I get exactly the same error, which can be summarised as:

    Unknown type: 'REAL

    (that's exactly how it displays). That error is coming from Columns.cc.

    I will email you a ten-record version, as the original has over 50,000 records. However, this smaller version does have all the fields (3 + 148) and still gives the error.

    I'm out until Tuesday so no rush!

    Cheers

    Harry

  5. Hi Ian, thanks for the email with the sample CSV file. I had a look, and it turned out the problem was caused by some of the fields (column descriptions) in the header being separated with a tab and some with a space character. I converted the file so that only the tab was used as a separator, using the following Python script (redirecting its output to a file converted.csv):

    with open('C3-EURO4M-MEDARE_PP.1st10fields.txt') as f:
        for line in f:
            print('\t'.join(line.split()))

    and then successfully imported with

    $ odb import -d TAB converted.csv data.odb

    Please note that although the import tool was able to create a file with column names like "60100-0", "60115-0", etc., we generally recommend column names that look like valid identifiers in programming languages such as C, Java or Python, i.e. consisting only of alphanumeric characters and starting with a letter. By convention, at ECMWF we use only lower-case letters in column names. We have not tested the ODB SQL engine with identifiers like "60100-0"; I suspect such names can cause problems when working with SQL.

    Also, there are several columns with the same name in the provided text file. This will certainly be a problem when using the SQL engine. The columns in a file should have unique names.
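
    For example, the header names could be rewritten into valid, unique identifiers with a small script along these lines (a sketch; the renaming scheme is just one possibility, and it operates on the bare names, without any type suffix):

    import re

    def sanitize(names):
        # Make each name a valid identifier and de-duplicate repeats:
        # replace non-alphanumeric characters with '_', prefix names that
        # start with a digit, and append a counter to later duplicates.
        seen, out = {}, []
        for name in names:
            name = re.sub(r'\W', '_', name)
            if name[0].isdigit():
                name = 'c_' + name
            n = seen.get(name, 0)
            seen[name] = n + 1
            out.append(name if n == 0 else '%s_%d' % (name, n))
        return out

    print(sanitize(['60100-0', '60100-0', 'lat']))
    # -> ['c_60100_0', 'c_60100_0_1', 'lat']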

    1. Thank you, Peter - that's very helpful.

      I'll get back to Manola and team; they have some reconsidering to do!

      Cheers

      Harry