The tests for a given dataset should be in a file named <system>/test_<dataset_name>.py.

Any function in that file whose name starts with "test_" will be considered to be a test. There can be other helper functions in the file which will not be interpreted as tests.

You do not have to define tests if you only want to check that the sample requests in the sample.json files, or a sample of recently completed actual requests from the brokerDB, succeed. See the -k command-line option.

Test inputs

Test functions do not need to have any inputs but can declare any or all of the following named inputs, which are useful if they need access to files in the cds-forms-<system> dataset directory...

  • dataset: the dataset name
  • forms_branch: the stack name (and cds-forms-<system> branch) that is currently being tested
  • forms_url: the ssh URL of the cds-forms-<system> repo, in case the test function needs to clone it. Alternatively, you can have it cloned for you automatically by using the dsconfig input.
  • dsconfig: an object designed to make access to the dataset config easier

The dsconfig object combines the information above into one input and adds extra functionality. It has the following data members and methods (a usage sketch follows this list)...

  • dataset: the dataset name
  • forms_branch: the stack name (and cds-forms-<system> branch) that is currently being tested
  • forms_url: the ssh URL of the cds-forms-<system> repo
  • forms_dir: a working directory into which a copy of the cds-forms-<system> repo for the current stack branch will be cloned before the test is run. A "git pull" will be performed on this repo each time the test system is executed (unless requested otherwise). It currently defaults to $SCRATCH/cds_regression_testing/<stack>/cds-forms-<system> but this could change in the future. The path can be overridden using a command-line option.
  • get_form(): returns the dataset form as a list of dicts
  • get_widget(name): returns the named form widget definition as a dict
  • get_widget_values(name): returns the list of all values specified in the named widget. Groups are concatenated into a single list.
  • get_constraints(): returns the dataset constraints as a list of dicts
  • get_mapping(): returns the content of the mapping.json file
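
For example, the following sketch uses dsconfig to build one request per value of a form widget. It assumes the form has a widget named "variable" and that "year" and "format" are valid request keys; substitute the dataset's real widget and request names.

def test_all_variables(dsconfig):
    """Run one request per value of the (hypothetical) "variable" widget"""
    output = []
    for variable in dsconfig.get_widget_values('variable'):
        output.append({'request':
                       {'variable': variable,
                        'format': 'grib',
                        'year': '1979'}})
    return output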

Test output

At a minimum a test function should return a dictionary with a "request" key containing a retrieval request dictionary for the given dataset. If no other keys are present in the output then the test will be run and expected to succeed, but the only check made on the resulting output file is a basic format check, and only if the request contains a "format" keyword set to one of a recognised list of values (grib, netcdf, zip, tar, tgz). (That is, unless comparison with another stack or dataset has been requested on the command line.) For example, the following output would only result in a test that the request delivered a valid GRIB file...

def test_minimal():
    """Just run one request and check it succeeds but don't check the output file content"""
    return {'request': 
            {'variable': 'foobar',
             'format': 'grib',
             'year': '1979'}}

The function can also return a list of dictionaries, each specifying a different request. These will all be run as separate tests, so one function can in fact represent multiple tests. For example...

def test_every_year():
    """Run a different request for every year and check they all succeed"""
    output = []
    for year in range(1979, 2020):
        output.append({'request': 
                       {'variable': 'foobar',
                        'format': 'grib',
                        'year': year}})
    return output

Detailed checks on the output file content can be made by setting a key called "expected_result" to a dictionary. This dictionary should have a "format" key to check that the file is in the expected format (currently supported values are grib, netcdf, zip, tar and tgz). A "content" key can also be used to specify the expected content of the file. See the sections below for details on how to do this for the supported formats.
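
As a minimal illustration (the request keys here are only examples), the following asks for nothing more than a check that the result is a valid NetCDF file, without checking its content...

def test_netcdf_format_only():
    """Run one request and check only that the result file is valid NetCDF"""
    return {'request':
            {'variable': 'foobar',
             'format': 'netcdf',
             'year': '1979'},
            'expected_result': {'format': 'netcdf'}}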

Test categories

A test function can be assigned to one or more categories using the cds_regression_testing.categories decorator. Category names are specified as successive positional arguments and are of your own choosing. This allows you to execute specific groups of requests either from the command line using the relevant option (use -h to see which one) or by setting test_selection_options:categories to a list of categories in your ecFlow suite config file. 

This functionality is useful when many of a dataset's tests are there to exhaustively test the adaptor or the full data content and aren't considered necessary each time a general system update is carried out. It allows the user to tag just a few basic tests to be run under these circumstances.

Uncategorised tests are automatically assigned to the category "_uncategorised_".

For example, the following code assigns the first test (and only the first test) to two categories: "system_update" and "quick_tests". The second test will be assigned to "_uncategorised_".

from cds_regression_testing import categories

@categories('system_update', 'quick_tests')
def test_quick_and_simple():
    return {...}

def test_slow_and_detailed():
    return {...}

To run only the "quick_tests" in your ecFlow suite, use the following in your config...

test_selection_options:
  categories: ['quick_tests']

Checking GRIB output

This section describes how to make detailed content checks on the test request output files in GRIB format. Detailed checks on the request result aren't necessary (or even performed) when using the system to compare two different stacks or one dataset with another, but are useful as an independent verification when you have some idea of what a request should return.

To check the content of a GRIB file, set the "content" key to a list of dictionaries containing GRIB key/value pairs. The returned file will be required to have as many fields as there are items in this list, and each field must pair up with an item. Order is not important. For example, this test function checks that the request returns two fields with the correct dataDate and level.

def test_grib_content():
    """Check that the grib file contains the expected number of fields and
       also check some of their keys"""

    return {'request': 
            {'product_type': 'reanalysis',
             'format': 'grib',
             'variable': 'divergence',
             'pressure_level': ['1', '50'],
             'year': '1979',
             'month': '01',
             'day': '01',
             'time': '00:00'},
            'expected_result': {'format': 'grib',
                                'content': [{'dataDate': '19790101',
                                             'level': 50},
                                            {'dataDate': '19790101',
                                             'level': 1}]}}

Generating the expected content dictionary by hand can be tedious, especially if you are writing many tests for a dataset, so it's often better to write a Python function that takes a request and returns the associated content dictionary, and then simply call that function from each test function. See the CAMS reanalysis tests for an example of how to do that.
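
As an illustration only (the real CAMS reanalysis tests are more complete), such a helper might look like this, with the request keys and GRIB keys chosen purely as examples...

def expected_grib_content(request):
    """Hypothetical helper: derive the expected GRIB content list from a request"""
    date = request['year'] + request['month'] + request['day']
    return [{'dataDate': date, 'level': int(level)}
            for level in request['pressure_level']]

def test_divergence_all_levels():
    """Check one field per requested pressure level, using the helper above"""
    request = {'product_type': 'reanalysis',
               'format': 'grib',
               'variable': 'divergence',
               'pressure_level': ['1', '50'],
               'year': '1979',
               'month': '01',
               'day': '01',
               'time': '00:00'}
    return {'request': request,
            'expected_result': {'format': 'grib',
                                'content': expected_grib_content(request)}}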

GRIB tolerances

If a near, rather than exact, match is acceptable for certain keys then tolerances can be provided through a "relative_tolerances" and/or an "absolute_tolerances" key at the same level as the "content" key. Each should be set to a dict in which the keys are GRIB keys and the values are the maximum allowed differences between the expected and actual values. Absolute differences are simply abs(value1-value2); relative differences are 2*abs((value1-value2)/(value1+value2)) (divide-by-zero will yield zero if both values are zero and sys.float_info.max if not). If a key is given both a relative and an absolute tolerance then a value will be considered good if it is within either one of them. For example, this checks that the first and last latitudes are within 1 microdegree of the specified values...

            ...
            'expected_result': {'format': 'grib',
                                'content': [{'dataDate': '19790101',
                                             'level': 50,
                                             'latitudeOfFirstGridPointInDegrees': 22.2,
                                             'latitudeOfLastGridPointInDegrees': 44.4}],
                                'absolute_tolerances': {'latitudeOfFirstGridPointInDegrees': 1E-6,
                                                        'latitudeOfLastGridPointInDegrees': 1E-6}},
            ...             
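
For reference, the difference formulas described above amount to the following sketch (not the test system's actual code)...

import sys

def absolute_difference(value1, value2):
    return abs(value1 - value2)

def relative_difference(value1, value2):
    # Divide-by-zero yields 0 if both values are zero, sys.float_info.max otherwise
    if value1 + value2 == 0:
        return 0. if value1 == 0 == value2 else sys.float_info.max
    return 2 * abs((value1 - value2) / (value1 + value2))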

Checking NetCDF output

This section describes how to make detailed content checks on the test request output files in netCDF format. Detailed checks on the request result aren't necessary (or even performed) when using the system to compare two different stacks or one dataset with another, but are useful as an independent verification when you have some idea of what a request should return.

To check the content of a NetCDF file set the content key to a dictionary that has a...

  • dimensions key set to a dict in which the keys are the dimension names and the values are the dimension sizes
  • variables key set to a dict in which the keys are the variable names and the values are dicts which have keys...
    • type, specifying the variable type as a NumPy type string, e.g. "int16" or "float32"
    • dimensions, specifying the dimensions as a list of dimension names
    • attributes (optional), specifying some or all of the variable attribute values as a dict
    • compression (optional), specifying some or all of the compression parameters as returned by netCDF4.Variable.filters(). Useful for confirming existence/absence of compression.
    • data (optional), specifying the data array values as a list. The key name can also specify a subset of the indices to check in any of the following formats: "data[i]", "data[[i,j,k]]", "data[i:j]" or "data[i:j:k]". Negative indices have their usual Python meaning. To check multiple subsets of the array you can use multiple "data[...]" keys
  • attributes (optional) key specifying some or all of the global attribute values as a dict

It is optional to specify the global and variable attributes, the compression parameters and the variable data, but the other content elements are mandatory.

If you want to match a given string against a regular expression then, instead of supplying a string for the value, supply a compiled regular expression (the output of an re.compile() call). The regular expression will be required to match the whole string. See below for an example.

To make no check at all on a value you can set it to cds_regression_testing.NO_CHECK. This can be useful to represent a dimension length in the case that you have no idea how long it will be. See below for an example. Alternatively, you can set a tolerance for a key - see next section.

To assist in making the content definition dictionary the content of an existing target file can be printed using the bin/print_ncdf_content script. The output from this script can be copied and pasted into the test function as the starting point for defining the dictionary. The "-a" option can be used to limit the attributes which are printed to avoid unnecessary clutter. Use "-h" for help.

Also, as mentioned above with checking GRIB, it's often quicker to implement multiple tests if you write a Python function that can take a request and return the associated content dictionary, and then simply call that function from each test function. See the CAMS reanalysis tests for an example of how to do that.

Example:

import re
from cds_regression_testing import NO_CHECK

def test_ncdf_request():

    return {
        'request': {
            'product_type': 'fictional',
            'format': 'netcdf',
            'variable': 'surface_pressure',
            'year': '1979'},
        'expected_result': {
            'format': 'netcdf',
            'content': {
                'dimensions': {
                    'longitude': 1440,
                    'latitude': 721,
                    'time': NO_CHECK},                       # Require a time dimension but don't check length
                'variables': {
                    'longitude': {
                        'type': 'float32',
                        'dimensions': ['longitude'],
                        'data[0]': 0.,                       # Check first data value
                        'data[-1]': 359.75},                 # Check last data value
                    'latitude': {
                        'type': 'float32',
                        'dimensions': ['latitude'],
                        'data[[0,-1]]': [-90, 90]},          # Check first and last data values
                    'time': {
                        'type': re.compile('int(16|32)'),
                        'dimensions': ['time'],
                        'attributes': {
                            'units': re.compile('hours since 1900-01-01 00:00(:00.0)?'), # Regular expression
                            'calendar': re.compile('(gregorian|standard)')},             # matches
                        'data': [692496]},
                    'sp': {
                        'type': 'int16',
                        'dimensions': ['time', 'latitude', 'longitude'],
                        'attributes': {
                            'units': 'Pa'}}},
                'attributes': {
                    'Conventions': 'CF-1.6'}}}}

NetCDF tolerances

If a near, rather than exact, match is acceptable for certain numeric values then tolerances can be provided through a "relative_tolerances" and/or an "absolute_tolerances" key at the same level as the "content" key. (See the GRIB tolerances section above for a definition of relative and absolute error.) Each can be set to a dict whose structure mirrors the part of the content structure for which tolerances are to be provided. Additionally, top-level keys named after NumPy data types can provide tolerances for all values of that type. A tolerance for a specific item overrides a global value for its type. If a value is given both a relative and an absolute tolerance then it will be considered good if it is within either one of them.

For example...

def test_ncdf_request():
    return {
        ...
        'content': {...},
        'relative_tolerances': {
            'float64': 1.E-16,         # All doubles considered equal if within 1.E-16 relative difference
            'variables': {
                'latitude': {
                    'data': 1.E-7},    # Latitudes considered equal if within 1.E-7 relative difference
                'longitude': {
                    'data': 1.E-7}}},  # Longitudes considered equal if within 1.E-7 relative difference
        'absolute_tolerances': {
            'dimensions': {
                'time': 1}}}           # Time dimension size allowed to vary by +/-1 

Checking zip/tar/tar.gz output

To check the content of an archive format (zip/tar/tar.gz), set the content key to a list of dictionaries, each following the form outlined above for GRIB or NetCDF content, and optionally containing an additional "name" key to specify the name that the item will have within the archive. Order is not important. Any item without a name key will be compared against all members of the archive. A successful test requires a one-to-one match between the items and the archive members. e.g.

def test_xyz():
    # Retrieve two NetCDF files inside a tar.gz file
    return {
        'request': {...},
        'expected_result': {
            'format': 'tgz',
            'content': [
                {'format': 'netcdf',
                 'content': {...}},
                {'format': 'netcdf',
                 'name': 'i_want_to_be_specific_about_the_name_of_this_one.nc',
                 'content': {...},
                 'relative_tolerances': {...}}]}}

Checking other formats (or making custom checks)

To check the content of any other file format you can supply a "verify_func" key in place of "expected_result", set to a Python function of your choice. The function will be called after the request has succeeded with three inputs: the dataset name, the request dictionary and the result file name. It should raise an Exception if the result is not as expected.
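
For example, a custom check on a (hypothetical) CSV result might look like the following, where the request keys and the expected column name are purely illustrative...

import csv

def verify_csv(dataset, request, result_file):
    """Raise an Exception if the CSV result does not have the expected header column"""
    with open(result_file, newline='') as f:
        header = next(csv.reader(f))
    if 'surface_pressure' not in header:
        raise Exception(dataset + ': "surface_pressure" column missing from ' + result_file)

def test_csv_request():
    return {'request': {'variable': 'surface_pressure',
                        'format': 'csv',
                        'year': '1979'},
            'verify_func': verify_csv}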

Checking that a request fails

It may be that you want a request to fail, perhaps to check that certain content is not available to the general user. In this case you can set the "expect_success" key to False. This will mean that it will be considered an error if the request succeeds. You can optionally also check that the expected exception is produced by setting "repr(exception)" inside "expected_result" to either a string or a compiled regular expression which fully matches the repr() form of the exception, e.g.

from datetime import datetime, timedelta
import re

def test_expected_failure():
    """Check that tomorrow's forecast is not available today"""
    return {
        'request': {
            'type': 'forecast',
            'date': (datetime.now() + timedelta(1)).strftime('%Y-%m-%d')},
        'expect_success': False,
        'expected_result': {
          'repr(exception)': re.compile('.*This data is not available yet.*')}}