To run regression tests from the command line...

  1. Clone the repo.
  2. Put the API login credentials for the stacks you want to test into files called <repo_dir>/<system>/cdsapirc_<stack_name>, where <repo_dir> is the directory containing the checked-out repo, <system> is cams or c3s and <stack_name> is the branch name of the stack, e.g. ~/cds-regression-testing/c3s/cdsapirc_c3stest (an example credentials file is shown after the command examples below).
  3. Execute ...
bin/regression_test <stack>
    To run the default tests for the given stack. The default list of tests is set as described below. If no default is set then all explicitly configured tests are run. Returned content will be checked against information supplied by the tests.

bin/regression_test -k samples <stack1>-<stack2>
    To use the sample.json files as tests and check that they give the same result on both stacks.

bin/regression_test -k broker <stack1>-<stack2>
    To take recently completed requests from the broker DB of the first stack, use them as tests and check that they give the same result on both stacks.

bin/regression_test -h
    To see all other command-line options.

The above is only a sample.
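
Regarding step 2, each cdsapirc_<stack_name> file holds the API credentials for one stack. Assuming these files follow the standard cdsapi ~/.cdsapirc layout (an assumption; consult the repo or your stack administrator if unsure), the contents would look something like:

url: <API URL of the stack>
key: <UID>:<personal API key>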

When one stack is specified on the command line, the returned content will be checked against anything supplied by the test itself (see the section on defining the tests).

If a hyphen-separated list of two stacks is specified, then each test request is run on both stacks and the returned files are compared against each other. For many formats the comparison is "intelligent" in that it will ignore differences considered unimportant, such as the precise ordering of GRIB messages, the history attribute in netCDF files or the precise order of members in a tar or zip file.
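
As an illustration of what such a comparison involves (a minimal sketch, not the testing system's own code, and assuming xarray is installed), ignoring the history attribute when comparing two netCDF files could look like this:

# Illustrative sketch only: compare two netCDF files while ignoring the global
# 'history' attribute, in the spirit of the "intelligent" comparison described above.
import xarray as xr

def netcdf_equivalent(path_a, path_b):
    with xr.open_dataset(path_a) as a, xr.open_dataset(path_b) as b:
        # Drop the attribute that is allowed to differ before comparing.
        a.attrs.pop('history', None)
        b.attrs.pop('history', None)
        try:
            xr.testing.assert_identical(a, b)
        except AssertionError:
            return False
        return True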

If any test fails, an error message will be printed and the executable will exit with a non-zero exit status. Note that, when comparing two stacks, if a request fails on both of them then this is not considered a failed test, since both stacks have done the same thing.
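
Because the overall result is reflected in the exit status, the executable is easy to drive from a script or CI job. A hypothetical wrapper (not part of the repo; the stack name c3stest is just an example) could be:

# Run the default tests for one stack and propagate the exit status,
# so the calling job fails if any regression test fails.
import subprocess
import sys

result = subprocess.run(['bin/regression_test', 'c3stest'])
sys.exit(result.returncode)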

Comparing the output from two datasets

It may be that you have a new version of an adaptor and want to check that it returns the same output as the old one on a given stack, and that to do this you have created duplicates of the datasets which use the new version. To compare the originals with the duplicates you can create tests for the originals in the usual way and then, at the top level of the relevant test_<original_dataset>.py file, set compare_with to the name of the duplicate. You can then use the -D option on the command line to trigger the dataset comparison.

Only tests for datasets with a comparison dataset configured in this way will be executed and each request will be run both for the original and the duplicate. The output files will be compared in the same way as when comparing the output from two stacks (see above), allowing you to check that both datasets return the same content.

For example, assuming you want to compare dataset1 and dataset2, then in the test_dataset1.py file...

compare_with = 'dataset2'

def test_something():
    return {'request': {...}}

And to execute...

bin/regression_test -D <stack>
    Run tests for all datasets that have a comparison dataset configured, checking that both datasets return the same content.

You can of course use other command-line arguments as well in the usual way.

Testing the MARS adaptor

A new version of the MARS adaptor code on a dev stack should be tested with this system. The minimal tests that should be run are all hand-written tests for both cams and c3s:
regression_test -m mars -k all camsdev
regression_test -m mars -k all c3sdev

It's also reassuring to try to use the sample.json files as tests in order to ensure you've run at least one request for all the c3s datasets. Since they won't have any expected output configured, they can be compared against c3sprod results:
regression_test -m mars -k samples c3sdev-c3sprod

And because users can do crazy and unexpected things, it's also useful to test real-world requests taken from the broker DB:
regression_test -m mars -k broker:c3sprod c3sdev-c3sprod

Some options you might find useful are --skip-bad, to skip any dev datasets that have an invalid dataset.yaml, and --on-finish notify, to notify you when the tests have completed. Datasets which are expected to differ between the two stacks can be excluded with -x.
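
For example, a broker-based comparison that skips broken dev datasets, notifies you on completion and excludes a dataset known to differ might look like this (the exact syntax of -x is an assumption here; check regression_test -h):

regression_test -m mars -k broker:c3sprod --skip-bad --on-finish notify -x <dataset_expected_to_differ> c3sdev-c3sprod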