This tutorial explains how to create a new experiment from an existing one by copying the initial files and setting up the experiment directory.
ECMWF HPCF
ECMWF operates a Cray-based High Performance Computing Facility (HPCF). It consists of two identical Cray XC40 clusters, each with its own storage but with equal access to the high performance working storage of the other cluster. This gives the benefit of one very large system, while the dual clusters add significantly to the resilience of the system.
For more details about the ECMWF HPCF, please see: http://www.ecmwf.int/en/computing/our-facilities/supercomputer
All the OpenIFS experiments will make use of the ECMWF HPCF. During the workshop you will log in to the 'front-end' machines and submit 'batch jobs' to the HPCF.
The scripts described below are provided to make preparing and submitting ensemble batch jobs easier. Useful commands are listed at the end of this tutorial.
Login to ECMWF Cray High Performance Computing Facility (HPCF)
Each group or participant will have a training user account on the ECMWF system, beginning with 'troifs'. This is different from the user account on the classroom computers.
First log in to the ECMWF gateway computer. From here you will log in to the HPCF.
ssh troifsXX@ecaccess.ecmwf.int    # substitute your user id for XX
troifsXX@ecaccess.ecmwf.int's password:    # give your password when prompted, using the SecurID hardware token (one-time password)
When prompted select 'cca' :
troifs1@140.105.20.128's password:
Select hostname (ecgate, cca, ccb) [ecgate]: cca
'cca' is the "front-end" computer to the ECMWF Cray high performance computer facility.
The contents of the account should look like:
troifs1@cca-login2:~> ls
bin  make  python  t21test  scratch
If any are missing please let us know.
make: contains the OpenIFS executable for version 40r1. For this workshop the model has already been compiled, to avoid delays and unnecessary load on the login nodes.
t21test: contains an example low resolution test run for OpenIFS.
bin and python: contain commands and scripts for use during the workshop.
scratch: is the directory in which we will run OpenIFS and process the output files.
Please do not store large files in your home directory on the gateway or HPCF login nodes. All OpenIFS experiments and large files should be stored under 'scratch'. The command 'quota' can be used to determine the available space.
Creating the experiment initial files
This section explains how to copy the initial files of an existing experiment and change the experiment id.
Initial files for the observed SST experiment, 'ob00', and the climatological SST experiment, 'clim', have been created and are available in the directory: /perm/rd/openifs/oifs_workshop_2017/expts-inidata/
This directory contains multiple dates for each experiment:
ob00/2015110100 to ob00/2015111500 etc.
You do not need to run these experiments. The forecasts from a 10 member ensemble are available on the classroom computers.
Copy previous experiment initial files
Before creating the experiment, first create the initial data. If you plan to rerun the ob00 or clim experiments and use the initial data 'as-is' you can skip this step.
In this example, we will create a new experiment id from the existing ob00 experiment, without changing any of the data itself. You can also use your own previous experiments; the only difference is the location of the starting initial data.
Note that we copy all the files needed to start the model, not just the initial data files for the atmosphere and surface (ICMSH* and ICMGG* files), but also the climatology (ICMCL*), the wave model start files and the namelists.
In this example, we want to use a new experiment id, ob01, to distinguish it from ob00.
Choose a starting date from the range 1st Nov to 15th Nov (00Z)
Make a copy of the ob00 data in your scratch directory:
cd scratch       # starting from your home directory
mkdir inidir     # put all your personal initial data in here
cd inidir
mkdir ob01       # your new experiment id
cd ob01
cp -rL /perm/rd/openifs/oifs_workshop_2017/expts-inidata/ob00/2015110100 .
# those last characters are a 'space' and then a 'fullstop'; '.' means "here".
Make sure you use the -L option! This ensures any 'linked' files are copied as actual files and not as symbolic links (this applies to the fort.4 file). Without it, fort.4 will not be editable.
Ignore any errors about files not being copied because of permission problems.
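One way to confirm that the -L option took effect is to look for symbolic links left in the copied directory. The sketch below uses only the standard find command; the function name list_symlinks is hypothetical and is not one of the workshop tools.

```shell
#!/bin/bash
# list_symlinks DIR: print every symbolic link found under DIR.
# No output means all files were copied as regular files.
list_symlinks() {
    find "${1:-.}" -type l
}

# Example, run from scratch/inidir/ob01:
#   list_symlinks 2015110100
```

If fort.4 appears in the output, the copy was made without -L and should be repeated.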
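To copy several dates at once, use a wildcard. This example copies the start dates from 1st Nov to 5th Nov:
cd inidir/ob01
cp -rL /perm/rd/openifs/oifs_workshop_2017/expts-inidata/ob00/2015110[1-5]* .    # this will take some time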
Please only copy the initial dates you intend to use. Each date uses 0.5 Gbyte of file storage.
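When the dates you need are not consecutive, a wildcard copies more than necessary. The following is a minimal sketch of a helper that copies only the dates named on the command line, using the same cp -rL as above. The function name copy_dates is hypothetical; only the source path is taken from this tutorial.

```shell
#!/bin/bash
# copy_dates SRC DEST DATE...: copy only the listed start dates from SRC
# into DEST, resolving symbolic links with -L.
copy_dates() {
    local src="$1" dest="$2" d
    shift 2
    mkdir -p "$dest" || return 1
    for d in "$@"; do
        if [ -d "$src/$d" ]; then
            # Permission errors on individual files can be ignored (see above).
            cp -rL "$src/$d" "$dest/" || echo "warning: some files in $d were not copied" >&2
        else
            echo "warning: no such date directory: $src/$d" >&2
        fi
    done
}

# Example:
#   copy_dates /perm/rd/openifs/oifs_workshop_2017/expts-inidata/ob00 \
#              "$HOME/scratch/inidir/ob01" 2015110100 2015110300
```

At roughly 0.5 Gbyte per date, the example above would use about 1 Gbyte of scratch space.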
Change experiment id
OpenIFS forecasts are identified by an 'experiment id', a four-character string such as 'ob00'.
An experiment could consist of a single forecast date or multiple starting dates. It can be of any length and could also include a restarted forecast. However, an experiment only has one horizontal and vertical resolution.
It might help to put a README file in the experiment directory to remember what the aim of each experiment is.
If you are using multiple dates, remember to change the experiment id for each date.
Initial files
The 2D and 3D fields needed to start the forecast are contained in files whose names begin with ICM: ICMGG* are the initial gridpoint files, ICMSH* are the initial spectral files, and ICMCL* is the file containing the climatological forcing fields.
The experiment id is contained in the names of the initial files, and it is also encoded into the GRIB messages within those files.
Use the 'grib_ls' command to examine the initial files:
grib_ls ICMGGob00INIT
grib_ls ICMSHob00INIT
OpenIFS namelist
The file fort.4 is the model 'NAMELIST' file. It contains a list of variable settings or 'switches' that control what the model does. These variables are grouped into separate Fortran namelists.
The model reads this file when it starts up. There are many options to control the model; the best source of information about them is the comments in the code.
The wave model, WAM, also has a separate namelist. For this workshop, none of the switches in this file needs to be changed.
Listing the experiment id
The command 'exptid' can be used to check and change the experiment id in the GRIB files.
troifs0@cca-login3:> cd inidir/ob01/2015110100
troifs0@cca-login3:> exptid ICMSHob00INIT
In file ICMSHob00INIT, values of key experimentVersionNumber are: ob00
The 'experimentVersionNumber' is the GRIB key encoded in the GRIB fields.
This can also be seen using the 'grib_ls' command to list the contents of the file:
troifs0@cca-login3:> grib_ls -p shortName,typeOfLevel,dataDate,experimentVersionNumber ICMSHob00INIT
ICMSHob00INIT
shortName    typeOfLevel    dataDate    experimentVersionNumber
t            hybrid         20151101    ob00
t            hybrid         20151101    ob00
t            hybrid         20151101    ob00
.......
The 'exptid' and 'grib_ls' commands can also be used for multiple files:
troifs0@cca-login3:> exptid ICM*
In file ICMCLob00INIT, values of key experimentVersionNumber are: 0001
In file ICMGGob00INIT, values of key experimentVersionNumber are: 0001 ob00
In file ICMGGob00INIUA, values of key experimentVersionNumber are: ob00
In file ICMSHob00INIT, values of key experimentVersionNumber are: ob00
Note that the ICMCL and ICMGG*INIT files both contain an experimentVersionNumber of '0001'. This value is used for climatological fields.
Set new experiment id
Use the exptid command to set the new experiment id to 'ob01':
troifs0@cca-login3:> exptid -n ob01 ICM*ob00*
Changing expid from 'ob00' to 'ob01' for ICMCLob00INIT and writing to new file ICMCLob01INIT.
Changing expid from 'ob00' to 'ob01' for ICMGGob00INIT and writing to new file ICMGGob01INIT.
Changing expid from 'ob00' to 'ob01' for ICMGGob00INIUA and writing to new file ICMGGob01INIUA.
Changing expid from 'ob00' to 'ob01' for ICMSHob00INIT and writing to new file ICMSHob01INIT.
Verify the command worked by checking the experimentVersionNumber in the new files:
troifs0@cca-login3:> exptid *ob01*
In file ICMCLob01INIT, values of key experimentVersionNumber are: 0001
In file ICMGGob01INIT, values of key experimentVersionNumber are: 0001 ob01
In file ICMGGob01INIUA, values of key experimentVersionNumber are: ob01
In file ICMSHob01INIT, values of key experimentVersionNumber are: ob01
To save space, you can delete the ob00 files (they will always be available in the directory they were copied from):
troifs0@cca-login3:> rm ICM*ob00*
rm: remove regular file `ICMCLob00INIT'? y
rm: remove regular file `ICMGGob00INIT'? y
rm: remove regular file `ICMGGob00INIUA'? y
rm: remove regular file `ICMSHob00INIT'? y
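If several start dates were copied, the exptid step has to be repeated in every date directory. A minimal sketch of a loop that does this is given below. It assumes the workshop's exptid command is on your PATH and accepts the -n option exactly as shown above; the function name rename_all_dates is hypothetical.

```shell
#!/bin/bash
# rename_all_dates OLD NEW TOPDIR: run 'exptid -n NEW' on the ICM*OLD* files
# in every directory directly under TOPDIR (one directory per start date).
rename_all_dates() {
    local old="$1" new="$2" top="${3:-.}" dir
    for dir in "$top"/*/; do
        # Skip directories that hold no files for the old experiment id.
        ls "$dir"ICM*"$old"* >/dev/null 2>&1 || continue
        ( cd "$dir" && exptid -n "$new" ICM*"$old"* ) || return 1
    done
}

# Example:
#   rename_all_dates ob00 ob01 "$HOME/scratch/inidir/ob01"
```

The old ob00 files are left in place, so they can be checked before being deleted by hand.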
Further modification to initial files
Now that the initial files with the correct experiment id have been prepared, further modifications to the input files can be made.
To modify the SST for example, see the tutorial on 'Modifying the SST'.
Creating the forecast experiment
Once the initial data directory is prepared, the next step is to create the experiment directory structure where the model will run. This is not the same location as the initial files just created.
To create the experiment directory structure, use the 'createEX' command. To see what arguments it takes, use the command:
createEX -h
The most basic form of the command is:
createEX --date 2015110100 -e ob01
This would create the ob01 forecast directories for a single date. By default, the experiment directory is created in your 'scratch' directory with the same name as the experiment id, e.g. $HOME/scratch/ob01.
New experiment: ob01 - single date, 10 members
Following the creation of the initial files for the experiment id 'ob01' given above, let's create a forecast experiment with 10 ensemble members for a single forecast date:
troifs0@cca-login3:~> createEX -d 2015110100 -e ob01 -i scratch/inidir -m 10
We tell this command it can find our new initial files in scratch/inidir. By default, this command will create the experiment directory also in scratch. Be careful not to confuse the initial data directory with the experiment directory. These should be kept separate.
Note that one ensemble member, '00', is always created. The -m argument gives the total number of members and defaults to 1. Directories are numbered starting from zero.
Only use 10 members in total, otherwise you will not be able to compare with the climatological SST experiment, which uses 10 members.
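Because member directories are numbered from zero, -m 10 gives directories 00 to 09, not 01 to 10. The short sketch below prints the directory names expected for a given member count, following the two-digit naming seen in this tutorial; the function name member_dirs is hypothetical.

```shell
#!/bin/bash
# member_dirs N: print the N two-digit member directory names, starting at 00.
member_dirs() {
    local n="$1" i=0
    while [ "$i" -lt "$n" ]; do
        printf '%02d\n' "$i"
        i=$((i + 1))
    done
}

# Example:
#   member_dirs 10     # prints 00, 01, ... 09, one per line
```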
This generates the output:
Creating directory structure for experiment ob01 in directory /scratch/ectrain/troifs0/ob01/...
Date : 2015110100
Copying files from directory (inidir): scratch/inidir
Date: 2015110100
.........
Created forecast experiment directory : /scratch/ectrain/troifs0/ob01/2015110100/05/
Linking files and copying namelist.
Using IFS data directory: /fwsm/lb/project/openifs/ifsdata/
Created forecast experiment directory : /scratch/ectrain/troifs0/ob01/2015110100/06/
Linking files and copying namelist.
Using IFS data directory: /fwsm/lb/project/openifs/ifsdata/
Created forecast experiment directory : /scratch/ectrain/troifs0/ob01/2015110100/07/
Linking files and copying namelist.
Using IFS data directory: /fwsm/lb/project/openifs/ifsdata/
Created forecast experiment directory : /scratch/ectrain/troifs0/ob01/2015110100/08/
Linking files and copying namelist.
Using IFS data directory: /fwsm/lb/project/openifs/ifsdata/
Created forecast experiment directory : /scratch/ectrain/troifs0/ob01/2015110100/09/
Linking files and copying namelist.
Using IFS data directory: /fwsm/lb/project/openifs/ifsdata/
All done: /scratch/ectrain/troifs0/ob01/ ready.
Directory structure
If we look in the directory /scratch/ectrain/troifs0/ob01/2015110100, we see:
troifs0@ccb-login3:> cd scratch/ob01/2015110100
troifs0@ccb-login3:> ls
00  03  06  09  ICMCLob50INIT  ICMSHob50INIT  specwavein  wam_subgrid_0
01  04  07  ICMGGob50INIT  cdwavein  uwavein  wam_subgrid_1
02  05  08  ICMGGob50INIUA  sfcwindin  wam_grid_tables  wam_subgrid_2
Each member will be run in the numbered directories: "00", "01", "02", "03", and so on. These contain:
troifs0@ccb-login3:> ls -l 00
total 16
lrwxrwxrwx 1 troifs0 ectrain   52 May 25 18:15 255l_2 -> /fwsm/lb/project/openifs/ifsdata/40r1/climate/255l_2
lrwxrwxrwx 1 troifs0 ectrain   16 May 25 18:15 ICMCLob50INIT -> ../ICMCLob50INIT
lrwxrwxrwx 1 troifs0 ectrain   16 May 25 18:15 ICMGGob50INIT -> ../ICMGGob50INIT
lrwxrwxrwx 1 troifs0 ectrain   17 May 25 18:15 ICMGGob50INIUA -> ../ICMGGob50INIUA
lrwxrwxrwx 1 troifs0 ectrain   16 May 25 18:15 ICMSHob50INIT -> ../ICMSHob50INIT
lrwxrwxrwx 1 troifs0 ectrain   11 May 25 18:15 cdwavein -> ../cdwavein
-rw-r----- 1 troifs0 ectrain 9886 May 25 18:15 fort.4
lrwxrwxrwx 1 troifs0 ectrain   49 May 25 18:15 ifsdata -> /fwsm/lb/project/openifs/ifsdata/40r1/climatology
lrwxrwxrwx 1 troifs0 ectrain   40 May 25 18:15 rtables -> /fwsm/lb/project/openifs/ifsdata/rtables
lrwxrwxrwx 1 troifs0 ectrain   12 May 25 18:15 sfcwindin -> ../sfcwindin
lrwxrwxrwx 1 troifs0 ectrain   13 May 25 18:15 specwavein -> ../specwavein
lrwxrwxrwx 1 troifs0 ectrain   10 May 25 18:15 uwavein -> ../uwavein
lrwxrwxrwx 1 troifs0 ectrain   18 May 25 18:15 wam_grid_tables -> ../wam_grid_tables
-rw-r----- 1 troifs0 ectrain 2220 May 25 18:15 wam_namelist
lrwxrwxrwx 1 troifs0 ectrain   16 May 25 18:15 wam_subgrid_0 -> ../wam_subgrid_0
lrwxrwxrwx 1 troifs0 ectrain   16 May 25 18:15 wam_subgrid_1 -> ../wam_subgrid_1
lrwxrwxrwx 1 troifs0 ectrain   16 May 25 18:15 wam_subgrid_2 -> ../wam_subgrid_2
Notice the links back to the parent directory. To save space, the files that do not change between members are kept in the parent directory.
Only the model namelists, fort.4 for the atmosphere, wam_namelist for the wave model, are individual to a forecast ensemble member directory.
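Since each member directory relies on links to the parent directory, deleting or renaming a file in the parent will leave dangling links behind. A minimal sketch for spotting these with standard tools is shown below; the function name broken_links is hypothetical.

```shell
#!/bin/bash
# broken_links DIR: print symbolic links under DIR whose target does not exist.
broken_links() {
    # 'test -e' follows the link, so it fails only when the target is missing.
    find "${1:-.}" -type l ! -exec test -e {} \; -print
}

# Example, run from scratch/ob01/2015110100:
#   broken_links 00
```

No output means every link in the member directory resolves to an existing file.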
Running the forecast experiment
The command 'oifs_run' is normally used to create and submit the batch job to run the model. This command has a large number of options. For this workshop, it has been configured with the correct defaults for the T255 resolution seasonal length forecasts.
However, it would need to be run in each forecast member directory, which would be tedious.
The command 'run_all_ens' has been created to speed up submitting the batch jobs, so that all the forecast members are run with one command.
For our 'ob01' experiment, this command creates all the jobs, correctly configures the namelists and submits the jobs. There will be 10 jobs, one forecast for each ensemble member:
troifs0@cca-login3:~> run_all_ens -e ob01 -d scratch/ob01 -q
The -q option here ensures the batch jobs are submitted to the batch queue. If -q is not specified then the batch script 'job' is only created in each ensemble directory and not submitted.
This will produce lengthy output like this:
.........
Ensemble member: 05
60c60
< NENSFNB=5,
---
> NENSFNB=0, ! Ensemble forecast number
Running command: oifs_run -e ob01 -x 1 -q
Copied /home/ectrain/troifs0/bin/cce-opt/master.exe to current directory
Using existing namelist in : fort.4
OpenIFS job created: job1
Submitted job: 8430146.ccapar
......
Setting the ensemble member
The command 'run_all_ens' does this step; no action is needed. This section explains how IFS uses the ensemble member number.
In IFS, each ensemble member uses the stochastic physics scheme to generate uncertainty. A random number 'seed' is used by the stochastic scheme to generate a different forecast.
This random number seed is changed by altering the ensemble member value, NENSFNB, in the model's namelist file, fort.4. Each ensemble member must have a unique number, and therefore a unique random number seed, in order to produce a different forecast.
The random seed is also date dependent. This means that for the same date and same ensemble member, the forecast is reproducible. But for the same ensemble member starting from a different date/time, the random seed will be different.
CTYPE="pf",          ! the type of forecast: 'pf=perturbed', 'cf=control'. The control (unperturbed) forecast is only used in medium-range forecasts, not in seasonal forecasts.
NENSFNB=2,           ! the ensemble member number.
LSTOPH_SPBS=true,    ! enables the stochastic backscatter scheme in the model dynamics. This is only used in medium-range forecasts and will be 'false' for seasonal forecasts.
LSPSDT=true,         ! enables the stochastic scheme for the physics tendencies.
The only namelist variable that needs changing in these experiments is NENSFNB, which the run_all_ens command does for you.
The stochastic backscatter scheme LSTOPH_SPBS should be disabled for OpenIFS 43r3 and beyond, as its use has been deprecated since 40r1. Its impact is very small, even less on the newer cubic grids, and its use is not recommended.
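To confirm that every member really has its own NENSFNB value, the namelists can be inspected with standard text tools. The sketch below assumes NENSFNB appears in upper case at the start of a line in fort.4, as in the excerpts in this tutorial; the function names are hypothetical.

```shell
#!/bin/bash
# member_numbers DATEDIR: print "<member dir> <NENSFNB value>" for each
# two-digit member directory under DATEDIR.
member_numbers() {
    local datedir="${1:-.}" f val
    for f in "$datedir"/[0-9][0-9]/fort.4; do
        [ -f "$f" ] || continue
        val=$(sed -n 's/^[[:space:]]*NENSFNB[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$f" | head -n 1)
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$val"
    done
}

# duplicate_member_numbers DATEDIR: print any NENSFNB value used more than once.
duplicate_member_numbers() {
    member_numbers "$1" | awk '{print $2}' | sort | uniq -d
}

# Example:
#   member_numbers "$HOME/scratch/ob01/2015110100"
```

An empty result from duplicate_member_numbers means all members will use different random seeds.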
Create and submit Cray batch job
This step is done by the run_all_ens command. No user action is needed. This section provides more information on the oifs_run command.
To create the batch job, use the oifs_run command. This creates a small batch job file ready to submit.
You may need to use the oifs_run command if the batch job script 'job' in each ensemble directory needs to be recreated.
If you need to rerun the experiment with no changes, simply resubmit the job with the command 'qsub job', rather than using oifs_run to recreate the job.
oifs_run -e exptid [-l namelist] [-f fcast len]
The common arguments to use are: -e, -l, -f. Other options are available which can usually be left to their default values.
Note! The -f argument, the length of the forecast, can be specified in various units: for example, d10 means '10 days'.
Checking the model output
During the course of the model run, files with names like ICMGGob01+00000, ICMGGob01+00012 etc will appear in the ensemble member directory.
These are the model output files. There is one per output step. In these experiments the output interval is 12 hours. It can be changed in the namelist, fort.4, but for this workshop it is strongly recommended to keep the output interval at 12 hours, as this saves storage space.
The command 'qstatu' shows the status of the submitted jobs. When a job completes it will no longer appear in the batch queue.
After the job has completed, there will be a 'batch log file' in the ensemble directory:
oifs_troifs1.o1481792
Check this file to make sure the model has run correctly. It's usually best to start from the bottom of the file and scroll up.
If you see:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! OPENIFS JOB FAILED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
the model run has failed.
The output from each model run goes into a directory named 'output1'. The '1' here is the run number and can be changed.
If the model fails, there are 2 files to look at in the 'output' directory:
NODE.001_01 : this is the output from the model as it is running (i.e. all the print/write fortran statements). This is a large file, so in case of errors, start from the bottom up!
oifs.log : this is where the model writes any error messages.
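With 10 members per date, checking each batch log by hand is slow. The sketch below searches the batch log files in every member directory for the failure banner shown above. It assumes the logs are named oifs_*.o*, inferred from the single example in this tutorial; the function name failed_members is hypothetical.

```shell
#!/bin/bash
# failed_members DATEDIR: print the member directories whose batch log
# contains the "OPENIFS JOB FAILED" banner.
# Members without any log file (still queued or running) are not reported.
# A member is reported if ANY of its logs contains the banner, so delete
# old logs before a rerun to avoid stale reports.
failed_members() {
    local datedir="${1:-.}" dir
    for dir in "$datedir"/[0-9][0-9]; do
        [ -d "$dir" ] || continue
        if grep -q 'OPENIFS JOB FAILED' "$dir"/oifs_*.o* 2>/dev/null; then
            basename "$dir"
        fi
    done
}

# Example:
#   failed_members "$HOME/scratch/ob01/2015110100"
```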
When the model completes the forecast successfully, the following files will be found in the output directory. ICMSH are the spectral fields, ICMGG are the gridpoint fields.
ICMGGclim_10u ICMGGclim_ci ICMGGclim_lsp ICMGGclim_sd ICMGGclim_sst ICMGGclim_stl4 ICMGGclim_swvl3 ICMGGclim_tsr ICMSHclim_sp NODE.001_01
ICMGGclim_10v ICMGGclim_cp ICMGGclim_msl ICMGGclim_slhf ICMGGclim_stl1 ICMGGclim_str ICMGGclim_swvl4 ICMGGclim_ttr ICMSHclim_t ifs.stat
ICMGGclim_2d ICMGGclim_e ICMGGclim_nsss ICMGGclim_sshf ICMGGclim_stl2 ICMGGclim_swvl1 ICMGGclim_tcc ICMSHclim_d ICMSHclim_vo oifs.log
ICMGGclim_2t ICMGGclim_ewss ICMGGclim_q ICMGGclim_ssr ICMGGclim_stl3 ICMGGclim_swvl2 ICMGGclim_tp ICMSHclim_lnsp ICMSHclim_z
Rerun the model job
A small file 'job1' (or jobN where N is the run number you used), is created in each of the ensemble member directories; 00, 01, 02, etc.
If the model fails for any reason, determine the problem and correct it; the run can then be resubmitted with:
qsub job1
There is no need to rerun the run_all_ens command as this will resubmit ALL the ensemble members again.
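When more than one member has failed, the qsub step can be repeated for just those members. A minimal sketch is given below, assuming qsub is available on the login node and the job script is named as described above; the function name resubmit_members is hypothetical.

```shell
#!/bin/bash
# resubmit_members DATEDIR JOB MEMBER...: run 'qsub JOB' inside each listed
# member directory, leaving all other members untouched.
resubmit_members() {
    local datedir="$1" job="$2" m
    shift 2
    for m in "$@"; do
        if [ -f "$datedir/$m/$job" ]; then
            ( cd "$datedir/$m" && qsub "$job" )
        else
            echo "warning: $datedir/$m/$job not found" >&2
        fi
    done
}

# Example: resubmit members 03 and 07 only
#   resubmit_members "$HOME/scratch/ob01/2015110100" job1 03 07
```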
Postprocessing: preparing monthly means for plotting with Metview
The next step is to process the output of the successful run and create the monthly mean fields that can be transferred to ICTP for plotting using Metview.
A command, oifs_to_mv, is available to compute the monthly mean fields ready to be plotted. This command uses a Metview script to do the processing.
The command takes a small number of arguments. For example, for an experiment labelled 'ob01', located in ~/scratch/ob01, type this command:
oifs_to_mv -e ob01 -d 2015110100 -i ~/scratch/ob01
The post-processing step can take some time to complete.
This command only processes one date at a time. It needs to be run separately for multiple dates in the same experiment id.
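For an experiment with several start dates, the command can be wrapped in a loop. The sketch below assumes oifs_to_mv takes the -e, -d and -i arguments exactly as in the example above, and that date directories have ten-digit names such as 2015110100; the function name postprocess_all_dates is hypothetical.

```shell
#!/bin/bash
# postprocess_all_dates EXPT TOPDIR: run oifs_to_mv once for every ten-digit
# date directory found directly under TOPDIR.
postprocess_all_dates() {
    local expt="$1" top="$2" dir
    for dir in "$top"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        oifs_to_mv -e "$expt" -d "$(basename "$dir")" -i "$top" || return 1
    done
}

# Example:
#   postprocess_all_dates ob01 "$HOME/scratch/ob01"
```

The loop stops at the first date for which oifs_to_mv reports an error, so that the problem can be investigated before continuing.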
Monthly mean files
After the oifs_to_mv command finishes, the monthly mean files can be found in the directory 'mmeans' inside the top level date directory. For example, for experiment id ob01, date 2015110100, the directory is: ob01/2015110100/mmeans.
Inside this directory, there is one file per parameter containing 5 monthly averages (Dec, Jan, Feb, March and April), with filenames such as: oif_ob01_u_20151101_00.grib.
Transfer to ICTP classroom PCs
The monthly mean files can be transferred to ICTP with the sftp command, for use with Metview on the classroom PCs.
Useful Unix commands
qsub job                 # submit the batch script 'job' to the batch queue
qstatu                   # show the status of your submitted jobs
rm -rf 2015110100        # delete the directory for a single date
rm -rf 2015110[1-5]00    # delete the directories for several dates
du -hs 2015110100        # show the disk space used by a directory
quota                    # show your disk quotas and current usage
The quota command produces output similar to the following. The '$SCRATCH' quota is important, as this is the total limit for the filesystem /scratch where your experiments are stored.
Quota for $HOME and $PERM:
Disk quotas for user troifs0 (uid 16144):
Filesystem          blocks   quota    limit    grace   files   quota   limit   grace
cnasa1:/vol/home    226M     480M     500M             132     20000   22000
Disk quotas for user troifs0 (uid 16144):
Filesystem          blocks   quota    limit    grace   files   quota   limit   grace
cnasa2:/vol/perm    0        26624M   27648M           1       200k    210k
Quota for $SCRATCH ($TEMP) including $SCRATCHDIR ($TMPDIR):
Disk quotas for user troifs0 (uid 16144):
Filesystem       kbytes      quota         limit         grace   files    quota     limit     grace
/lus/snx11062    767099824   32212254720   32212254720   -       116570   5000000   5000000   -
Disk quotas for group ectrain (gid 1400):
Filesystem       kbytes      quota   limit   grace   files    quota   limit   grace
/lus/snx11062    767105640   0       0       -       118078   0       0       -