The task wrapper file does not normally need many changes if the task designer has stuck to the KISS principle and focused on the functional aspect of the task.
- In some situations, it may be enough to define the SMS variables as references to ecFlow variables on the relevant node (top server node, suite node, or family node).
- A variable may refer to another variable:
    # SMS variable names to define as references to ecFlow variables
    vars="SMSRID SMSTRYNO SMSNAME SMSSCRIPT SMSJOB SMSJOBOUT SMSDATE SMSTIME SMSCLOCK SMSKILLCMD SMSURLCMD SMSURLBASE SMSURL SMSPASS SMSNODE SMSCMD SMSKILL SMSCHECK SMSCHECKOLD SMSSTATUSCMD SMSCHECKCMD SMSOUT SMSTRIES"
    for var in $vars; do
        case $var in
            SMSCMD)       ecf=ECF_JOB_CMD;;
            SMSKILL*)     ecf=ECF_KILL_CMD;;
            SMSSTATUS*)   ecf=ECF_STATUS_CMD;;
            SMSCHECKCMD)  ecf=ECF_CHECK_CMD;;
            SMSURL*CMD*)  ecf=ECF_URL_CMD;;
            *)            ecf=$(echo $var | sed -e 's:SMS:ECF_:');;
        esac
        node=/      # or node=path_to_suite_or_family
        add=add     # use add=change if the variable already exists
        ecflow_client --alter $add variable $var "%$ecf%" $node
    done
The file name changes: it ends with .ecf instead of .sms. There are two ways to handle this:
- simply copy or link the original .sms file to a .ecf file;
- alternatively, define the variable ECF_EXTN in the definition file:

    edit ECF_EXTN .sms

This requests that the ecFlow server use .sms wrappers as the task templates. In some cases, no files will need translation (no SMS variables, no CDP calls).
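smsmicro is replaced with ecf_micro where needed:

=========  ==========  ==================
SMS        ecFlow      location
=========  ==========  ==================
SMSMICRO   ECF_MICRO   definition file
%smsmicro  %ecf_micro  scripts (.ecf, .h)
=========  ==========  ==================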
In ECMWF Operations, in the main branch, amongst 1394 files, only 43 use SMS system variables, i.e. variables whose name starts with SMS. Among all the suites MetApps is in charge of, amongst 3738 files, 216 are affected. Extracting these variables, we have:
============
%SMS in .sms
============
SMSCHECK
SMSCHECKOLD
SMSDATE
SMSFILES
SMSHOME
SMSHOST
SMSINCLUDE
SMSJOBOUT
SMSLOG
SMSNAME
SMSNODE
SMSTRYNO
SMSURLBASE
SMS_PROG
============
Similarly, we can identify all scripts that call the CDP text client.
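Both inventories can be produced with standard tools. The following is a minimal sketch, assuming the wrappers live under one directory and that the local grep supports the widely available -o, -h and -w options. The function names list_sms_variables and list_cdp_callers are hypothetical helpers, not part of SMS or ecFlow.

```shell
#!/bin/sh
# Hypothetical helpers; $1 is the directory that holds the task wrappers.

# Print every distinct %SMS... variable name used in the .sms files.
list_sms_variables() {
    find "$1" -type f -name '*.sms' -exec grep -oh '%SMS[A-Z_]*' {} + |
        tr -d '%' |
        LC_ALL=C sort -u
}

# Print the .sms files that mention cdp as a whole word.
# Files that only mention cdp in a comment are listed too.
list_cdp_callers() {
    find "$1" -type f -name '*.sms' -exec grep -lw 'cdp' {} + | LC_ALL=C sort
}
```

Calling `list_sms_variables .` from the top of the wrapper tree gives the list of variables to translate, and `list_cdp_callers .` gives the wrappers that need manual review.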
It is a good design principle to create tasks that are independent of SMS system variables. Only tasks with “advanced” needs are concerned: for example, SMSTRYNO was used to make a job aware of its instance number, enabling verbose output in case of a rerun.
One-step translation consists of running the scripts through a filter that can be used both for expanded SMS definition files and for task wrappers:

    > sed -f sms2ecf-min.sed X.sms > X.ecf
    #!/bin/sed -f
    /^ *action */d
    /^ *edit ECF_DATE */d
    s:SMSNAME:ECF_NAME:g
    s:SMSNODE:ECF_NODE:g
    s:SMSPASS:ECF_PASS:g
    s:SMS_PROG:ECF_PORT:g
    s:SMSINCLUDE:ECF_INCLUDE:g
    s:SMSFILES:ECF_FILES:g
    s:SMSTRYNO:ECF_TRYNO:g
    s:SMSTRIES:ECF_TRIES:g
    s:SMSHOME:ECF_HOME:g
    s:SMSRID:ECF_RID:g
    s:SMSJOB:ECF_JOB:g
    s:SMSJOBOUT:ECF_JOBOUT:g
    s:SMSOUT:ECF_OUT:g
    s:SMSCHECKOLD:ECF_CHECKOLD:g
    s:SMSCHECK:ECF_CHECK:g
    s:SMSLOG:ECF_LOG:g
    s:SMSLISTS:ECF_LISTS:g
    s:SMSPASSWD:ECF_PASSWD:g
    s:SMSSERVERS:ECF_SERVERS:g
    s:SMSMICRO:ECF_MICRO:g
    s:SMSPID:ECF_PID:g
    s:SMSHOST:ECF_HOST:g
    s:SMSDATE:ECF_DATE:g
    # command variables: longer names first, so that the SMSURL and SMSKILL
    # rules do not rewrite part of SMSURLCMD and SMSKILLCMD
    s:SMSURLCMD:ECF_URL_CMD:g
    s:SMSKILLCMD:ECF_KILL_CMD:g
    s:SMSURL:ECF_URL:g
    s:SMSURLBASE:ECF_URLBASE:g
    s:SMSCMD:ECF_JOB_CMD:g
    s:SMSKILL:ECF_KILL_CMD:g
    s:SMSSTATUSCMD:ECF_STATUS_CMD:g
    s:SMSWEBACCESS:ECF_WEBACCESS:g
    s:SMS_VERS:ECF_VERS:g
    s:SMS_VERSION:ECF_VERSION:g
    # include and files paths get an _ecf suffix
    /edit ECF_INCLUDE/ { s:/include:/include_ecf:g }
    /edit ECF_INCLUDE/ { s:_prod:_prod_ecf:g }
    /edit ECF_FILES/ { s:_prod:_prod_ecf:g }
    s:smshostfile:ecf_hostfile:g
    s:sms_hosts:ecf_hosts:g
Applying such a filter to all SMS tasks can be automated:

    #!/bin/ksh
    files=$(find . -type f -name "*.sms")      ## all sms wrappers
    for f in $files ; do
        ecf=$(basename $f .sms).ecf            ## ecf task name
        sed -f sms2ecf-min.sed $f > $ecf       ## translate
        ## if the translation changed nothing, link to the original instead
        diff $f $ecf > /dev/null && rm $ecf && ln -sf $f $ecf
    done
Symbolic links between SMS wrappers can be preserved:

    #!/bin/ksh
    files=$(find . -type l -name "*.sms")
    for f in $files ; do
        ecf=$(basename $f .sms).ecf            ## ecf task name
        link=$(readlink $f)
        dir=$(dirname $f); cd $dir
        ln -sf $link $ecf
        cd -
    done
Special attention is needed when renaming the following variables, whose ecFlow names do not follow the simple SMS to ECF_ pattern:
=============  ==============
SMS            ecFlow
=============  ==============
SMSCMD         ECF_JOB_CMD
SMSKILL        ECF_KILL_CMD
SMS_STATUSCMD  ECF_STATUS_CMD
SMS_URLCMD     ECF_URL_CMD
=============  ==============
It is not a good idea to systematically replace SMS with ECF_. For example, we use the variables NO_SMS and LSMSSIG, which are not related to SMS.
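One way to stay clear of such names is to rename only variables from an explicit list. The sketch below illustrates the idea; the function name rename_listed_sms_vars and the shortened list of names are assumptions made for the example, and it relies on sed accepting -E for extended regular expressions.

```shell
#!/bin/sh
# Hypothetical filter: rename only the SMS variables on an explicit
# (here shortened) list, and only when the name is not glued to a
# preceding letter, digit or underscore.
rename_listed_sms_vars() {
    sed -E 's/(^|[^A-Za-z0-9_])SMS(NAME|NODE|TRYNO|HOME|JOBOUT)/\1ECF_\2/g'
}

# Usage: NO_SMS and LSMSSIG pass through unchanged.
printf '%s\n' 'NO_SMS=1 LSMSSIG=0 name=%SMSNAME% try=$SMSTRYNO' |
    rename_listed_sms_vars
```

A blanket `s/SMS/ECF_/g` on the same line would have turned NO_SMS into NO_ECF_ and LSMSSIG into LECF_SIG.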
If we want to run the same job under both SMS and ecFlow, %SMSXXX% may be replaced with a shell variable ECF_XXX. A header file then defines ECF_XXX=%SMSXXX:0% in SMS mode and ECF_XXX=%ECF_XXX:0% in ecFlow mode.
All tasks calling CDP directly must be treated carefully, and the text client commands replaced with their ecFlow counterparts. They may force complete a family or a task, requeue a job or change a variable value:

    #!/usr/bin/env cdp
    cdp << EOF
    define ERROR { if(rc==0) then exit 1; endif }
    set SMS_PROG %SMS_PROG%
    login %SMSNODE% %USER% 1 ; ERROR
    suites -s %SUITE%
    loop task ( $missing ) do
      force -r complete /%SUITE%/%FAMILY%/tc\$task ; ERROR
    endloop
    exit
    EOF
The ECF_PORT variable gives us the ability to discriminate between jobs under ecFlow control or not:
    #!/bin/ksh
    if [ %ECF_PORT:0% -gt 0 ] ; then
      for task in $missing; do
        ecflow_client --force complete recursive /%SUITE%/%FAMILY%/tc$task
      done
    else
    cdp << EOF
    define ERROR { if(rc==0) then exit 1; endif }
    set SMS_PROG %SMS_PROG%
    login %SMSNODE% %USER% 1 ; ERROR
    suites -s %SUITE%
    loop task ( $missing ) do
      force -r complete /%SUITE%/%FAMILY%/tc\$task ; ERROR
    endloop
    exit
    EOF
    fi
SMS child commands may also be called directly in a few SMS task wrappers. These should again be replaced with their ecFlow equivalents.
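As an illustration, a small filter can rewrite the most common child commands. This is a sketch only: it assumes the usual one-to-one correspondence between the SMS child commands (smsinit, smscomplete, smsabort, smsevent, smsmeter, smslabel) and the ecflow_client options --init, --complete, --abort, --event, --meter and --label, and that each call sits at the start of its own line. The function name sms_child_to_ecf is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: rewrite SMS child commands found at the start of a
# line (after optional indentation) into ecflow_client calls.
sms_child_to_ecf() {
    sed \
        -e 's/^\([[:space:]]*\)smsinit[[:space:]][[:space:]]*/\1ecflow_client --init=/' \
        -e 's/^\([[:space:]]*\)smsevent[[:space:]][[:space:]]*/\1ecflow_client --event=/' \
        -e 's/^\([[:space:]]*\)smsmeter[[:space:]][[:space:]]*/\1ecflow_client --meter=/' \
        -e 's/^\([[:space:]]*\)smslabel[[:space:]][[:space:]]*/\1ecflow_client --label=/' \
        -e 's/^\([[:space:]]*\)smscomplete[[:space:]]*$/\1ecflow_client --complete/' \
        -e 's/^\([[:space:]]*\)smsabort[[:space:]]*$/\1ecflow_client --abort/'
}

# Usage: translate three child-command lines read from standard input.
printf '%s\n' 'smsinit $$' '  smsmeter step 3' 'smscomplete' | sms_child_to_ecf
```

Child commands embedded in longer command lines or in other languages are not covered and still need to be reviewed by hand.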
There is no single right way to do this. It is simple to design a task whose language is pure Python or pure Perl. We tend to use ksh scripting for task templates because it offers the following features:
- trap ERROR 0: to catch any early exit from the script and call the ERROR function when that happens
- set -e: to raise an error if a command exit status is not 0
- set -u: to prevent undefined variable usage
- set -x: to display each command before execution
- PS4 variable: to add time stamps to the trace and evaluate each line's run time
- trap: to redirect internal/external signal reception to an ERROR function
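The way these features combine can be sketched as follows. The example is written for plain sh, where the same options exist as in ksh; the helper guarded_run and the messages it prints are hypothetical stand-ins for what the head.h and tail.h headers would do, and set -x with PS4 is left out to keep the output short.

```shell
#!/bin/sh
# Hypothetical helper: run one command as a "task body" in a child shell
# that uses the error-handling set-up described above.
guarded_run() {
    sh -c '
        ERROR() {
            trap - 0                # clear the exit trap to avoid recursion
            echo "ERROR called: task would be reported as aborted" >&2
            exit 1
        }
        set -eu                     # stop on a failing command or an unset variable
        trap ERROR 0                # any exit before the end goes through ERROR
        trap ERROR 1 2 3 15         # so does the reception of common signals
        "$@"                        # the task body
        trap - 0                    # normal end: disarm the exit trap
        echo "task would be reported as complete"
    ' guarded_run "$@"
}
```

With this set-up, `guarded_run true` reaches the end of the script, while `guarded_run false` stops at the failing command and leaves through ERROR with exit status 1.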
Task headers can be used to factor out what is shared among multiple tasks (head.h, tail.h, trap.h, rcp.h, qsub.h).