The environment is set up by calling
module load ecflow/5new
A specific version can be selected with
module unload ecflow; module load ecflow/5.5.0
The server can be started with
ecflow_start.sh
The client command prints its self-contained documentation with
ecflow_client --help
and the graphical interface is started with
# the line below adds localhost to the ecflowview->Servers list
grep localhost $HOME/.ecflowrc/servers || echo "localhost $(uname -n) $((1500 + $(id -u)))" >> $HOME/.ecflowrc/servers
# start the GUI:
ecflow_ui
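The port arithmetic used for the servers entry can also be sketched in Python. This is a minimal sketch, assuming the port is 1500 plus the numeric user id (the convention used by the Python loader script later on this page). The helper names `default_port` and `servers_entry` are hypothetical and not part of ecFlow.

```python
import os
import socket


def default_port(uid=None):
    """Return 1500 + uid, the per-user port assumed in this sketch."""
    if uid is None:
        uid = os.getuid()
    return 1500 + uid


def servers_entry(name="localhost", host=None, uid=None):
    """Build one 'name host port' line for the GUI servers file."""
    if host is None:
        host = socket.gethostname()
    return "%s %s %d" % (name, host, default_port(uid))


if __name__ == "__main__":
    # Print the line that would be appended to the servers file.
    print(servers_entry())
```

Computing the port in one place makes it easier to keep the GUI entry and the client connection consistent.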
The server administration directory is $HOME/ecflow_server/. It contains the server log file and the check point file (a binary snapshot of the server content). It is defined as the variable ECF_HOME on the top node.
Once the server and its GUI are started, click on Servers->localhost to connect the first time.
The next step is to load a suite into the server. The Python script in the Py-Def section below can be used to define the suite, to expand it into a file, and to load it into the server, as shown in the ecflowview snapshot below.
Clicking on each task, we can check that the task wrapper script is present (ECF_FILES defined properly), then edit it, preprocess it (ECF_INCLUDE defined properly, no micro character on its own), and submit it, as a task or as an alias.
Consider Options->CloseOnApply/Submit when multiple aliases must be sent from the same task in a short time.
If the task does not reach the active status, check that the ECF_OUT directory exists on the remote host, check that the expected rsh or ssh connection does not prompt for a password, and query the queuing system, since the job directives (user account, queue) may not be valid yet.
When the submit family works for the expected remote host(s), it is time to fill the main family with relevant tasks. Enjoy!
(Expanded) Definition file:
It consists of the following keywords for nodes and their attributes definition:
- suite family task endsuite endfamily endtask
- autocancel automigrate autorestore clock complete cron date day defstatus edit event extern inlimit label late limit meter repeat time today trigger
- on a line, text beyond # is a comment
Compared with SMS, the text and owner keywords are left behind.
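The nesting of the node keywords listed above can be illustrated by generating a small definition as text. This is a minimal sketch that only uses the suite, family, task, endfamily and endsuite keywords; `definition_text` is a hypothetical helper, not part of ecFlow or of the ecf.py module.

```python
def definition_text(suite, families):
    """Return an expanded definition as text.

    families maps a family name to a list of task names.
    """
    lines = ["suite %s" % suite]
    for family, tasks in families.items():
        lines.append("  family %s" % family)
        for task in tasks:
            lines.append("    task %s" % task)
        lines.append("  endfamily")
    lines.append("endsuite")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print a one-family suite with two tasks.
    print(definition_text("test", {"fam": ["t00", "t01"]}))
```

Attributes such as edit or trigger would appear as extra indented lines under the node they belong to.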
autocancel
for a node to be deleted automatically
autocancel +01:00 # cancel one hour after complete
autocancel 10 # cancel 10 days after complete
autocancel 0 # cancel immediately after being complete
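The three forms above can be read as a delay after completion. The following minimal sketch translates them into a `timedelta`; it only covers the forms shown here, and `autocancel_delay` is a hypothetical helper, not an ecFlow function.

```python
from datetime import timedelta


def autocancel_delay(value):
    """Translate an autocancel argument into a delay after completion.

    '+hh:mm' is a relative time, a plain integer is a number of days.
    """
    if value.startswith("+"):
        hours, minutes = value[1:].split(":")
        return timedelta(hours=int(hours), minutes=int(minutes))
    # int() raises ValueError for any form not covered by this sketch.
    return timedelta(days=int(value))


if __name__ == "__main__":
    for argument in ("+01:00", "10", "0"):
        print(argument, "->", autocancel_delay(argument))
```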
clock
clock real # hybrid may be used in test mode
complete
for a node, to be recursively forced complete from a condition
complete t1:1 or t1==complete
cron
to run a task regularly; the task is requeued as soon as complete is received,
i.e. no trigger should rely on the parent task being complete
the task can only become complete through an inherited defstatus or a complete attribute
cron 23:00 # at next 23:00
cron 10:00 20:00 01:00 # every hour from 10am to 8pm
date
date 25.12.2012
date 01.*.*
day
day monday # sunday,monday,tuesday,wednesday,thursday,friday,saturday
defstatus
defstatus complete #unknown,suspended,queued,submitted,active,aborted
edit
to attach a variable definition to a node
edit variable value
# variables to be found and replaced in a task wrapper
edit COMMAND "echo OK" # %COMMAND:sleep 1%
edit TRIGGER "t1==complete" # ecflow_client --wait="%TRIGGER:1==1%"
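The `%NAME:default%` notation in the comments above means "use the variable if it is defined on the node, else the default". The following is a simplified imitation of that substitution, not ecFlow's actual preprocessor: it ignores `%%`, `%ecfmicro` and the other directives, and `substitute` is a hypothetical name.

```python
import re

# Matches %NAME% or %NAME:default% (simplified: no %% or %ecfmicro handling).
_VARIABLE = re.compile(r"%(\w+)(?::([^%]*))?%")


def substitute(text, variables):
    """Replace %NAME% and %NAME:default% using the given variables."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if default is not None:
            return default
        raise KeyError("no value and no default for %s" % name)
    return _VARIABLE.sub(replace, text)


if __name__ == "__main__":
    line = 'ecflow_client --wait="%TRIGGER:1==1%"'
    print(substitute(line, {}))                           # default is used
    print(substitute(line, {"TRIGGER": "t1==complete"}))  # edit value is used
```

Defaults in the wrapper let the same script run whether or not the suite defines the variable.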
event
event 1 # may fit a call in task.ecf to 'ecflow_client --event=1'
event ready # ecflow_client --event=ready
extern
extern /path/to/a/external/node # to allow path's use in trigger/complete
inlimit
register the node and its kids to a limit
inlimit /limits:hpc
inlimit /suite/limits:hpc
inlimit /suite/limits:hpc 10
label
label name "default message" # task.ecf: ecflow_client --label=name "label update"
late
late -s +00:15 -a 20:00 -c +02:00
limit
limit hpc 500
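The interplay of limit and inlimit can be pictured as a token counter: a node registered with inlimit consumes tokens (one by default, or the number given) while it runs, and is held back when the limit is full. The sketch below models that reading under those assumptions; the `Limit` class is hypothetical and is not the ecFlow API.

```python
class Limit:
    """A token counter modelling 'limit name maximum'."""

    def __init__(self, name, maximum):
        self.name = name
        self.maximum = maximum
        self.used = 0

    def acquire(self, tokens=1):
        """Take tokens if they fit under the maximum; return True on success."""
        if self.used + tokens > self.maximum:
            return False
        self.used += tokens
        return True

    def release(self, tokens=1):
        """Give tokens back, for example when a task completes or aborts."""
        self.used = max(0, self.used - tokens)


if __name__ == "__main__":
    hpc = Limit("hpc", 500)
    started = sum(1 for _ in range(60) if hpc.acquire(10))
    print("tasks started with 10 tokens each:", started)
```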
meter
meter name -1 100 90 # 90 is threshold (optional) # task.ecf: ecflow_client --meter=name 30
repeat
the repeat is incremented when all nodes below it are complete
an aborted task DOES prevent the repeat from incrementing
an operator, an analyst, or a dedicated task can help it carry on
repeat day step [ENDDATE] # only for suites
repeat integer VARIABLE start end [step]
repeat enumerated VARIABLE first [second [third ...]]
repeat string VARIABLE str1 [str2 ...]
repeat date VARIABLE yyyymmdd yyyymmdd [delta]
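The values taken by a repeat date variable can be listed with a few lines of Python. This is a minimal sketch assuming a positive delta in days and an inclusive end date; `repeat_date` is a hypothetical helper, not part of ecFlow.

```python
from datetime import datetime, timedelta


def repeat_date(start, end, delta=1):
    """List the yyyymmdd values of 'repeat date VARIABLE start end [delta]'."""
    if delta <= 0:
        raise ValueError("this sketch only handles a positive delta")
    day = datetime.strptime(str(start), "%Y%m%d")
    last = datetime.strptime(str(end), "%Y%m%d")
    values = []
    while day <= last:
        values.append(int(day.strftime("%Y%m%d")))
        day += timedelta(days=delta)
    return values


if __name__ == "__main__":
    # Crossing a year boundary shows why plain integer arithmetic is not enough.
    print(repeat_date(20121230, 20130102))
```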
time
the task becomes complete ONLY when the time range is over
it is better not to use such a task in a trigger expression
time 23:00 # at next 23:00
time 10:00 20:00 01:00 # every hour from 10am to 8pm
time +00:01 # one minute after the begin suite
time +00:10 01:00 00:05 # 10-60 min after begin every 5 min
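A time series written as start, end and step can be expanded into its individual slots. The sketch below does this for the forms shown above, treating a leading `+` as an offset from the suite begin time and the end as inclusive; `time_series` is a hypothetical helper, not part of ecFlow.

```python
def time_series(start, end, step):
    """Expand 'start end step' (each hh:mm) into a list of hh:mm slots."""
    def minutes(text):
        hours, mins = text.lstrip("+").split(":")
        return int(hours) * 60 + int(mins)

    step_minutes = minutes(step)
    if step_minutes <= 0:
        raise ValueError("step must be positive")
    return ["%02d:%02d" % divmod(slot, 60)
            for slot in range(minutes(start), minutes(end) + 1, step_minutes)]


if __name__ == "__main__":
    print(time_series("10:00", "20:00", "01:00"))
    print(time_series("+00:10", "01:00", "00:05"))
```

The same expansion applies to the cron and today series, which use the same start, end and step notation.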
today
with this attribute, the task starts straight away when loaded/replaced after the given time,
while the time attribute would make it wait until the next day
today 3:00 # today at 3:00
today 10:00 20:00 01:00 # every hour from 10am to 8pm
trigger
for a task to wait for the right condition (step/meter/status/variable(int)) before starting
Py-Def
As soon as a definition file grows beyond a few hundred lines, or even earlier when obvious repeated patterns appear, a language like Python should be used. At the Centre, a Python module is used for both research and operations to reduce verbosity in suite definitions (/home/ma/emos/def/o/def/ecf.py).
#!/usr/bin/env python
import os
import sys
sys.path.append('/home/ma/emos/def/o/def')
# ipython # import ecf; help(ecf.<tab>)
from ecf import *

defs = Defs()

def fill():
    # functions can generate tasks/families
    return [Task("t%02d" % i).add(
                Event(1),
                Meter("step", -1, 100),
                Label("info", ""),
            ) for i in range(1, 10 + 1)]  # LIST COMPREHENSION

home = os.getenv("HOME") + "/ecflow_server"
top = Suite("test").add(
    Edit(ECF_HOME=home,                  # where job, local .out go
         ECF_FILES=home + "/files",      # where .ecf are found
         ECF_INCLUDE=home + "/include",  # where .h are found
         ECF_OUT=home + "/out",          # output remote/local location, create missing directories...
         ),
    Family("fam").add(
        Task("t00").add(
            Trigger("t01==complete"),
            Complete("t02:1 or t02==complete"),),
        fill(),
    ))

if __name__ == '__main__':
    uid = os.getuid()  # numeric user id, used to derive the server port
    host = "localhost"
    client = Client(host, 1500 + uid)
    path = "/test"
    defs.add_suite(top)
    client.replace(path, defs)
Task-Wrapper (ecf-file) and header-files
#!/usr/bin/env ksh
%manual
DESCRIPTION: ...
input(s): ...
output(s): ...
OPERATORS: ...
ANALYST: ...
%end
%comment
# ...
%end
%include <qsub.h>
%include <head.h>
# main section
%COMMAND:printenv% # a variable may contain a command
ecflow_client --wait="%TRIGGER:1==1%" # or an embedded blocking/trigger condition
ecflow_client --event=1
ecflow_client --meter=step 30
ecflow_client --label=info "updating"
%nopp
# no preprocessing, here
%end
# a directive to include a file without preprocessing
cat > test.pl <<\EOF
%includenopp <test.pl>
EOF
%ecfmicro @
# from now _at_ is the micro character for directives and variables
# ...
# and revert to percent:
@ecfmicro %
%include <tail.h>
#!/bin/ksh
# Defines the variables that are needed for any communication with ECF
export ECF_PORT=%ECF_PORT% # The server port number
export ECF_HOST=%ECF_HOST% # The name of ecf host that issued this task
export ECF_NAME=%ECF_NAME% # The name of this current task
export ECF_PASS=%ECF_PASS% # A unique password, ...
export ECF_RID=$$ # record the process id. Used for zombie detection
# set as FREE on the server with menu
# ecFlowUI=>Special=>FreePassword, to accept communication
# with a "zombie" with invalid pass
set -eux; export PATH=/usr/local/apps/ecflow/%ECF_VERSION%/bin:$PATH
ERROR() {
  set +e; wait; ecflow_client --abort=trap; trap 0; exit 0
}
trap '{ ERROR ; }' 0 1 2 3 4 5 6 7 8 10 12 13 15
ecflow_client --init=$$
wait; ecflow_client --complete; trap 0; exit 0
python /home/ma/emos/def/o/def/cray.py