Step-by-step guide
There are several ways to do this.
If the tasks are not very robust and are known to fail regularly, we can define a custom ERROR trap handler that, instead of aborting the task, logs the failure and sets the task to complete. The logging can be anything; here a message is written to the ecFlow log file.
# Define an error handler
ERROR() {
   echo "ERROR called"
   set +e                      # Clear -e flag, so we don't fail
   wait                        # Wait for background processes to stop
   # Record the failure in the log file for later analysis
   ecflow_client --msg="ERROR task %ECF_NAME% failed"
   ecflow_client --complete    # Replace abort with a complete
   trap 0                      # Remove the trap
   exit 0                      # End the script
}
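The handler above only takes effect once it is installed as a trap. The following is a minimal sketch of that wiring, not the actual head.h: it assumes the handler is registered with `trap ERROR 0` under `set -e`, replaces `ecflow_client` with a stand-in function so it runs without a server, and uses a hypothetical task path and a hypothetical `run_job` wrapper.

```shell
#!/bin/sh
# Stand-in for the real client so the sketch runs without an ecFlow server (assumption).
ecflow_client() { echo "ecflow_client $*"; }

# Hypothetical task path; in a real job ecFlow substitutes %ECF_NAME%.
ECF_NAME=/suite/main/dodgy

# The job body runs in a subshell so that its trap and exit do not end this script.
run_job() (
    ERROR() {
        set +e                      # Clear -e flag, so the handler itself cannot fail
        wait                        # Wait for background processes to stop
        ecflow_client --msg="ERROR task $ECF_NAME failed"
        ecflow_client --complete    # Report complete instead of abort
        trap 0                      # Remove the trap
        exit 0                      # End the job with a zero status
    }
    set -e          # Any failing command ends the job
    trap ERROR 0    # Call ERROR when the job exits
    false           # Simulated failing command
    echo "not reached"
)

output=$(run_job)
status=$?
echo "$output"
echo "exit status: $status"
```

With the stand-in client, the failing command triggers ERROR, the two client calls are echoed, and the job still exits with status 0.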
Alternatively, have a special task whose job is to monitor failures in the other tasks. This task logs the failures and then automatically sets the family/repeat to complete.
suite suite
   family main
      repeat date YMD 20170101 20180101 1
      task dodgy   # this task may fail
      task ok
      task fix     # handle failures, so repeat will advance even if other tasks fail
         # If there are no failures, complete fix, so repeat will advance
         complete dodgy == complete and ok == complete
         # Run at 23:30, allowing users to address task dodgy, otherwise automatically advance the REPEAT
         time 23:30
   endfamily
endsuite
fix.ecf
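# task fix.ecf
%include <head.h>
trap 0    # Remove the error trap
# Force the whole family to complete, so the repeat can advance
ecflow_client --force=complete recursive /%SUITE%/%FAMILY%
exit 0
%include <tail.h>
%comment
Node paths are absolute, hence the leading '/' before %SUITE%.
%end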
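The text says the fix task logs the failures before completing the family, but the script itself only forces completion. A minimal sketch of adding the logging step, assuming `ecflow_client --msg` is used for the log entry as in the ERROR handler: `ecflow_client` is a stand-in function so the sketch runs without a server, and `fix_family` and the `SUITE`/`FAMILY` values are hypothetical.

```shell
#!/bin/sh
# Stand-in for the real client so the sketch runs without an ecFlow server (assumption).
ecflow_client() { echo "ecflow_client $*"; }

# Hypothetical values; in fix.ecf these would come from %SUITE% and %FAMILY%.
SUITE=suite
FAMILY=main

# Hypothetical helper: log why the family is being forced, then force it complete.
fix_family() {
    node="/$1/$2"    # Node paths are absolute
    ecflow_client --msg="fix: forcing $node to complete so the repeat can advance"
    ecflow_client --force=complete recursive "$node"
}

log=$(fix_family "$SUITE" "$FAMILY")
echo "$log"
```

Writing the message first leaves a record in the log for each day the family was forced, which helps when reviewing how often task dodgy failed.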