When zombies occur, it is not always easy to work out why they occurred or which machine is giving rise to them.
This has been improved in ecflow 5.0.0: when a zombie appears in the GUI dialog, it is now accompanied by an explanation.
The dialog also shows the host where the zombie process resides.
Possible explanations include:
- Process id mismatch, password matches. Job scheduled twice. Check submitter
- Both PID and password mismatch. Re-queue & submit an active job?
- Password mismatch, PID matches, system has re-cycled PID or hacked job file?
- Two init commands or task complete or aborted but receives another child command
- Created by user action. In this case, the offending user command is listed: [ force | delete | begin | re-queue | execute (i.e. rerunning an already active job), etc. ]
- Task not found. Nodes replaced whilst jobs were running
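
The first three explanations in the list depend only on whether the process id and the password supplied by the job match what the server holds. The sketch below illustrates that decision table. It is not ecflow code: the function name `explain_mismatch` and its two boolean parameters are hypothetical, and the returned strings are copied from the list above.

```python
from typing import Optional


def explain_mismatch(pid_matches: bool, password_matches: bool) -> Optional[str]:
    """Return the explanation for a PID/password mismatch (hypothetical helper).

    Returns None when both values match, because the remaining explanations
    in the list (user action, task not found, repeated child commands) do not
    depend on these two comparisons.
    """
    if pid_matches and password_matches:
        return None
    if password_matches:
        # Only the process id differs.
        return "Process id mismatch, password matches. Job scheduled twice. Check submitter"
    if pid_matches:
        # Only the password differs.
        return "Password mismatch, PID matches, system has re-cycled PID or hacked job file?"
    # Neither value matches.
    return "Both PID and password mismatch. Re-queue & submit an active job?"


if __name__ == "__main__":
    # Print the explanation for each combination of the two comparisons.
    for pid_ok in (True, False):
        for pw_ok in (True, False):
            print(pid_ok, pw_ok, "->", explain_mismatch(pid_ok, pw_ok))
```

Reading the list this way makes it easier to tell a submission problem (PID differs, password matches) from a re-queue of an active job (both differ).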