There may be a number of reasons why a submitted job does not start running. When that happens, it is a good idea to use `squeue` and pay attention to the `STATE` and `NODELIST(REASON)` columns:
```
$> squeue -j 64243399
   JOBID   NAME USER QOS   STATE TIME TIME_LIMIT NODES FEATURES NODELIST(REASON)
64243399 my_job user  nf PENDING 0:00   03:00:00     1   (null) (Priority)
```
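If you only want the state and the pending reason of a single job, you can ask `squeue` for just those two fields. The sketch below wraps this in a small shell function; `job_reason` is a hypothetical name, not an ECMWF-provided command. It relies on the standard `squeue` options `-h` (no header), `-j` (job ID) and `-o` with the `%T` (job state) and `%r` (reason) format specifiers.

```shell
# job_reason: print the state and reason of one job.
# Hypothetical helper, not an ECMWF tool. Usage: job_reason JOBID
job_reason() {
  if [ -z "${1:-}" ]; then
    echo "usage: job_reason JOBID" >&2
    return 1
  fi
  # -h: omit the header line
  # -o: print only the job state (%T) and the reason (%r)
  squeue -h -j "$1" -o "%T %r"
}
```

For the example job above, `job_reason 64243399` would print something like `PENDING Priority`.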
If the job is in the `PENDING` state, it has not yet been dispatched to a node. The reason shown in the `NODELIST(REASON)` column tells you why.
Here is a list of the most common ones:
Reason | Description |
---|---|
Priority | Your job is ready to be dispatched, but other jobs with higher priority will be dispatched before yours. |
Resources | Your job is ready to be dispatched and it is at the top of the queue, but there are no free resources to satisfy your job requirements. |
AssocMaxJobsLimit | You have reached the limit on the number of jobs you can have running at the same time under a given project account. Your job will not be considered until some of your other jobs in the same project complete. |
QOSMaxJobsPerUserLimit | You have reached the limit on the number of jobs you can have running at the same time in a given QoS. Your job will not be considered until some of your other jobs in the same QoS complete. |
JobArrayTaskLimit | Your job is part of a job array, and the array's limit on the number of simultaneously running tasks has been reached. Your job will not be considered until other tasks in the same array complete. |
Dependency | Your job depends on other jobs. It will not be considered until those dependencies are satisfied, for example until the jobs it depends on complete. |
DependencyNeverSatisfied | Your job has a dependency on another job that will never be satisfied. You should find out why and cancel the job if appropriate. |
ReqNodeNotAvail | There are no nodes available to dispatch your job. A System Session or outage may be going on. Check our service status on https://www.ecmwf.int/en/service-status |
Licenses | Your job requires some resources that are temporarily not available. A System Session or outage may be going on. Check our service status on https://www.ecmwf.int/en/service-status |
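
When many of your jobs are waiting, it can help to see at a glance which of the reasons above apply to them. The sketch below is an illustration under stated assumptions, not an ECMWF-provided tool: the function names `pending_summary` and `never_satisfied` are hypothetical, and both expect lines of the form `JOBID REASON` on standard input, as produced by `squeue -h -t PENDING -o "%i %r"`.

```shell
# pending_summary: count pending jobs per reason (hypothetical helper).
# Input: lines of the form "JOBID REASON" on standard input.
pending_summary() {
  awk '{
    $1 = ""               # drop the job ID ...
    sub(/^ +/, "")        # ... and the leading space it leaves behind
    count[$0]++           # the rest of the line is the reason
  } END {
    for (r in count) print count[r], r
  }' | sort -k1,1nr -k2   # most frequent reason first, then alphabetical
}

# never_satisfied: print the IDs of jobs whose dependency can never be
# satisfied (hypothetical helper), so they can be reviewed and cancelled.
never_satisfied() {
  awk '$2 == "DependencyNeverSatisfied" { print $1 }'
}
```

For example, `squeue -u "$USER" -h -t PENDING -o "%i %r" | pending_summary` summarises your pending jobs. Before cancelling a job reported by `never_satisfied`, it is worth inspecting it with `scontrol show job JOBID` and then removing it with `scancel JOBID` if it is no longer needed.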
The full list of reasons can be found in the JOB REASON CODES section of the `squeue` man page (`man squeue`).