Job Control
Introduction
This page describes some of the SLURM commands related to job control.
List of Commands
SLURM offers a number of helpful commands for tasks ranging from job submission and monitoring to modifying resource requests for jobs that have already been submitted to the queue. Below is a list of SLURM commands.
sbatch
The sbatch command is used for submitting jobs to the cluster. The command sbatch accepts a number of options either from the command line, or (more typically) from a batch script. An example of a SLURM batch script (called simple.slurm) is shown below:
#!/bin/bash
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=0-00:15:00 # 15 minutes
#SBATCH --output=my.stdout
#SBATCH --mail-user=abac123@case.edu
#SBATCH --mail-type=ALL
#SBATCH --job-name="just_a_test"
# Put commands for executing job below this line
# This example loads Python 2.7.8 and then writes out the version of Python
module load python
python --version
To submit this batch script, a user would type:
sbatch simple.slurm
This job (called just_a_test) requests 1 compute node, 1 task (by default, SLURM will assign 1 CPU core per task), 1 GB of RAM per CPU core, and 15 minutes of wall time (the maximum time the job is allowed to run). Note that these are the defaults for any job, but it is good practice to include these lines in a SLURM script in case you need to request additional resources.
Optionally, any #SBATCH line may be replaced with an equivalent command-line option. For instance, the #SBATCH -c 1 line could be removed and a user could specify this option from the command line using:
sbatch -c 1 simple.slurm
The commands needed to execute a program must be included beneath all #SBATCH directives. Lines beginning with the # symbol (other than #!/bin/bash and #SBATCH) are comment lines that are not executed by the shell. The example above simply prints the version of Python loaded in a user's path. It is good practice to include any module load commands in your SLURM script. A real job would likely do something more complex than the example above, such as read in a Python file for processing by the Python interpreter.
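As a small convenience, the --parsable option of sbatch prints only the job ID at submission time, which makes it easy to capture in a shell variable and feed to the monitoring commands described later on this page. A minimal sketch:

```shell
# Submit the script and capture the numeric job ID
# (--parsable makes sbatch print only the job ID)
JOBID=$(sbatch --parsable simple.slurm)
echo "Submitted job ${JOBID}"

# The captured ID can then be passed to other SLURM commands, e.g.:
squeue -j "${JOBID}"
```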
For more information about sbatch see: http://slurm.schedmd.com/sbatch.html
squeue
squeue is used for viewing the status of jobs. By default, squeue will output the following information about currently running jobs and jobs waiting in the queue: Job ID, Partition, Job Name, User Name, Job Status, Run Time, Node Count, and Node List. There are a large number of command-line options available for customizing the information provided by squeue. Below is a list of examples:
For more information about squeue see: http://slurm.schedmd.com/squeue.html
squeue is similar to the "showq" or "qstat" commands found in other schedulers:
squeue -u sxg125
output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
661587 batch bash sxg125 R 22:21 1 comp150t
Note the jobID (661587), the status of the job (R -> Running), and the compute node (comp150t) on which the job is running.
To see details, such as why your job is in the PD (pending) state or on which node it is running, use:
sq
output:
730814 batch slurm.sl bga11 PD 0:00 1 4 6950 (AssocMaxWallDurationPerJobLimi
730815 batch slurm.sl bga11 PD 0:00 1 4 6950 (AssocMaxWallDurationPerJobLimi
..
989833 batch 3DClasse txh310 R 21:59:56 16 240 3044 comp145t,comp146t,comp147t,comp149t,comp151t,comp154t,comp156t,comp157t,comp158t,comp159t,comp179t,comp185t,comp186t,comp187t,comp191t,comp192t
992383 batch job_chec sxl1036 R 1:43:13 2 16 3007 comp122t,comp123t
Also, to show the start and end times of your jobs:
squeue -u <CaseID> -o "%.9i %.9P %.8j %.8u %.2t %.10M %.6D %S %e"
output:
JOBID PARTITION NAME USER ST TIME NODES START_TIME END_TIME
676101 batch JOB sxg125 PD 0:00 1 2016-04-09T15:25:21
606057 batch JOB sxg125 R 8-01:08:45 1 2016-03-31T14:17:02 2016-04-31T14:17:02
606056 batch JOB sxg125 R 8-01:10:16 1 2016-03-31T14:15:31 2016-03-31T14:15:31
The job 676101 is estimated to start on April 09 at 15:25 and the end time of job 606057 is April 31 at 14:17.
Filtering squeue output through awk may be useful, for example, to isolate entries with group name in common:
squeue -o "%A %C %e %E %g %l %m %N %T %u" | awk 'NR==1 || /eecs600/'
output:
JOBID CPUS END_TIME DEPENDENCY GROUP TIME_LIMIT MIN_MEMORY NODELIST STATE USER
148137 1 2016-01-26T16:54:22 eecs600 2:00:00 1900 comp145t RUNNING aar93
148146 1 2016-01-27T01:14:27 eecs600 10:00:00 1900 comp148t RUNNING hxs356
Note the job status for the users in the eecs600 group.
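If you routinely use a customized squeue format, one option is to wrap it in a shell alias in your ~/.bashrc (the alias name myq below is just an illustration; the format string is the one shown earlier for start/end times):

```shell
# Add to ~/.bashrc: a custom squeue view for your own jobs,
# including estimated start (%S) and end (%e) times
alias myq='squeue -u $USER -o "%.9i %.9P %.8j %.8u %.2t %.10M %.6D %S %e"'
```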
sacct
This command is used for viewing information for completed jobs. This can be useful for monitoring job progress or diagnosing problems that occurred during job execution. By default, sacct will report Job ID, Job Name, Partition, Account, Allocated CPU Cores, Job State, and Exit Code for all of the current user’s jobs that completed since midnight of the current day. Many options are available for modifying the information output by sacct:
The --format option is particularly useful, as it allows a user to customize the output of job usage statistics. We suggest creating an alias for running a customized version of sacct. For instance, the Elapsed and Timelimit fields allow for a comparison of allocated vs. actual wall time. MaxRSS and MaxVMSize show the maximum RAM and virtual memory usage for a job, respectively, while ReqMem reports the amount of RAM requested.
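For illustration, a sacct invocation using those fields might look like the following (substitute your own job ID):

```shell
# Compare requested vs. actual wall time and memory for a completed job
sacct --format=JobID,JobName,Elapsed,Timelimit,MaxRSS,MaxVMSize,ReqMem,State -j <jobID>
```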
To see the status of your job, use a command such as the one below. Note that your executable should be preceded by the "srun" command in your batch script, for both serial and MPI executables, so that SLURM records per-step statistics.
sacct -o JobID,JobName,AveCPU,AvePages,AveRSS,MaxRSSNode,AveVMSize,NTasks,State,ExitCode -j <jobID>
output:
JobID JobName AveCPU AvePages AveRSS MaxRSSNode AveVMSize NTasks State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- ---------- -------- ---------- --------
1013605 v2o5band COMPLETED 0:0
1013605.bat+ batch 00:00:00 0 6244K comp162t 308544K 1 COMPLETED 0:0
For more information about sacct see: http://slurm.schedmd.com/sacct.html
scancel
scancel is used to cancel (kill) a pending or running job. With the -i option, scancel asks for confirmation before cancelling.
Example:
scancel -i 681457
prompt:
Cancel job_id=681457 name=bash partition=batch [y/n]? y
srun: Force Terminated job 681457
To cancel all jobs belonging to a given CaseID:
scancel -u <caseID>
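scancel can also filter by job state or job name. For example, to cancel only your pending jobs while leaving running ones untouched (the job name below refers to the earlier example script):

```shell
# Cancel only jobs in the PENDING state for your user
scancel -u <caseID> --state=PENDING

# Cancel a job by name instead of by ID
scancel -u <caseID> --name=just_a_test
```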
scontrol
scontrol is used for monitoring and modifying queued jobs. One of its most powerful options is the scontrol show job option. scontrol is also used for holding and releasing jobs. Below is a list of useful scontrol commands:
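For instance, holding and releasing a job, or modifying an attribute of a pending job, look like this (the job ID is the one from the example below):

```shell
scontrol hold 136355                                  # prevent a pending job from starting
scontrol release 136355                               # allow it to be scheduled again
scontrol update JobId=136355 TimeLimit=2-00:00:00     # change a pending job's time limit
```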
Example:
scontrol show job 136355
output:
JobId=136355 JobName=xxxxx
UserId=xxxx(yyyy) GroupId=xxx(yyy)
Priority=3007 Nice=0 Account=gray QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=20:07:27 TimeLimit=13-07:00:00 TimeMin=N/A
SubmitTime=2016-01-18T15:37:55 EligibleTime=2016-01-18T15:37:55
StartTime=2016-01-18T15:37:56 EndTime=2016-01-31T22:37:56
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=batch AllocNode:Sid=hpctest:39249
ReqNodeList=(null) ExcNodeList=(null)
NodeList=comp148t
BatchHost=comp148t
NumNodes=1 NumCPUs=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=48G MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/xxxx/AAA.sh
WorkDir=/home/xxxx/BBB
StdErr=/home/xxxx/OOO.o
StdIn=/dev/null
StdOut=/home/xxx/OOOO.o
Power= SICP=0
If the job is pending, it will show the reason for pending as well:
...
JobState=PENDING Reason=ReqNodeNotAvail(Unavailable:gpu017t,gpu018t,gpu019t,gpu020t,gpu021t,gpu022t,gpu023t,gpu024t) Dependency=(null)
Here, it shows that the job is waiting for the resources. The gpu nodes are listed because they are currently offline.
SLURM command for information about a node:
scontrol show node comp009t
output:
NodeName=comp009t Arch=x86_64 CoresPerSocket=1
CPUAlloc=1 CPUErr=0 CPUTot=12 CPULoad=0.96 Features=hex24gb
Gres=(null)
NodeAddr=comp009t NodeHostName=comp009t Version=15.08
OS=Linux RealMemory=23000 AllocMem=16384 Sockets=12 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=100000 Weight=1 Owner=N/A
BootTime=2016-03-02T13:58:01 SlurmdStartTime=2016-03-17T08:26:18
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Here, the total number of CPUs (CPUTot) is 12, and the real memory (RealMemory) is 23000 MB (~23 GB).
For more information about scontrol see: http://slurm.schedmd.com/scontrol.html
srun
srun can be used to run interactive jobs, with or without graphics:
srun --x11 -N 1 -c 2 --time=1:00:00 --pty /bin/bash
This launches an interactive shell on a single node with 2 CPU cores (-c 2) for 1 hour; the --x11 flag enables X11 forwarding so graphical windows can be displayed.
This command can also be used to launch a parallel job step. Typically, srun is invoked from a SLURM job script to launch a MPI job (much in the same way that mpirun or mpiexec are used). More details about running MPI jobs within SLURM are provided below. Please note that your application must include MPI code in order to run in parallel across multiple CPU cores using srun. Invoking srun on a non-MPI command or executable will result in this program being independently run X times on each of the CPU cores in the allocation.
Alternatively, srun can be run directly from the command line on a gateway, in which case srun will first create a resource allocation for running the parallel job. The -n [CPU_CORES] option is passed to specify the number of CPU cores for launching the parallel job step. For example, running the following command from the command line will obtain an allocation consisting of 16 CPU cores and then run the command hostname across these cores:
srun -n 16 hostname
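Putting the pieces together, a minimal MPI batch script might look like the sketch below. The module name (openmpi) and the executable (./mpi_hello) are placeholders for your site's MPI stack and your own program:

```shell
#!/bin/bash
#SBATCH -N 2                 # 2 nodes
#SBATCH --ntasks=16          # 16 MPI tasks in total
#SBATCH --time=0-00:30:00    # 30 minutes of wall time

module load openmpi          # placeholder: load your site's MPI module
srun ./mpi_hello             # srun launches one copy of the program per task
```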
For more information about srun see: http://www.schedmd.com/slurmdocs/srun.html
sinfo
sinfo allows users to view information about SLURM nodes and partitions. A partition is a set of nodes (usually a cluster) defined by the cluster administrator. Below are a few example uses of sinfo:
Note: The following options provide detailed output equivalent to "showq" and "mdiag -n" from other schedulers.
si or sinfo -a -o "%P %a %l %D %N %C"
output:
PARTITION AVAIL TIMELIMIT NODES NODELIST CPUS(A/I/O/T)
smp up 13-08:00:00 2 smp04t,smp05t 1/71/0/7
Here, (A/I/O/T) represents "allocated/idle/other (offline/down)/total". The alias for that long command is "si"; for the total allocation, use the command "sc":
$ sc
CPUS(A/I/O/T) 318/1166/20/1504
Utilization: 21.1436%
Equivalent to "mdiag -n":
sinfo -p batch -Nle -o '%n %C %t'
or,
siall
output:
NODELIST AVA TIMELIMIT NODE CPUS(A/I/O/T) CPU_LOAD MEMORY FEATURES REASON
comp001t up 13-08:00:0 1 3/9/0/12 1.87 23000 hex24gb none
comp002t up 13-08:00:0 1 9/3/0/12 3.00 23000 hex24gb none
...
To list the reasons why nodes are down, drained, or failing:
sinfo -R
For more information about sinfo see: http://slurm.schedmd.com/sinfo.html
If you want to check your group's allocation and the resources used by other members of the group, use the information (i) command:
i
output:
****Your SLURM's CPU Quota****
xxx 256
****Your Current Jobs****
JOBID PRIOR ST ACCOUNT PARTITION NODES CPU MIN_MEMORY TIME_LIMIT NODELIST
1931308 1012 R xxx batch 3 36 72K 5-00:00:00 comp208t,comp209t,comp210t
1935896 1004 R xxx batch 1 12 24K 2-12:00:00 comp186t
1935867 1003 R xxx batch 1 6 12K 2-12:00:00 comp050t
1934798 1003 R xxx batch 1 6 12K 2-12:00:00 comp049t
****Group's Jobs****
Account:yxk
JOBID USER PRIOR ST PARTITION NODES CPU MIN_MEMORY TIME_LIMIT NODELIST
Here, the group can use up to 256 processors. The members of the group have already used 60 processors (36 + 12 + 6 + 6) out of that allocation.
sreport
sreport is used for generating reports of job usage and cluster utilization. It queries the SLURM database to obtain this information. By default information will be shown for jobs run since midnight of the current day. Some examples:
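For illustration, two commonly used reports look like the following (adjust the date range to your needs):

```shell
# Cluster-wide utilization since a given date
sreport cluster utilization start=2016-01-01

# Top 10 users by CPU usage over a date range
sreport user topusage start=2016-01-01 end=2016-02-01 TopCount=10
```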
For more information about sreport see: http://slurm.schedmd.com/sreport.html
sstat
sstat displays various status information for a running job/step (refer to the SLURM man page):
sstat -j <jobID>
Very Important: If you are submitting the job using sbatch, please include srun before your executable in your SLURM batch script, as shown:
srun ./<executable>
Selecting the fields of interest
sstat -p --format=AveCPU,AvePages,AveRSS,MaxRSSNode,AveVMSize,NTasks,JobID -j 661587
output:
AveCPU|AvePages|AveRSS|MaxRSSNode|AveVMSize|NTasks|JobID|
00:00.000|0|2264K|comp150t|119472K|1|661587.0|
To estimate how much memory your job is consuming, run the top command on the node where the job is running. First, find the node:
sq | grep <caseID>
output:
1958082 batch Tumor-PIPE-a <caseID> R 19:40:29 1 4 1002 comp153t
ssh -t comp153t top
output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21348 jxw773 20 0 14.9g 14g 1072 S 400.0 22.6 678:30.15 bwa
Note that 22.6% of 64 GB comes out to roughly 14.5 GB of memory, matching the 14.9g shown in the RES column.
Job Dependency
There is an sbatch switch "--dependency" that will defer running a job until a list of other jobs have completed:
https://slurm.schedmd.com/sbatch.html#OPT_dependency
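A sketch of chaining two jobs with --dependency (the script names first_step.slurm and second_step.slurm are placeholders):

```shell
# Submit the first job and capture its ID (--parsable prints only the job ID)
JOBID=$(sbatch --parsable first_step.slurm)

# The second job runs only after the first completes successfully
sbatch --dependency=afterok:${JOBID} second_step.slurm
```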
Slurm Efficiency (seff)
The seff command reports how efficiently a completed job used its allocated CPU cores and memory:
seff <jobID>
output:
Job ID: <jobID>
Cluster: smaster2
User/Group: <userID>/<groupID>
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 24
CPU Utilized: 00:00:28
CPU Efficiency: 0.01% of 5-05:34:24 core-walltime
Memory Utilized: 50.90 GB (estimated maximum)
Memory Efficiency: 79.52% of 64.00 GB (64.00 GB/node)