Job Scheduling

Job Submission Essentials

This section covers the minimum knowledge required to submit jobs to the cluster in an informed way. Jobs are submitted to the cluster either interactively using salloc, or in batch mode (non-interactively) using sbatch. You should review the HPC Batch and Interactive Jobs sections, the documentation for sbatch and salloc, or reach out to the HPC staff if you need help determining how best to proceed.


Interactive Job with salloc

The salloc command requests a resource allocation from the controller in real time and returns when the resources have been allocated. To request an interactive job on 1 node with 10 CPUs and 50GB of memory, the command would look like the following:

salloc -c 10 --mem=50g srun --pty /bin/bash

Note that the srun command included here makes use of the resources by launching a bash shell on the allocated compute node as soon as the resources become available.
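
If you prefer to keep the request and the interactive shell as separate steps, the same resources can be requested without an embedded command. This is a minimal sketch assuming the cluster's default partition and shell behavior:

# Step 1: request the allocation; salloc opens a shell with SLURM_* variables set
salloc -c 10 --mem=50g

# Step 2: inside that shell, start an interactive bash session on the allocated compute node
srun --pty /bin/bash

# Step 3: when finished, exit the compute-node shell, then exit again to release the allocation
exit
exit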


Batch Job with sbatch

The sbatch command requests resources and runs a bash script when those resources become available. You can run multiple sbatch commands to queue jobs, and they will wait until resources are available to run. This makes batch submission a very efficient way to work, since you do not need to stay online waiting for resources. To make the same request as above, 10 CPUs and 50GB of memory, author a job file, e.g. my_job.sh:

#!/bin/bash
#SBATCH -c 10
#SBATCH --mem=50g

# commands here, e.g.:
sleep 10
Note that the resource request options have moved into the job script and appear on the "#SBATCH" lines. The job is then submitted with sbatch:

sbatch my_job.sh
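
After submission, sbatch prints the assigned job ID (e.g. "Submitted batch job 12345"). You can follow up with the standard SLURM commands squeue and scancel; the job ID below is a placeholder:

# List your own queued and running jobs (ST column: PD = pending, R = running)
squeue -u $USER

# Cancel a job by its ID if it is no longer needed
scancel 12345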

Resource Options for salloc and sbatch
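
The options shown above (-c for CPUs, --mem for memory) are only two of the resource options accepted by both salloc and sbatch. The sketch below lists other commonly used ones; the partition name and GPU availability are site-specific assumptions, so check the cluster documentation before relying on them:

#!/bin/bash
#SBATCH -J my_analysis           # job name (my_analysis is a placeholder)
#SBATCH -c 10                    # CPUs per task
#SBATCH --mem=50g                # memory for the job
#SBATCH -t 02:00:00              # wall-clock time limit (HH:MM:SS)
#SBATCH -o my_analysis_%j.out    # output file; %j expands to the job ID
##SBATCH -p gpu                  # partition; name is site-specific (hypothetical)
##SBATCH --gres=gpu:1            # request one GPU, if the partition offers them

The same options can also be given directly on the salloc command line, e.g. salloc -c 10 --mem=50g -t 02:00:00.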

Important Tips:

Avoid Running Jobs on the Login Nodes

Please DO NOT use the login nodes (e.g. hpc1 or hpc2) for running your jobs. Always use the "sbatch" command to run your jobs. If you are using interactive job submission -- running graphics (e.g. MATLAB), scripts, and other STDIO -- use the command "srun --x11 --pty bash", which assigns you a compute node. Jobs running on a login node will be killed. If you have already started a job on a login node, cancel it using the command "kill <PID>". You can get the PID by running the "top" command on the login node. To kill all of your processes, use:

kill -9 `ps -ef | grep <caseID> | grep -v grep | awk '{print $2}'`
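
For reference, the pipeline above lists all processes with ps -ef, keeps the lines containing your case ID, drops the grep process itself, extracts the PID column with awk, and passes those PIDs to kill -9.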

Job Locations

Node Partitions