SLURM: Submitting Jobs

Batch vs Interactive Jobs

The trouble with interactive environments

There is another reason why GUIs are less common in HPC environments: point-and-click is necessarily interactive. In HPC environments (as we'll see in session 3) work is scheduled in order to allow exclusive use of the shared resources. On a busy system there may be several hours' wait between when you submit a job and when the resources become available, so a reliance on user interaction is not viable. In Unix, commands need not be run interactively at the prompt: you can write a sequence of commands into a file and run it as a script, either manually (for sequences you find yourself repeating frequently) or via another program such as the batch system.
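For instance, a short script - hypothetically named my_workflow.sh here, reusing the Stata commands that appear later on this page - might look like:

$ cat my_workflow.sh
#!/bin/bash
# the same commands you would otherwise type at the prompt:
module load stata/17.0
cd $SCRATCH/my_project
stata -b do data_0706.do

$ bash my_workflow.sh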

The job might not start immediately, and might take hours or days, so we prefer a batch approach:

I can now run the script interactively, which is a great way to save effort if I frequently use the same workflow, or ...

Where does the output go?

Writing and Submitting a job

There are two aspects to a batch job script: a set of #SBATCH directives describing the resources the job needs, and the commands the job will actually run.

A simple example

A typical batch script on an NYU HPC cluster looks something like the two examples below:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00
#SBATCH --mem=2GB
#SBATCH --job-name=myTest
#SBATCH --mail-type=END
#SBATCH --mail-user=bob.smith@nyu.edu
#SBATCH --output=slurm_%j.out
#SBATCH --error=slurm_%j.err

module purge
module load stata/17.0

RUNDIR=$SCRATCH/my_project/run-${SLURM_JOB_ID/.*}
mkdir -p $RUNDIR

DATADIR=$SCRATCH/my_project/data

cd $RUNDIR
stata -b do $DATADIR/data_0706.do

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00
#SBATCH --mem=2GB
#SBATCH --job-name=myTest
#SBATCH --mail-type=END
#SBATCH --mail-user=bob.smith@nyu.edu
#SBATCH --output=slurm_%j.out
#SBATCH --error=slurm_%j.err

module purge

SRCDIR=$HOME/my_project/code
RUNDIR=$SCRATCH/my_project/run-${SLURM_JOB_ID/.*}
mkdir -p $RUNDIR

cd $SLURM_SUBMIT_DIR
cp my_input_params.inp $RUNDIR

cd $RUNDIR
module load fftw/intel/3.3.9
$SRCDIR/my_exec.exe < my_input_params.inp

We'll work through them more closely in a moment.

You submit the job with sbatch:

$ sbatch myscript.s

And monitor its progress with:

$ squeue -u $USER

What just happened? Here's an annotated version of the first script:

#!/bin/bash
# This line tells the shell how to execute this script, and is unrelated
# to SLURM.

# At the beginning of the script, lines beginning with "#SBATCH" are read by
# SLURM and used to set queueing options. You can comment out a SBATCH
# directive with a second leading #, e.g.:
##SBATCH --nodes=1

# we need 1 node, will launch a maximum of one task and use one cpu for the task:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

# we expect the job to finish within 5 hours. If it takes longer than 5
# hours, SLURM can kill it:
#SBATCH --time=5:00:00

# we expect the job to use no more than 2GB of memory:
#SBATCH --mem=2GB

# we want the job to be named "myTest" rather than something generated
# from the script name. This will affect the name of the job as reported
# by squeue:
#SBATCH --job-name=myTest

# when the job ends, send me an email at this email address:
#SBATCH --mail-type=END
#SBATCH --mail-user=bob.smith@nyu.edu

# standard output and standard error are each written to a file in the
# directory the job was submitted from. %j expands to the job id, so the
# files will have names like slurm_12345.out and slurm_12345.err:
#SBATCH --output=slurm_%j.out
#SBATCH --error=slurm_%j.err

# once the first non-comment, non-SBATCH-directive line is encountered, SLURM
# stops looking for SBATCH directives. The remainder of the script is executed
# as a normal Unix shell script.

# first we ensure a clean running environment:
module purge
# and load the module for the software we are using:
module load stata/17.0

# next we create a unique directory to run this job in. We will record its
# name in the shell variable "RUNDIR", for better readability.
# SLURM sets SLURM_JOB_ID to the job id; ${SLURM_JOB_ID/.*} expands to the job
# id up to the first '.'
# We make the run directory in our area under $SCRATCH, because at NYU HPC
# $SCRATCH is configured for the disk space and speed required by HPC jobs.
RUNDIR=$SCRATCH/my_project/run-${SLURM_JOB_ID/.*}
mkdir -p $RUNDIR

# we will be reading data in from somewhere, so define that too:
DATADIR=$SCRATCH/my_project/data

# the script starts running in the directory the job was submitted from, so
# we need to move into the unique directory we just created:
cd $RUNDIR

# now start the Stata job:
stata -b do $DATADIR/data_0706.do

The second script has the same SBATCH directives, but this time we are using code we compiled ourselves. Starting after the SBATCH directives:

# first we ensure a clean running environment:
module purge

# and ensure we can find the executable:
SRCDIR=$HOME/my_project/code

# create a unique directory to run this job in, as per the script above:
RUNDIR=$SCRATCH/my_project/run-${SLURM_JOB_ID/.*}
mkdir -p $RUNDIR

# By default the script will have started running in the directory we ran
# sbatch from. Let's assume our input file is in the same directory in this
# example. SLURM sets some environment variables with information about the
# job, including SLURM_SUBMIT_DIR, which is the directory the job was
# submitted from. So let's go there and copy the input file to the run
# directory on /scratch:
cd $SLURM_SUBMIT_DIR
cp my_input_params.inp $RUNDIR

# go to the run directory to begin the run:
cd $RUNDIR

# load whatever environment modules the executable needs:
module load fftw/intel/3.3.9

# run the executable (sending the contents of my_input_params.inp to stdin):
$SRCDIR/my_exec.exe < my_input_params.inp

Batch Jobs

Jobs are submitted with the sbatch command:

$ sbatch options job-script

The options tell SLURM information about the job, such as what resources will be needed. These can be specified in the job-script as SBATCH directives, or on the command line as options, or both (in which case the command line options take precedence should the two contradict each other). For each option there is a corresponding SBATCH directive with the syntax:

#SBATCH option

For example, you can specify that a job needs 2 nodes and 4 tasks per node (by default each task gets one CPU core) by adding the following directives to the script:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

or as command-line options to sbatch when you submit the job:

$ sbatch --nodes=2 --ntasks-per-node=4 my_script.s
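Because command-line options take precedence, you can also override a directive that is already in the script. For example, the following submission would request 4 nodes even if the script itself contains #SBATCH --nodes=2:

$ sbatch --nodes=4 my_script.s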

Options to manage job output

Options to set the job environment

Options to request compute resources

Options for running interactively on the compute nodes with srun

Options for delaying starting a job

Options for running many similar jobs

R Job Example

Create a directory and an example R script

mkdir /scratch/$USER/examples
cd /scratch/$USER/examples

Create example.R inside the examples directory:

df <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
df
indices <- order(df$x)
order(df$x)
df[indices,]
df[rev(order(df$y)),]

Create the following SBATCH script:

$ cat run-R.sbatch
#!/bin/bash
#
#SBATCH --job-name=RTest
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=2GB
#SBATCH --time=01:00:00

module purge
module load r/intel/4.0.4

cd /scratch/$USER/examples
R --no-save -q -f example.R > example.out 2>&1

Run the job using "sbatch".

$ sbatch run-R.sbatch
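When the job completes, the R output is written to example.out in the examples directory (as set up in the script above). A quick way to check it is:

$ cat /scratch/$USER/examples/example.out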

Array Jobs

With a job array you can submit many similar jobs, with almost identical requirements, from a single sbatch command. This reduces the load on both the user and the scheduler. Job arrays can only be used with batch jobs. Usually the only difference between the jobs in an array is the input file (or files). Follow the recipe below to try the example: there are 5 input files, named 'sample-1.txt' through 'sample-5.txt', and the single command "sbatch --array=1-5 run-jobarray.s" submits 5 jobs, one to process each of these input files.

$ mkdir -p /scratch/$USER/myjarraytest
$ cd /scratch/$USER/myjarraytest
$ cp /share/apps/Tutorials/slurm/example/jobarray/* .
$ ls
run-jobarray.s  sample-1.txt  sample-2.txt  sample-3.txt  sample-4.txt  sample-5.txt  wordcount.py
$ sbatch --array=1-5 run-jobarray.s
Submitted batch job 23240

The content of the job script 'run-jobarray.s' is copied below:

#!/bin/bash
#
#SBATCH --job-name=myJobarrayTest
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=5:00
#SBATCH --mem=1GB
#SBATCH --output=wordcounts_%A_%a.out
#SBATCH --error=wordcounts_%A_%a.err

module purge
module load python/intel/3.8.6

cd /scratch/$USER/myjarraytest
python wordcount.py sample-$SLURM_ARRAY_TASK_ID.txt

For each job in the array, SLURM sets the environment variable SLURM_ARRAY_TASK_ID to that job's array index. It is usually embedded somewhere in the script - here, in the input file name - so that each job picks up its own input. As shown above, two additional placeholders are available when naming a job's stdout and stderr files: %A denotes the job ID and %a the task ID (i.e. the job array index).
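For instance, with the job ID from the submission above (23240), the five array tasks would produce output and error files named along these lines:

wordcounts_23240_1.out  wordcounts_23240_1.err
wordcounts_23240_2.out  wordcounts_23240_2.err
...
wordcounts_23240_5.out  wordcounts_23240_5.err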

More examples


You can find more examples here:
/scratch/work/public/examples/slurm/jobarry/

GPU Jobs

To request one GPU card, use the following SBATCH directive in your job script:

#SBATCH --gres=gpu:1

To request a specific card type, use e.g. --gres=gpu:v100:1. As an example, let's submit an Amber job. Amber is a molecular dynamics software package. The recipe is:

$ mkdir -p /scratch/$USER/myambertest
$ cd /scratch/$USER/myambertest
$ cp /share/apps/Tutorials/slurm/example/amberGPU/* .
$ sbatch run-amber.s
Submitted batch job 14257

There are three NVIDIA GPU types and one AMD GPU type that can be used (CAUTION: AMD GPUs require code to be compatible with ROCm drivers, not CUDA).

To Request NVIDIA GPUs

#SBATCH --gres=gpu:rtx8000:1
#SBATCH --gres=gpu:v100:1
#SBATCH --gres=gpu:a100:1

To Request AMD GPUs

#SBATCH --gres=gpu:mi50:1 

From the tutorial example directory we copy over Amber input data files "inpcrd", "prmtop" and "mdin", and the job script file "run-amber.s". The content of the job script "run-amber.s" is:

#!/bin/bash
#
#SBATCH --job-name=myAmberJobGPU
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --mem=3GB
#SBATCH --gres=gpu:1

module purge
module load amber/openmpi/intel/20.06

cd /scratch/$USER/myambertest
pmemd.cuda -O

The demo Amber job should take ~2 minutes to finish once it starts running. When the job is done, several output files are generated. Check the one named "mdout", which has a section most relevant here:

|--------------------- INFORMATION ----------------------
| GPU (CUDA) Version of PMEMD in use: NVIDIA GPU IN USE.
|                    Version 16.0.0
|                      02/25/2016

[......]

|------------------- GPU DEVICE INFO --------------------
|            CUDA_VISIBLE_DEVICES: 0
|   CUDA Capable Devices Detected:      1
|           CUDA Device ID in use:      0
|                CUDA Device Name: Tesla K80
|     CUDA Device Global Mem Size:  11439 MB
| CUDA Device Num Multiprocessors:     13
|           CUDA Device Core Freq:   0.82 GHz
|--------------------------------------------------------

Interactive Jobs

Bash Sessions

The majority of jobs on the NYU HPC cluster are submitted with the sbatch command and executed in the background. The steps and workflow of such jobs are defined in advance by the user, and their execution is driven by the scheduler.

There are cases when users need to run applications interactively (interactive jobs). Interactive jobs allow users to enter commands and data on the command line (or in a graphical interface), providing an experience similar to working on a desktop or laptop. Common examples of interactive tasks are debugging sessions and GUI-based applications.

To support interactive use in a batch environment, Slurm allows for interactive batch jobs.

Can you run interactive jobs on the HPC Login nodes?

Since the login nodes of the HPC cluster are shared between many users, running interactive jobs that require significant computing and IO resources on the login nodes will impact many users.

Thus running compute and IO intensive interactive jobs on the HPC login nodes is not allowed. Such jobs may be removed without notice!

Instead of running interactive jobs on login nodes, users can run interactive jobs on the HPC compute nodes using SLURM's srun utility. Running interactive jobs on compute nodes does not impact other users and, in addition, provides access to resources that are not available on the login nodes, such as interactive access to GPUs, high memory, or exclusive access to all the resources of a compute node. Note: there is no partition on the HPC cluster reserved for interactive jobs.

Start an Interactive Job

When you start an interactive batch job the command prompt is not returned immediately. Instead, you wait until the requested resources become available; when the prompt returns, you are on a compute node, inside a batch job - much like logging in to a host with ssh. To end the session, type 'exit', again just as you would to log out of an ssh session.

[wd35@log-0 ~]$ srun --pty /bin/bash
[wd35@c17-01 ~]$ hostname
c17-01

To use any GUI-based program within the interactive batch session you will need to enable X forwarding with the --x11 option. This still relies on X forwarding being set up for your login session - try running 'xterm' before starting the interactive job to verify that it is working correctly.
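For example (the x11 examples further below show the same pattern with additional resource requests):

$ srun --x11 --pty /bin/bash
$ xterm   # once the session starts, check that an xterm window opens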

Request Resources

You can request resources for an interactive batch session just as you would for any other job, for example to request 4 processors with 4GB memory for 2 hours:
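$ srun -c4 -t2:00:00 --mem=4000 --pty /bin/bash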

If you do not request resources you will get the default settings. If you have moved to other directories during your interactive session, you can jump back to the directory you submitted the job from with:

$ cd $SLURM_SUBMIT_DIR

Interactive Job Options

With srun, SLURM does not just submit the job: it also waits for the job to start and connects stdout, stderr and stdin to the current terminal.

Certain tasks need user interaction - such as debugging and some GUI-based applications. However the HPC clusters rely on batch job scheduling to efficiently allocate resources. Interactive batch jobs allow these apparently conflicting requirements to be met.

Interactive Bash Job Examples

Example (Without x11 forwarding)

Through srun, SLURM provides rich command-line options for requesting resources from the cluster for interactive jobs. The examples below, with short accompanying explanations, should cover many common use cases.

# In the srun examples below, "--pty /bin/bash" requests an interactive bash shell in a pseudo-terminal.
# By default the allocated resources are a single CPU core and 2GB of memory for 1 hour.
$ srun --pty /bin/bash

# To request 4 CPU cores, 4 GB memory, and a 2 hour running duration:
$ srun -c4 -t2:00:00 --mem=4000 --pty /bin/bash

# To request one GPU card, 3 GB memory, and a 1.5 hour running duration:
$ srun -t1:30:00 --mem=3000 --gres=gpu:1 --pty /bin/bash

Example (x11 forwarding)

srun has an option, "--x11", which enables X forwarding, so programs with a GUI can be used during an interactive session (provided you have X forwarding to your workstation set up).

# To request computing resources, and export the x11 display on the allocated node(s):
$ srun --x11 -c4 -t2:00:00 --mem=4000 --pty /bin/bash
$ xterm   # check that an xterm window pops up

# To request a GPU card etc., and export the x11 display:
$ srun --x11 -t1:30:00 --mem=3000 --gres=gpu:1 --pty /bin/bash

R Interactive Job

The following example shows how to work with an interactive R session on a compute node:

[NetID@log-1 ~]$ srun -c 1 --pty /bin/bash
[NetID@c17-01 ~]$ module purge
[NetID@c17-01 ~]$ module list

No modules loaded
[NetID@c17-01 ~]$ module load r/intel/4.0.4
[NetID@c17-01 ~]$ module list
Currently Loaded Modules:
  1) intel/19.1.2   2) r/intel/4.0.4

[NetID@c17-01 ~]$ R
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-centos-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> 5 + 10
[1] 15
> 6 ** 2
[1] 36
> tan(45)
[1] 1.619775
>
> q()
Save workspace image? [y/n/c]: n
[NetID@c17-01 ~]$ exit
exit
[NetID@log-1 ~]$