Job Arrays
What is a Job Array?
SLURM's sbatch job arrays submit many copies of the same job script under a single job, allowing one submission to contain many sub-jobs. This has several benefits.
It makes it easy to perform the same calculations on many different data sets.
A single script does the same thing to every dataset, which is easy to change, maintain, and keep track of.
Using a Job Array
If this documentation uses SLURM or sbatch commands and terminology unfamiliar to you, it is recommended that you first read SLURM Parallelism.
Also refer to Multiple Jobs on One Compute Node for related information and a simpler example.
Basics
The script that is submitted is called the job, and each individual copy in the array is called a task. Let's start with a basic job array sbatch script, basic.sh:
#!/bin/bash
#SBATCH --job-name myJobArray # This job is named myJobArray.
#SBATCH --partition medium # This job will go in the medium partition.
#SBATCH --nodes 1 # This job will be run on one node and will be pending (PD) until available.
#SBATCH --ntasks-per-node 1 # This job will allocate one SLURM task on each allocated node.
#SBATCH --time 01:00 # This job will run for one minute.
#
# -------------------------------------------------------------------------------------
#SBATCH --output myJobArray_%A_%a.txt # The normal output will be put into this file.
#SBATCH --error myJobArray_%A_%a.err # The error output will be put into this file.
#SBATCH --array=0-9
echo "Job ID: $SLURM_ARRAY_JOB_ID"
echo "Task ID: $SLURM_ARRAY_TASK_ID"
Everything below the dashed line is specific to the job array, and that is what we will focus on. Let's break these lines down.
%A in the --output and --error lines expands to the job's ID.
%a expands to the array task's ID. It is evaluated for each task that runs, so each task creates its own output file.
The --array=0-9 line creates a job array with ten array tasks, with IDs 0 through 9.
For job arrays, array tasks are the individual elements of the array. Each array task gets its own copy of the specified SLURM tasks, nodes, and CPUs.
The first echo prints the job ID.
The second echo prints the array task ID.
After running this script, the working directory looks like this:
ls
basic.sh myJobArray_5341902_2.err myJobArray_5341902_4.txt myJobArray_5341902_7.err myJobArray_5341902_9.txt
myJobArray_5341902_0.err myJobArray_5341902_2.txt myJobArray_5341902_5.err myJobArray_5341902_7.txt
myJobArray_5341902_0.txt myJobArray_5341902_3.err myJobArray_5341902_5.txt myJobArray_5341902_8.err
myJobArray_5341902_1.err myJobArray_5341902_3.txt myJobArray_5341902_6.err myJobArray_5341902_8.txt
myJobArray_5341902_1.txt myJobArray_5341902_4.err myJobArray_5341902_6.txt myJobArray_5341902_9.err
basic.sh had a job ID of 5341902, and there are two files for every array task, an output file and an error file. Let's review the contents of each task's output.
cat myJobArray_5341902_0.txt
Job ID: 5341902
Task ID: 0
cat myJobArray_5341902_1.txt
Job ID: 5341902
Task ID: 1
cat myJobArray_5341902_2.txt
Job ID: 5341902
Task ID: 2
...
cat myJobArray_5341902_9.txt
Job ID: 5341902
Task ID: 9
Each file contains the job ID (which matches the %A value in the file name) and the array task ID (which matches the %a value in the file name).
Squeue and Job Arrays
Let's add "sleep 60" to our basic.sh, telling each array task to sleep for 60 seconds. Then let's resubmit it and review the SLURM queue.
sbatch basic.sh
squeue -u sw23
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5341912_[8-9] medium myJobArr sw23 PD 0:00 1 (Resources)
5341912_0 medium myJobArr sw23 R 0:07 1 compute016
5341912_1 medium myJobArr sw23 R 0:07 1 compute017
5341912_2 medium myJobArr sw23 R 0:07 1 compute018
5341912_3 medium myJobArr sw23 R 0:07 1 compute019
5341912_4 medium myJobArr sw23 R 0:07 1 compute020
5341912_5 medium myJobArr sw23 R 0:07 1 compute021
5341912_6 medium myJobArr sw23 R 0:07 1 compute022
5341912_7 medium myJobArr sw23 R 0:07 1 compute023
This shows that the job ID is 5341912, and that array tasks 0-7 are running while array tasks 8-9 are pending, waiting for resources. It also shows that each array task runs on its own node, as requested by --nodes 1.
To cancel all the array tasks, sw23 would use scancel 5341912; to cancel a specific array task, like task ID 0, sw23 would use scancel 5341912_0.
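In other words:
scancel 5341912 # Cancel the entire job array.
scancel 5341912_0 # Cancel only array task 0.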
A More Efficient Example
Since each array task knows its own ID, you can have each array task check its ID and then run with specific parameters accordingly. An example of this logic is below.
#!/bin/bash
#SBATCH --job-name efficientExample # This job is named efficientExample.
#SBATCH --partition medium # This job will go in the medium partition.
#SBATCH --ntasks 1 # Each array task will allocate one SLURM task to itself.
#SBATCH --output efficient_%A_%a.txt # The normal output will be put into this file.
#SBATCH --error efficient_%A_%a.err # The error output will be put into this file.
#SBATCH --array=0-9
PARAMS=(1.0 1.1 1.2 1.3 1.4 2.0 2.1 2.2 2.3 2.4)
./a.out ${PARAMS[$SLURM_ARRAY_TASK_ID]}
This script creates ten array tasks and initializes PARAMS to a bash array with ten values. Each array task is allocated a single SLURM task, and each fills that SLURM task by starting the program a.out with the value at the matching index of PARAMS. The output of each array task will be "efficient_[job ID]_[array task ID].txt".
If there are ten cores available for the ten SLURM tasks (a SLURM task defaults to one core), then all ten calculations run in parallel. If there are not enough cores for every SLURM task, the remaining array tasks wait in the queue and are started as hardware becomes available.
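If the parameter list gets long, a common variation on the same idea is to keep one parameter per line in a text file and let each array task pull out its own line. This is only a minimal sketch; the file params.txt and the program a.out are placeholders, not files provided here.
#!/bin/bash
#SBATCH --job-name paramsFromFile # Hypothetical example.
#SBATCH --partition medium
#SBATCH --ntasks 1 # Each array task will allocate one SLURM task to itself.
#SBATCH --output fromFile_%A_%a.txt
#SBATCH --error fromFile_%A_%a.err
#SBATCH --array=0-9
# Read line (task ID + 1) from params.txt; sed numbers lines starting at 1.
PARAM=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" params.txt)
./a.out "$PARAM"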
A More Complex and Inefficient Example
This version could be helpful if a main control file is needed; otherwise, use the previous example, which is more efficient and easier to work with.
#!/bin/bash
#SBATCH --job-name inefficientExample # This job is named inefficientExample.
#SBATCH --partition medium # This job will go in the medium partition.
#SBATCH --ntasks 5 # Each array task will allocate five SLURM tasks to itself.
#SBATCH --nodes 1 # Each array task will be run on one node and will be pending (PD) until one is available.
#SBATCH --ntasks-per-node 5 # All five SLURM tasks will be placed on that single node.
#SBATCH --output inefficient_%A_%a.txt # The normal output will be put into this file.
#SBATCH --error inefficient_%A_%a.err # The error output will be put into this file.
#SBATCH --array=1-2
if [ "$SLURM_ARRAY_TASK_ID" -eq 1 ]
then
    PARAMS=(1.0 1.1 1.2 1.3 1.4)
elif [ "$SLURM_ARRAY_TASK_ID" -eq 2 ]
then
    PARAMS=(2.0 2.1 2.2 2.3 2.4)
fi
for param in "${PARAMS[@]}"
do
    ./a.out $param & # Launch each calculation in the background so the five run concurrently.
done
wait # Wait for all background calculations to finish before the array task exits.
This shows that if the array task ID is 1, the task uses the first set of parameters, and if the array task ID is 2, it uses the second set. Each array task uses its five allocated SLURM tasks to run a calculation (a.out) on each of the five elements in PARAMS, and the final wait keeps the array task alive until all five background calculations have finished. Since this script requests five SLURM tasks on a single node, the two array tasks run in parallel as long as there are two nodes, each with five SLURM tasks' worth of resources, available to allocate. If those resources are not available simultaneously, the second array task waits until they are. Alternatively, you could have each array task load a different data file to run calculations on, as sketched below.
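As a minimal sketch of that last idea (the file names data_0.dat through data_9.dat and the program a.out are placeholders for illustration):
#!/bin/bash
#SBATCH --job-name dataPerTask # Hypothetical example.
#SBATCH --partition medium
#SBATCH --ntasks 1 # Each array task will allocate one SLURM task to itself.
#SBATCH --output dataPerTask_%A_%a.txt
#SBATCH --error dataPerTask_%A_%a.err
#SBATCH --array=0-9
# Each array task processes the data file whose number matches its task ID.
./a.out "data_${SLURM_ARRAY_TASK_ID}.dat"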
Miscellaneous Other Factoids about Job Arrays
More on Array Tasks
There are other ways to control which array task IDs are created, such as specifying them more granularly.
If you wanted to specify each array task ID, you could do so with the below:
#SBATCH --array 1,2,6,8,13,52
If you wanted to specify a range with a step size, you could use something like the below, which would create four task IDs: 1, 3, 5, and 7.
#SBATCH --array 1-7:2
If you want to throttle yourself, you could limit how many array tasks run at a time with the below, which creates 100 array tasks but allows only five to run at once.
#SBATCH --array 1-100%5
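The --array option can also be given on the sbatch command line, which overrides the value in the script. For example, to rerun only a few specific array tasks of basic.sh (the task IDs 3 and 7 here are just an illustration):
sbatch --array=3,7 basic.sh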
Job Array's Available Variables
Here is a list of the environment variables created by using a job array. Each of these variables is available inside every array task; a short sketch using them follows the list.
SLURM_ARRAY_JOB_ID: This will be the ID of the job that spawns the array tasks.
SLURM_ARRAY_TASK_ID: This is the ID of the individual array task.
SLURM_ARRAY_TASK_COUNT: This is the number of array tasks created by the job.
SLURM_ARRAY_TASK_MAX: This is the highest array task ID.
SLURM_ARRAY_TASK_MIN: This is the lowest array task ID.
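As a minimal sketch of how these variables might be used together (the file workload.txt is an assumption for illustration, and this split relies on the task IDs starting at 0 as in --array=0-3), an array task can compute which slice of a shared input it is responsible for:
#!/bin/bash
#SBATCH --job-name arrayVars # Hypothetical example.
#SBATCH --partition medium
#SBATCH --ntasks 1 # Each array task will allocate one SLURM task to itself.
#SBATCH --output arrayVars_%A_%a.txt
#SBATCH --error arrayVars_%A_%a.err
#SBATCH --array=0-3
echo "This is task $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID"
echo "There are $SLURM_ARRAY_TASK_COUNT tasks, with IDs $SLURM_ARRAY_TASK_MIN through $SLURM_ARRAY_TASK_MAX"
# Split the lines of workload.txt evenly across the array tasks.
TOTAL=$(wc -l < workload.txt)
CHUNK=$(( (TOTAL + SLURM_ARRAY_TASK_COUNT - 1) / SLURM_ARRAY_TASK_COUNT ))
START=$(( SLURM_ARRAY_TASK_ID * CHUNK + 1 ))
END=$(( START + CHUNK - 1 ))
sed -n "${START},${END}p" workload.txt # Print only this task's slice of the file.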
How would I specify which nodes to use?
Use the below to specify the nodes that all the array tasks can run on. This limits the nodes used to compute001, compute002, and compute003. If there is not enough hardware available at once, array tasks wait in the queue as normal.
#SBATCH --nodelist compute[001-003]