Job Arrays

What is a Job Array?

A SLURM job array submits many near-identical copies of the same job from a single sbatch script. This allows one submission to contain many sub-jobs (called array tasks), which makes large batches of similar work easier to submit, monitor, and cancel.

Using a Job Array

If this documentation uses SLURM or sbatch commands and terminology that are unfamiliar to you, it is recommended that you first read SLURM Parallelism.

Also refer to Multiple Jobs on One Compute Node for related information and a simpler example.

Basics

The submitted script as a whole is called the job, and each individual copy in the array is called an array task. Let's start with a basic job array sbatch script, basic.sh:

#!/bin/bash

#SBATCH --job-name myJobArray           # This job is named myJobArray.

#SBATCH --partition medium              # This job will go in the medium partition.

#SBATCH --nodes 2                       # Each array task will run on two nodes and will be pending (PD) until they are available.

#SBATCH --ntasks-per-node 1             # This job will allocate one SLURM task on each allocated node.

#SBATCH --time 01:00                    # This job will run for one minute.

#

# -------------------------------------------------------------------------------------

#SBATCH --output myJobArray_%A_%a.txt   # The normal output will be put into this file.

#SBATCH --error myJobArray_%A_%a.err    # The error output will be put into this file.

#SBATCH --array=0-9                     # This job is an array with ten tasks, IDs 0 through 9.


echo "Job ID:  $SLURM_ARRAY_JOB_ID"

echo "Task ID: $SLURM_ARRAY_TASK_ID"

Everything below the dashed line is specific to the job array, and those directives are what we will focus on. The --output and --error file names use %A, which is replaced by the job ID, and %a, which is replaced by the array task ID, so every task writes to its own pair of files. The --array=0-9 directive creates ten array tasks with IDs 0 through 9; each task runs the same script body, where $SLURM_ARRAY_JOB_ID holds the job ID and $SLURM_ARRAY_TASK_ID holds that task's own ID.
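Submitting the script with sbatch prints the job ID that SLURM assigns; in this case it would have looked something like this:

sbatch basic.sh

Submitted batch job 5341902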


After running this script, this is the working directory:

ls

basic.sh                  myJobArray_5341902_2.err  myJobArray_5341902_4.txt  myJobArray_5341902_7.err  myJobArray_5341902_9.txt

myJobArray_5341902_0.err  myJobArray_5341902_2.txt  myJobArray_5341902_5.err  myJobArray_5341902_7.txt

myJobArray_5341902_0.txt  myJobArray_5341902_3.err  myJobArray_5341902_5.txt  myJobArray_5341902_8.err

myJobArray_5341902_1.err  myJobArray_5341902_3.txt  myJobArray_5341902_6.err  myJobArray_5341902_8.txt

myJobArray_5341902_1.txt  myJobArray_5341902_4.err  myJobArray_5341902_6.txt  myJobArray_5341902_9.err


basic.sh had a job ID of 5341902, and there are two files for every array task, an output file and an error file. Let's review the contents of each task's output.

cat myJobArray_5341902_0.txt

Job ID:  5341902

Task ID: 0


cat myJobArray_5341902_1.txt

Job ID:  5341902

Task ID: 1


cat myJobArray_5341902_2.txt

Job ID:  5341902

Task ID: 2


...


cat myJobArray_5341902_9.txt

Job ID:  5341902

Task ID: 9

Each file contains the job ID (which matches the %A value in the file name) and the array task ID (which matches the %a value in the file name).


Squeue and Job Arrays

Let's add "sleep 60" to our basic.sh, telling each array task to sleep for 60 seconds. Then let's resubmit it and review the SLURM queue.
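For example, the end of basic.sh would now look like this:

echo "Job ID:  $SLURM_ARRAY_JOB_ID"

echo "Task ID: $SLURM_ARRAY_TASK_ID"

sleep 60                                # Keep each array task running for 60 seconds so it stays visible in squeue.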

sbatch basic.sh

squeue -u sw23

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

     5341912_[8-9]    medium myJobArr     sw23 PD       0:00      2 (Resources)

         5341912_0    medium myJobArr     sw23  R       0:07      2 compute[016-017]

         5341912_1    medium myJobArr     sw23  R       0:07      2 compute[018-019]

         5341912_2    medium myJobArr     sw23  R       0:07      2 compute[020-021]

         5341912_3    medium myJobArr     sw23  R       0:07      2 compute[022-023]

         5341912_4    medium myJobArr     sw23  R       0:07      2 compute[024-025]

         5341912_5    medium myJobArr     sw23  R       0:07      2 compute[026-027]

         5341912_6    medium myJobArr     sw23  R       0:07      2 compute[028-029]

         5341912_7    medium myJobArr     sw23  R       0:07      2 compute[030-031]

This shows that the job ID is 5341912, that array tasks 0-7 are running, and that array tasks 8-9 are pending (PD), waiting for resources. It also shows that each array task runs on two nodes.

To cancel all the array tasks, sw23 would use scancel 5341912; to cancel a specific array task, like task ID 0, sw23 would use scancel 5341912_0.
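In command form:

scancel 5341912                         # Cancel the entire job array (every array task).

scancel 5341912_0                       # Cancel only array task 0.

Most SLURM versions should also accept a range, such as scancel 5341912_[8-9], to cancel several array tasks at once.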


A More Efficient Example

Since each array task knows its own ID, you can have each task check its ID and select its parameters accordingly. An example of this logic is below.

#!/bin/bash

#SBATCH --job-name efficientExample     # This job is named efficientExample.

#SBATCH --partition medium              # This job will go in the medium partition.

#SBATCH --ntasks 1                      # Each array task will allocate one SLURM task.

#SBATCH --output efficient_%A_%a.txt    # The normal output will be put into this file.

#SBATCH --error efficient_%A_%a.err     # The error output will be put into this file.

#SBATCH --array=0-9                     # This job is an array with ten tasks, IDs 0 through 9.



PARAMS=(1.0 1.1 1.2 1.3 1.4 2.0 2.1 2.2 2.3 2.4)   # One parameter value per array task ID.


./a.out "${PARAMS[$SLURM_ARRAY_TASK_ID]}"          # Each array task runs a.out with the parameter at its own index.

This script creates ten array tasks and initializes $PARAMS to a bash array with ten data points. Each array task is allocated a single SLURM task and fills it by starting the process a.out with the parameter at the matching index of $PARAMS. The output of each array task will be written to efficient_[job ID]_[array task ID].txt.

If ten cores are available for the ten SLURM tasks (a SLURM task defaults to one core), then all ten calculations run in parallel. If there are not enough cores for every SLURM task, the remaining array tasks wait in the queue and start as hardware becomes available.
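The same indexing idea also works when the parameters live in a file instead of inside the script. Here is a minimal sketch, assuming a hypothetical file params.txt with one parameter value per line:

PARAM=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" params.txt)   # Task IDs start at 0, but file lines start at 1.

./a.out "$PARAM"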


A More Complex and Inefficient Example

This version can be helpful if a main control file is needed; otherwise, prefer the previous example, which is more efficient and simpler to use.

#!/bin/bash

#SBATCH --job-name inefficientExample   # This job is named inefficientExample.

#SBATCH --partition medium              # This job will go in the medium partition.

#SBATCH --ntasks 5                      # Each array task will allocate five SLURM tasks.

#SBATCH --nodes 1                       # Each array task will run on one node and will be pending (PD) until one is available.

#SBATCH --ntasks-per-node 5             # All five SLURM tasks will be placed on the single allocated node.

#SBATCH --output inefficient_%A_%a.txt  # The normal output will be put into this file.

#SBATCH --error inefficient_%A_%a.err   # The error output will be put into this file.

#SBATCH --array=1-2                     # This job is an array with two tasks, IDs 1 and 2.



if [ "$SLURM_ARRAY_TASK_ID" -eq 1 ]

then

   PARAMS=(1.0 1.1 1.2 1.3 1.4)         # Parameter set for array task 1.

elif [ "$SLURM_ARRAY_TASK_ID" -eq 2 ]

then

   PARAMS=(2.0 2.1 2.2 2.3 2.4)         # Parameter set for array task 2.

fi


for param in "${PARAMS[@]}"

do

   ./a.out "$param" &                   # Start each calculation in the background so all five run at once.

done

wait                                    # Wait for all background a.out processes to finish before the job ends.

If the array task ID is 1, the task uses the first set of parameters; if the array task ID is 2, it uses the second set. Each array task then uses its five allocated SLURM tasks to run a calculation (a.out) on each of the five elements in $PARAMS. Since the script places five SLURM tasks on a single node, the two array tasks run in parallel as long as two nodes with five free SLURM tasks each can be allocated. If those resources are not available simultaneously, the second array task waits until they are. Alternatively, you could have each array task load a different data file to run calculations on, as sketched below.
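A minimal sketch of that data-file alternative, assuming hypothetical input files named data_1.txt and data_2.txt:

./a.out "data_${SLURM_ARRAY_TASK_ID}.txt"   # Array task 1 processes data_1.txt, and array task 2 processes data_2.txt.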


Miscellaneous Other Factoids about Job Arrays

More on Array Tasks

The --array directive also accepts more granular specifications of the task IDs:

#SBATCH --array 1,2,6,8,13,52           # An explicit list of task IDs.

#SBATCH --array 1-7:2                   # A range with a step size of 2, giving task IDs 1, 3, 5, and 7.

#SBATCH --array 1-100%5                 # Task IDs 1 through 100, with at most 5 tasks running at the same time.


Job Array's Available Variables

Here is a list of the environment variables created by using a job array. Each of these variables is available to every task.

SLURM_ARRAY_JOB_ID       The job ID of the array as a whole.

SLURM_ARRAY_TASK_ID      The ID of the current array task.

SLURM_ARRAY_TASK_COUNT   The total number of tasks in the array.

SLURM_ARRAY_TASK_MIN     The lowest task ID in the array.

SLURM_ARRAY_TASK_MAX     The highest task ID in the array.
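For example, each array task could use these variables to report where it sits in the array:

echo "Task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT (IDs $SLURM_ARRAY_TASK_MIN-$SLURM_ARRAY_TASK_MAX in job $SLURM_ARRAY_JOB_ID)"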


How would I specify which nodes to use?

Use the directive below to specify which nodes the array tasks may use. This limits the job to compute001, compute002, and compute003. If there is insufficient hardware, the remaining array tasks wait in the queue as normal.

#SBATCH --nodelist compute[001-003]