Examples

Job Submission Command Examples

Switching Accounts

#SBATCH -A <PI_Account>  # e.g. -A sxg125_csds438
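
The account can also be supplied on the sbatch command line instead of inside the script. A minimal sketch, assuming the batch script is saved as a hypothetical file job.slurm and the account name is sxg125_csds438 as above:

sbatch -A sxg125_csds438 job.slurm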

Interactive Jobs

MPI Job

Request 2 nodes (-N 2) with a total of 4 processors (-n 4):

srun -N 2 -n 4 --pty bash

Use the Slurm environment variable $SLURM_NTASKS to get the total number of processors:

NPROCS=$SLURM_NTASKS

Check:

echo $NPROCS

4

(Note: -n gives the total number of processors, so -n 4 requests 4 processors spread across the two nodes requested with -N 2.)
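
Once the interactive session starts, an MPI program can be launched across the allocated processors. A minimal sketch, assuming an MPI executable named ./test (the same name used in the batch example further below):

module load openmpi
mpirun -n $NPROCS ./test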

OpenMP or Hybrid Job

Request two nodes (-N 2), which gives one task per node by default, with 4 processors per task (-c 4):

srun -N 2 -c 4 --pty bash

Use the Slurm environment variables $SLURM_NTASKS and $SLURM_CPUS_PER_TASK to get the total number of processors.

NPROCS=$(( $SLURM_NTASKS*$SLURM_CPUS_PER_TASK ))

Check:

echo $NPROCS

8

Note: Two nodes (-N 2) give two tasks (one per node), and 4 CPUs/processors per task (-c 4) gives a total of 2*4 = 8 processors.
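
Within the interactive session, the thread count can then be exported before launching the program, mirroring the batch example further below. A minimal sketch, assuming a multithreaded executable named ./hello:

export OMP_NUM_THREADS=$NPROCS
./hello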

More Complex Job Submission Examples

Below are example SLURM scripts for jobs employing parallel processing. In general, parallel jobs can be separated into four categories, each illustrated below: MPI jobs, multithreaded (OpenMP) jobs, job arrays, and GPU jobs.

It is important to understand the capabilities and limitations of an application in order to fully leverage the parallel processing options available on the cluster. For instance, many popular scientific computing languages like Python, R, and Matlab now offer packages that allow for GPU or multithreaded processing, especially for matrix and vector operations.

MPI Jobs

Jobs running MPI (Message Passing Interface) code require special attention within SLURM. SLURM allocates and launches MPI jobs differently depending on the version of MPI used (e.g. OpenMPI, MPICH2, Intel MPI). We recommend compiling code with OpenMPI, either version 1.8.3 (the default) or 1.8.5, and then using OpenMPI's mpirun command to launch parallel MPI jobs. The example below runs MPI code compiled with OpenMPI 1.8.3 (default) or 1.8.5:

#!/bin/bash
#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH -N 3
#SBATCH -n 24                        # 8 MPI processes per node
#SBATCH --time=7-00:00:00
#SBATCH --mem=4G                     # 4 GB RAM per node
#SBATCH --output=mpi_job_slurm.log

module load openmpi                  # this is version 1.8.3

echo $SLURM_JOB_NODELIST

# Assign the number of processors
NPROCS=$SLURM_NTASKS

# Run the job
mpirun -n $NPROCS ./test

This example requests 3 nodes and 24 tasks (i.e. processes), 8 per node. For automatic load balancing, it is recommended to specify only the -n option and let SLURM distribute the tasks across nodes. By default, SLURM allocates 1 CPU core per process, so this job will run across 24 CPU cores. Note that mpirun/mpiexec accepts -n <number of processes>. Please avoid using the srun command to launch parallel MPI jobs at this time, since it does not seem to work well.
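
To submit the batch script, save it to a file and pass it to sbatch. A minimal sketch, assuming the script above is saved as a hypothetical file mpi_job.slurm:

sbatch mpi_job.slurm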

Executables generated with older versions of OpenMPI or MPICH2 should be launched using those packages' native mpirun or mpiexec commands rather than SLURM's srun. Such programs may run when launched through srun, but in some cases they will not.

More information about running MPI jobs within SLURM can be found here: http://slurm.schedmd.com/mpi_guide.html

Multithreaded Jobs (OpenMP)

Multithreaded programs are applications that are able to execute in parallel across multiple CPU cores within a single node using a shared-memory execution model. In general, a multithreaded application uses a single process (i.e. “task” in SLURM) which then spawns multiple threads of execution. By default, SLURM allocates 1 CPU core per task. In order to make use of multiple CPU cores in a multithreaded program, one must include the --cpus-per-task option. Below is an example of a multithreaded program requesting 4 CPU cores per task. The program itself is responsible for spawning the appropriate number of threads.

#!/bin/bash
#SBATCH -N 1
#SBATCH --cpus-per-task=4            # 4 threads per task
#SBATCH --time=02:00:00              # two hours
#SBATCH --mem=4G
#SBATCH --output=multithread.out
#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH --job-name=multithreaded_example

# Export the number of threads
NPROCS=$(( $SLURM_NNODES * $SLURM_CPUS_PER_TASK ))
export OMP_NUM_THREADS=$NPROCS

# Run the multithreaded application
./hello
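
For a single-node, single-task OpenMP job such as this one, the thread count can equivalently be taken directly from the per-task CPU count. A minimal sketch, assuming the same ./hello executable:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello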

Job Arrays

Job arrays are useful for submitting and managing a large number of similar jobs. As an example, job arrays are convenient if a user wishes to run the same analysis on 100 different files. SLURM provides job array environment variables that allow multiple versions of input files to be easily referenced. In the example below, three input files called vectorization_0.py, vectorization_1.py, and vectorization_2.py are used as input for three independent Python jobs:

#!/bin/bash
#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH -c 1
#SBATCH --time=2:00:00
#SBATCH --mem=2G
#SBATCH --array=0-2
#SBATCH --output=python_array_job_slurm_%A_%a.out

echo "SLURM_JOBID: " $SLURM_JOBID
echo "SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
echo "SLURM_ARRAY_JOB_ID: " $SLURM_ARRAY_JOB_ID

module load python
python < vectorization_${SLURM_ARRAY_TASK_ID}.py

The #SBATCH --array=0-2 line specifies the array size (3) and the array indices (0, 1, and 2). These indices are referenced through the SLURM_ARRAY_TASK_ID environment variable in the final line of the SLURM batch script to independently analyze the three input files. Each Python instance receives its own resource allocation; in this case, each instance is allocated 1 CPU core (and 1 node), 2 hours of wall time, and 2 GB of RAM.

One implication of allocating resources per task is that the node count does not apply across all tasks, so specifying --nodes=1 will not limit all tasks within an array to a single node. To limit the number of array tasks running simultaneously (and thus the number of CPU cores used at once, since each task here uses 1 core), append %[COUNT] to the --array= option. For example, --array=0-100%4 limits the array to 4 tasks running at a time, so the tasks execute in batches of 4 until the entire array has completed.

The --array= option is flexible in terms of the index range and stride length. For instance, --array=0-10:2 would give indices of 0, 2, 4, 6, 8, and 10.
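
For reference, the two forms discussed above appear in a job script as ordinary directives. A short sketch:

#SBATCH --array=0-10:2      # indices 0, 2, 4, 6, 8, 10
#SBATCH --array=0-100%4     # indices 0-100, at most 4 tasks running at a time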

The %A and %a variables provide a method for directing standard output to separate files: %A references the SLURM_ARRAY_JOB_ID while %a references the SLURM_ARRAY_TASK_ID. SLURM treats job ID information for job arrays as follows: each task within the array shares the same SLURM_ARRAY_JOB_ID and has its own unique SLURM_JOBID and SLURM_ARRAY_TASK_ID. The JOBID shown by squeue is formatted as the SLURM_ARRAY_JOB_ID followed by an underscore and the SLURM_ARRAY_TASK_ID.

While the previous example provides a relatively simple method for running analyses in parallel, it can at times be inconvenient to rename files so that they can be easily indexed from within a job array. The following example provides a method for analyzing files with arbitrary file names, provided they are all stored in a sub-directory named data:

#!/bin/bash
#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH -c 1
#SBATCH --time=2:00:00
#SBATCH --mem=2G
#SBATCH --array=1-5                  # In this example we have 5 files to analyze
#SBATCH --output=python_array_job_slurm_%A_%a.out

arrayfile=`ls data/ | awk -v line=$SLURM_ARRAY_TASK_ID '{if (NR == line) print $0}'`

module load python
python < data/$arrayfile
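
The awk line simply selects the Nth entry of the directory listing. To check which file a given index will pick up before submitting, the same command can be run by hand with a literal index. A minimal sketch, assuming index 2:

ls data/ | awk -v line=2 '{if (NR == line) print $0}'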

More information can be found here: http://slurm.schedmd.com/job_array.html

GPU Jobs

Applications that run CUDA code (CUDA is NVIDIA's C-like programming interface for offloading computationally intensive tasks to the GPU) can be executed on the GPU nodes.

Several versions of the CUDA toolkit are installed on the cluster, which can be loaded into your environment with the module command.

module load cuda
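
With the toolkit loaded, CUDA source code can be compiled with the nvcc compiler before being launched through a GPU job script. A minimal sketch, assuming a hypothetical source file hello.cu:

module load cuda
nvcc -o hello_gpu hello.cu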

Below is an example script that loads and runs LAMMPS, a molecular dynamics package that makes extensive use of NVIDIA GPU computing. Note that you must be in one of the GPU groups on the cluster and specify that group in the job script in order to run jobs on the GPU nodes. The #SBATCH --partition=gpu line is also required in the job script. Currently, SLURM is configured to provide exclusive access to GPU node resources, meaning that only a single job can run on a GPU node at a time. The job can access all resources on the node(s) allocated to it by SLURM, including the GPU cards on each node. In this case, the LAMMPS module already includes the cuda module.

#!/bin/bash
#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --time=2:00:00               # 2 hours
#SBATCH --mem=100M
#SBATCH --output=gpu-job.log
#SBATCH --partition=gpu
#SBATCH -C gpu2v100
#SBATCH --gres=gpu:1                 # Request one GPU out of 2 (max)
#SBATCH --account=<group acct>       # substitute the appropriate group here

module load lammps

pwd
date
lmp2015_cuda input.file
date