In a High Performance Computing (HPC) cluster, such as the NYU-IT HPC Greene cluster, hundreds of computing nodes are interconnected by high-speed networks. Each node runs its own instance of the Linux operating system, and the resources are shared among many users for their technical or scientific computing. Slurm is a cluster software layer built on top of the interconnected nodes that orchestrates the nodes' computing activities, so that the cluster can be viewed by its users as a unified, enhanced and scalable computing system. Users of the NYU HPC clusters come from many departments and disciplines, each with their own computing projects, and impose very diverse requirements on hardware, software resources, and processing parallelism. Users submit jobs, which compete for computing resources. Slurm is a resource manager and job scheduler, designed to allocate resources and schedule jobs. Slurm is open-source software with a large user community and has been installed on many of the top 500 supercomputers.
This tutorial assumes you have a NYU HPC account. If not, you may find the steps to apply for an account here.
It also assumes you are comfortable with the Linux command-line environment. To learn about Linux, please read Tutorial 1.
Please read this page for the hardware specs of Greene.
For an overview of useful Slurm commands, please read the Slurm Main Commands page before continuing with the tutorial.
Lmod, an Environment Module system, is a tool for managing multiple versions and configurations of software packages and is used by many HPC centers around the world. With Environment Modules, software packages are installed away from the base system directories, and for each package, an associated modulefile describes what must be altered in a user's shell environment - such as the $PATH environment variable - in order to use the software package. The modulefile also describes dependencies and conflicts between this software package and other packages and versions.
To use a given software package, you load the corresponding module. Unloading the module afterwards cleanly undoes the changes that loading the module made to your environment, thus freeing you to use other software packages that might have conflicted with the first one.
Below is a list of module commands and their functions:
module unload <module-name> : unload a module
module show <module-name> : see exactly what effect loading the module will have on your environment
module purge : remove all loaded modules from your environment
module load <module-name> : load a module
module whatis <module-name> : find out more about a software package
module list : check which modules are currently loaded in your environment
module avail : check which software packages are available
module help <module-name> : display more detailed help for the software package, if the modulefile includes it
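For example, a typical sequence for finding, loading and verifying a package looks like the following (the Stata module version shown is an assumption; module avail lists what is actually installed):
$ module avail stata
$ module load stata/17.0
$ module list
$ module unload stata/17.0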
Batch jobs require a script file for the Slurm scheduler to interpret and execute. The SBATCH file contains both directives for Slurm to interpret and the commands for it to execute. Below is a simple example of a batch job that runs a Stata do file; the file is named myscript.sbatch, and each line is explained in an inline comment. More options can be found on the SchedMD website.
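A minimal sketch of what myscript.sbatch could look like; the do-file name (mycode.do) and the Stata module version are assumptions, so check module avail stata for what is installed on Greene:
#!/bin/bash
## This tells the shell how to execute the script
## Run on a single node, with one task and one CPU core
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
## Wall-clock limit of one hour and 2 GB of memory
#SBATCH --time=1:00:00
#SBATCH --mem=2GB
## Job name as it appears in squeue output
#SBATCH --job-name=myStataJob
## Start from a clean environment, load Stata, and run the do-file in batch mode
module purge
module load stata/17.0
stata -b do mycode.do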
You can submit the job with the following command:
$ sbatch myscript.sbatch
The command will result in the job queuing as it awaits resources to become available (the wait time varies with the number of other jobs being run on the cluster). You can see the status of your jobs with the following command:
$ squeue -u $USER
Lastly, you can read the output of your job in the slurm-<job_ID>.out file produced by running your job. This is where logs regarding the execution of your job can be found, including errors and system messages. You can print the contents to the screen from the directory containing the output file with the following command:
$ cat slurm-<job_ID>.out
While the majority of the jobs on the cluster are submitted with the 'sbatch' command and executed in the background, there are also methods to run applications interactively through the 'srun' command. Interactive jobs allow users to enter commands and data on the command line (or in a graphical interface), providing an experience similar to working on a desktop or laptop. Examples of common interactive tasks are:
Editing files
Compiling and debugging code
Exploring data to get a rough idea of its characteristics
Getting graphical windows to run visualization
Running software tools in interactive sessions
Interactive jobs also help avoid issues with the login nodes. If you are working on a login node and your job is too I/O intensive, it may be removed without notice. Running interactive jobs on compute nodes does not impact other users, and it provides access to resources that are not available on the login nodes, such as interactive access to GPUs, high memory, exclusive access to all the resources of a compute node, etc.
In the srun examples below, "--pty /bin/bash" requests that a bash shell session be started in a pseudo terminal. By default the allocated resources are a single CPU core and 2 GB of memory for 1 hour.
$ srun --pty /bin/bash
To request 4 CPU cores, 4 GB memory, and a 2-hour running duration, you can add the following arguments:
$ srun --cpus-per-task=4 --time=2:00:00 --mem=4000 --pty /bin/bash
Similarly, to request one GPU card, 3 GB memory, and a 1.5-hour running duration, you can add the following:
$ srun --time=1:30:00 --mem=3000 --gres=gpu:1 --pty /bin/bash
Once the job begins you will notice your prompt change, for example:
[mdw303@log-3 ~]$ srun --pty /bin/bash
[mdw303@cs080 ~]$
You can see above that the prompt changed from log-3 to cs080, meaning the session is no longer on the login node but rather on a compute node. You can then load modules and software and run them interactively without impacting the cluster. Below outlines the steps to start an interactive session and launch R:
[sk6404@log-1 ~]$ srun --cpus-per-task=1 --pty /bin/bash
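Once the prompt shows a compute node, the remaining steps might look like this; the node name cs123 and the R module name and version are assumptions (check module avail r for what is installed):
[sk6404@cs123 ~]$ module load r/gcc/4.2.0
[sk6404@cs123 ~]$ R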
MPI stands for "Message Passing Interface" and is managed by a program, such as OpenMPI, which coordinates code and resources across the HPC cluster so that your job can run workloads in parallel. You may have heard HPC referred to as "parallel computing", because the ability to run many processes simultaneously - in parallel - is how the greatest efficiencies are realized on the cluster. Users interested in MPI generally must compile the program they want to run using an MPI compiler. Greene supports OpenMPI built with two common compiler families, Intel and GCC. These can be loaded with either of the following:
Intel
$ module load openmpi/intel/4.1.1
GCC
$ module load openmpi/gcc/4.1.1
Below we will illustrate an example of how to compile a C program for MPI. Copy the program below into your working directory as hellompi.c.
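A minimal MPI "hello world" consistent with the sample output shown later; the original tutorial source may differ slightly:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    MPI_Get_processor_name(name, &len);    /* node this process runs on */

    printf("Process %d on %s out of %d\n", rank, name, size);

    MPI_Finalize();                        /* shut down the MPI runtime */
    return 0;
}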
Once copied into your directory, load OpenMPI and compile it with the following:
$ module load openmpi/intel/4.1.1
$ mpicc hellompi.c -o hellompi
Next, create a hellompi.sbatch script.
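A minimal sketch of hellompi.sbatch, assuming 4 MPI processes spread over two nodes (consistent with the sample output below) and an output file named hellompi.out:
#!/bin/bash
## Four MPI processes spread across two nodes
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:05:00
#SBATCH --mem=2GB
#SBATCH --job-name=hellompi
## Write the job's output to hellompi.out
#SBATCH --output=hellompi.out

module purge
module load openmpi/intel/4.1.1

## srun launches one copy of the program per task
srun ./hellompi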
Run the job with the following command:
$ sbatch hellompi.sbatch
After the job runs, cat the hellompi.out file to see that your processes ran on multiple nodes. There may be some errors, but your output should contain something like the following, indicating the processes ran in parallel on multiple nodes:
Process 0 on cs265.nyu.cluster out of 4
To request one GPU card, use SBATCH directives in the job script:
#SBATCH --gres=gpu:1
To request a specific card type, use e.g. --gres=gpu:v100:1. The card types currently available are v100 and RTX 8000. As an example, let's submit an Amber job. Amber is a molecular dynamics software package. The recipe is:
$ mkdir -p /scratch/$USER/myambertest
$ cd /scratch/$USER/myambertest
From the tutorial example directory we copy over the Amber input data files "inpcrd", "prmtop" and "mdin", and the job script file "run-amber.s".
NOTE: At the time of writing, you may need to update the run-amber.s script to load Amber version 20.06 rather than the default 16.06.
The content of the job script "run-amber.s" should be as follows:
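A minimal sketch, assuming a single-GPU run with Amber's pmemd.cuda engine; the exact module path is an assumption (check module avail amber), while the input and output file names match those used in this example:
#!/bin/bash
## One task with one GPU card for the GPU build of Amber's pmemd engine
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH --mem=8GB
#SBATCH --job-name=myambertest

module purge
## Assumed module path; check 'module avail amber' for the exact name
module load amber/openmpi/intel/20.06

## pmemd.cuda reads mdin/prmtop/inpcrd and writes mdout; -O overwrites old outputs
pmemd.cuda -O -i mdin -p prmtop -c inpcrd -o mdout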
The demo Amber job should take ~2 minutes to finish once it starts running. When the job is done, several output files are generated. Check the one named "mdout", which contains a summary of the run.
Using a job array, you can submit many similar jobs with almost identical requirements. This reduces load on both users and the scheduler system. Job arrays can only be used in batch jobs. Usually the only difference among the jobs in an array is the input file or files. Please follow the recipe below to try the example. There are 5 input files, named 'sample-1.txt' through 'sample-5.txt' in sequential order. By running the single command "sbatch run-jobarray.s", you submit 5 jobs, each processing one of these input files. Run the following commands to create the directory and submit the array job:
$ mkdir -p /scratch/$USER/myjarraytest
$ cd /scratch/$USER/myjarraytest
$ sbatch run-jobarray.s
(Before submitting, copy the five input files and the job script from the tutorial example directory into this directory.) The content of the job script 'run-jobarray.s' is copied below:
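A minimal sketch of what run-jobarray.s might contain; the resource numbers and the stand-in processing command (wc -l) are assumptions, but the --array directive, the %A/%a output options, and the SLURM_ARRAY_TASK_ID usage are what the notes below refer to:
#!/bin/bash
## Five array tasks with indices 1 through 5
#SBATCH --array=1-5
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --mem=2GB
#SBATCH --job-name=myjarraytest
## %A is the job ID, %a the array task ID (index)
#SBATCH --output=slurm-%A_%a.out

module purge

## Each task selects its own input file through SLURM_ARRAY_TASK_ID;
## 'wc -l' is a stand-in for the tutorial's actual processing command
wc -l sample-${SLURM_ARRAY_TASK_ID}.txt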
Job array submissions create an environment variable called SLURM_ARRAY_TASK_ID, which is unique for each job in the array. It is usually embedded somewhere so that at run time its unique value is incorporated into producing a proper file name. Also as shown above, two additional options, %A and %a, denoting the job ID and the task ID (i.e. the job array index) respectively, are available for specifying a job's stdout and stderr file names.