The HPC Cluster uses Slurm as its scheduler.
The Slurm website can be found here: http://slurm.schedmd.com/
Tutorials: http://slurm.schedmd.com/tutorials.html
Documentation: http://slurm.schedmd.com/documentation.html
Comparison of PBS/TORQUE and Slurm commands (Rosetta Stone): http://slurm.schedmd.com/rosetta.html
For complicated job submissions, we recommend becoming familiar with the Slurm syntax.
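As an illustration, a minimal batch script might look like the following (a sketch only; the partition name, time limit, and memory values are placeholders and must be adjusted to the cluster's actual partitions and policies):

#!/bin/bash
#SBATCH --job-name=myjob        # job name shown by squeue
#SBATCH --partition=general     # placeholder partition name
#SBATCH --nodes=1               # number of nodes
#SBATCH --ntasks=1              # number of tasks (processes)
#SBATCH --time=01:00:00         # walltime limit hh:mm:ss
#SBATCH --mem=4G                # memory per node

./my_program                    # replace with your own executable

The script is then submitted with: sbatch myjob.slurm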
The SLURM command "srun" does not work properly in an MPI context on our cluster and fails with the error shown below, so do not use srun to launch parallel (MPI) jobs.
srun -n 4 hello_mpi
srun: error: Unable to create job step: More processors requested than permitted
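Instead, launch MPI programs from a batch script submitted with sbatch, for example with mpirun (a sketch, assuming hello_mpi was built against an MPI library available on the cluster; the module name is a placeholder):

#!/bin/bash
#SBATCH --job-name=hello_mpi
#SBATCH --ntasks=4              # request 4 MPI tasks
#SBATCH --time=00:10:00

module load openmpi             # placeholder module name
mpirun -np 4 ./hello_mpi        # mpirun picks up the Slurm allocation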
Some more details (adapted from ACCRE documentation) [1]:
Slurm is a software package for submitting, scheduling, and monitoring jobs on large compute clusters. This page details how to use Slurm for submitting and monitoring jobs on our cluster. New cluster users should consult our Getting Started page, which is designed to walk you through the process of creating a job script, submitting a job to the cluster, monitoring jobs, checking job usage statistics, and understanding our cluster policies.
The following table summarizes the Slurm commands:
Each of these environment variables can be referenced from a SLURM batch script using the $ symbol before the name of the variable (e.g. echo $SLURM_JOBID). A full list of SLURM environment variables can be found here: http://slurm.schedmd.com/sbatch.html#lbAF
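For example, a batch script can record where and under which job it ran (a sketch using standard SLURM variables):

#!/bin/bash
#SBATCH --ntasks=1

echo "Job ID:     $SLURM_JOBID"
echo "Job name:   $SLURM_JOB_NAME"
echo "Node list:  $SLURM_JOB_NODELIST"
echo "Tasks:      $SLURM_NTASKS"
echo "Submit dir: $SLURM_SUBMIT_DIR"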
The Slurm tool pestat [3] prints the status of the cluster nodes, one line per node, together with job information.
Usage:
pestat --help
output:
/usr/local/bin/pestat: illegal option -- -
Usage: pestat [-p partition(s)] [-u username] [-g groupname] [-a accountname]
[-q qoslist] [-s statelist] [-n/-w hostlist] [-j joblist] [-G] [-N]
[-f | -F | -m free_mem | -M free_mem ] [-1|-2] [-d] [-E] [-C|-c] [-V] [-h]
where:
-p partition: Select only partion <partition>
-u username: Print only user <username>
-g groupname: Print only users in UNIX group <groupname>
-a accountname: Print only jobs in Slurm account <accountname>
...
Example: check job info in the gpu partition
pestat -G -p gpu
output:
GRES (Generic Resource) is printed after each jobid
Print only nodes in partition gpu
Hostname  Partition  Node   Num_CPU  CPUload  Memsize  Freemem  GRES/   Joblist
                     State  Use/Tot           (MB)     (MB)     node    JobId User GRES/job ...
gput017   gpu        idle    0  12     0.15    48129    16617   gpu:2
gput019   gpu        mix     1  12     1.20    48129    14442   gpu:2   13339263 <user> gpu:1
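Similarly, to list only the nodes running a given user's jobs, the -u option from the usage listing above can be used (replace <username> with an actual username):

pestat -u <username>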
Contact us at