#SBATCH -A mendoza_q
#SBATCH --exclude=amr-163,amr-178,amr-179
These lines should submit the job under our account, exclude our buy-in nodes, and let the job run in the general partition instead, so it avoids incurring CPU time against the buy-in nodes.
How to submit jobs (SLURM scheduler):
https://wiki.hpcc.msu.edu/display/ITH/Job+Management+by+SLURM
SLURM commands/dependencies
https://wiki.hpcc.msu.edu/display/TEAC/List+of+Job+Specifications
The following is a list of basic #SBATCH specifications. For the complete set of options, refer to the SLURM sbatch command page.
https://rcc.fsu.edu/submit-script-generator
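As a quick reference, the basic specifications we use most often look like this (the values here are placeholders; a full working example appears near the end of this section):
#SBATCH -J "job_name"         # job name
#SBATCH -A mendoza_q          # account to charge
#SBATCH -p mendoza_q          # partition/queue
#SBATCH -N 1                  # number of nodes
#SBATCH -n 16                 # number of tasks (cores)
#SBATCH --mem-per-cpu=3900M   # memory per core (the cluster default is ~3.9GB)
#SBATCH -t 24:00:00           # walltime limit
#SBATCH -o job-%J.o           # standard output file (%J = job ID)
#SBATCH -e job-%J.e           # standard error file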
1) Submit the SLURM script with the sbatch command, i.e. sbatch file_name.slrm or sbatch file_name.sh. Do not use ./file_name.sh or sh file_name.sh; these run the script directly in your terminal instead of submitting it to the scheduler and can cause problems.
2) #not_to_do: In the past, we used the variable $SLURM_SUBMIT_DIR (the directory from which sbatch was run) in the scripts that run VASP calculations. The script copied the files from that directory into the storage/scratch directory, which helped route scratch files to scratch space. Currently, this pattern copies all files from several home directories into our storage directories, so do not use this variable in the script.
Instead, modify the script so that it does not rely on $SLURM_SUBMIT_DIR. Do one of the following:
1. Use a hard-coded path in the script (a sketch follows the example code below).
2. Dynamically determine the location of the script and use it as a reference point. See the example code below.
# Determine and move to the directory containing this script, regardless of where the command was run from.
pushd "$(dirname "$0")" > /dev/null
SCRIPTPATH="$(pwd)"
popd > /dev/null
cd "$SCRIPTPATH"
The main guide for SLURM can be found online; the pages below describe how to submit a job to the FSU High Performance Computing Cluster:
https://rcc.fsu.edu/docs/moab-slurm-migration-guide
https://rcc.fsu.edu/docs/how-submit-hpc-jobs
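For example, a typical submit-and-monitor sequence from the command line looks like this (the file name and job ID are placeholders):
sbatch file_name.slrm   # submit the job script to the scheduler
squeue -u $USER         # check the status of your pending/running jobs
scancel 1234567         # cancel a job by its job ID, if needed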
Most nodes in the HPC cluster contain 16 CPU cores and 64GB of RAM, but not all of those resources are available for running jobs: a small amount is reserved for system overhead, and this can affect how your jobs are scheduled if you are not aware of it. By default, the HPC will allocate 3.9GB of RAM per process. If your job needs more than this, you can override the default, but it is best to do so in increments of 3.9GB.
The RCC memory-planning documentation (linked at the end of this section) covers several scenarios for planning memory usage and explains how to use the --mem-per-cpu and --mem options in your job submission scripts.
If you do not specify the amount of memory for your job, Slurm will allocate the default, which is 3.9GB of RAM per core. So, if you request 16 cores, your memory allocation will be 3.9 x 16 = 62.4GB.
Weird scheduling side effects occur if 64GB of RAM are requested. Since a small amount of RAM is reserved as overhead on each node, a job that requests 64GB of RAM will spill over onto two nodes, rather than just running on a single node. This can delay job start times.
Furthermore, if you have used the -N 1 parameter to specify that your job should run on a single node, and you request 64GB of RAM, your job will never start, because that combination of resources is not available.
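A minimal sketch of a single-node request that avoids this pitfall (the core count is a placeholder; the safe memory value depends on the per-node overhead, see the table below):
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --mem=62G    # just under 64GB, so the job still fits on a single 64GB node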
HPC Big Memory Nodes
NODES               TOTAL MEMORY    AVAILABLE
hpc-8-[12 to 19]    64GB            ~62GB
hpc-8-[20 to 33]    128GB           ~122GB
hpc-8-[34 to 39]    256GB           ~246GB
The default memory per core is 3.9GB, so if you ask for 120GB of memory, Slurm computes 120/3.9 = 30.7 and thinks you need a node with 31 cores to run the job. Instead, add --mem-per-cpu=7500 so that the 16 requested cores supply 16 x 7500MB = 120000MB (about 120GB), as in the script below:
#SBATCH -J "Cf-DOPO_ZORA"
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --mem-per-cpu=7500
##SBATCH --mem=120g # Not really necessary; --mem conflicts with --mem-per-cpu above, so leave this line commented out.
#SBATCH -o CF-DOPO_ZORA-%J.o
#SBATCH -e CF-DOPO_ZORA-%J.e
#SBATCH -p mendoza_q
#SBATCH -t 120:00:00
cd $SLURM_SUBMIT_DIR # move to the directory sbatch was run from (used here only to change directory, not to copy files)
module purge
module load adf # Loading ADF
export NSCM=16 # This should equal the number of CPUs requested for the job
echo $NSCM
which adf
adf -n 16 < Pu-DOPO_ZORA.inp > Pu-DOPO_ZORA.out
If you need to specify a higher memory allocation for your job than the default, a good rule of thumb is to use 3.9 as your multiplier. For example, if you want to use 8GB of RAM per core, use 7.8 instead. This will ensure that resources for your job are more efficiently allocated.
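For instance, an 8GB-per-core request rounded to that granularity would look like this (the core count is a placeholder):
#SBATCH -n 16
#SBATCH --mem-per-cpu=7800M   # 2 x 3.9GB = 7.8GB per core instead of a flat 8GB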
More info: https://rcc.fsu.edu/docs/memory-planning-hpc