Best Practices on Greene

Scheduling jobs in an efficient and fair manner can be a challenging task in a multi-user environment. Here we provide some recommendations.

What resources to ask for (and how much)?

When asking for compute resources in the batch script, never ask for more than you need.

There are two important reasons for this. Imagine you need only 2 CPUs but request 10: first, the scheduler has to find a node with 10 free cores, so your job is likely to wait longer in the queue; second, the 8 cores your job does not use are still reserved for it and cannot be given to other users while the job runs.

The same argument applies to other types of resources: RAM, GPUs (and potentially other 'TRES' in SLURM terminology).

Very often, even for parallel codes, it makes sense to request fewer CPU cores. The main example of this is a parallel job that doesn't scale well. For example, imagine that by using 12 cores instead of 4 you can reduce execution time from 1 hour to 45 minutes. Yes, 45 minutes is faster, but you are using 3 times as many resources for a modest gain. Additionally, it may happen that your 4-core job waits in the queue for only 5 minutes, while the 12-core job has to wait for 20 minutes, which completely offsets the gains from using 8 more cores. In the end you are the one to judge what is really needed, but you should always weigh these considerations when creating a job script. A sketch of a batch script with modest, explicit requests is shown below.
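
As a reference point, a minimal CPU batch script with explicit resource requests might look like the following sketch. The job name, program name, and the specific numbers are illustrative placeholders, not recommended values.

#!/bin/bash
#SBATCH --job-name=my_analysis      # illustrative job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4           # request only the cores your code can actually use
#SBATCH --mem=4GB                   # CPU RAM; size this based on measurements (seff, top)
#SBATCH --time=02:00:00             # a realistic time limit helps the scheduler place your job

module purge                        # start from a clean environment
srun ./my_program                   # hypothetical executable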

We recommend that you don't request a specific GPU type unless you need it. If it is not essential to run your job on a particular GPU device (for example, a powerful V100), then request any GPU with

#SBATCH --gres=gpu:1

Use #SBATCH --gres=gpu:v100:1 only when you are absolutely sure that you need a V100. V100s are often in high demand, and the average wait time for these GPUs is higher than for other GPU types. It may be (and very often is) beneficial to run your job on a less powerful accelerator but spend less time waiting in the queue for it to start.
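
For example, a GPU job script that accepts whichever GPU type becomes available first could look like this sketch (the script name and numbers are illustrative):

#!/bin/bash
#SBATCH --job-name=train_model      # illustrative job name
#SBATCH --cpus-per-task=4
#SBATCH --mem=32GB                  # CPU RAM, not GPU memory
#SBATCH --time=08:00:00
#SBATCH --gres=gpu:1                # any available GPU type
# #SBATCH --gres=gpu:v100:1         # only if you are sure you need a V100

module purge
srun python train.py                # hypothetical training script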

Do NOT run CPU-heavy jobs on login nodes

Login nodes are designed for light, interactive use: editing files, transferring data, and submitting jobs. Run computationally heavy work on compute nodes through the batch system instead.
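
If you need to run something heavier interactively, one common approach is to request an interactive session on a compute node. A sketch, with placeholder resource numbers:

# request an interactive shell on a compute node for 2 hours with 4 cores and 8 GB of RAM
srun --cpus-per-task=4 --mem=8GB --time=02:00:00 --pty /bin/bash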

Important: check if you are wasting RAM and CPU!

Another useful command that allows you to better understand how resources were utilized by completed jobs is seff:

[~]$ seff 8932105
Job ID: 8932105
Cluster: greene
User/Group: NetID/GROUPID
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:22:45
CPU Efficiency: 99.99% of 02:22:46 core-walltime
Job Wall-clock time: 02:22:46
Memory Utilized: 2.18 GB
Memory Efficiency: 21.80% of 10.00 GB

This example shows statistics for a completed job that was run with a request of 1 CPU core and 10 GB of RAM. While CPU utilization was practically 100%, RAM utilization was very poor: only 2.18 GB out of the requested 10 GB were used. This job's batch script should definitely be adjusted to something like #SBATCH --mem=2250MB.
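
To spot this kind of waste across many jobs at once, you can run seff over your recently completed jobs. A rough sketch (the 7-day window is an arbitrary choice):

# list allocation-level job IDs from the last 7 days and run seff on each
for jobid in $(sacct -u $USER -S $(date -d '7 days ago' +%F) -X -n -o JobID | tr -d ' '); do
    seff $jobid
    echo "----"
done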

Check resources usage of a currently running job

Submit your job. Once it is running, you can use squeue to find the name of the node where it was scheduled.

From a login node, run

ssh <node-name>
top -u $USER

Take a look at how fully you are using the CPUs and how much RAM your job is using.
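
For a quick per-process breakdown on that node, something like the following can also help (a sketch using standard Linux tools):

# show your processes sorted by CPU usage, with resident memory (RSS) in KB
ps -u $USER -o pid,pcpu,pmem,rss,comm --sort=-pcpu | head -n 15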

For a GPU job also run

nvidia-smi

Take a look at how much GPU processing power your job is using.

It may happen that your code does not scale well, and it is better to use 1 or 2 GPUs instead of 4.

You can also take a look at GPU memory utilization.
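
To watch GPU utilization and GPU memory over time, a compact option is the following sketch (the 5-second interval is arbitrary):

# print GPU utilization and memory usage every 5 seconds
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 5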

The RAM specification in the sbatch file is for CPU RAM, not GPU memory!

Please request only as much RAM as your job needs!

Is my job scalable? How efficiently do I use multiple CPUs/GPUs?

Every code is different. Test it!
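
A simple way to test scaling is to submit the same job with different core counts and compare elapsed times afterwards with sacct. A rough sketch, where scale.sbatch is a hypothetical script that honors $SLURM_CPUS_PER_TASK:

# submit the same job with 1, 2, 4, 8 and 16 cores
for n in 1 2 4 8 16; do
    sbatch --cpus-per-task=$n --job-name=scaling_$n scale.sbatch
done

# after the jobs finish, compare wall-clock times
sacct -u $USER --name=scaling_1,scaling_2,scaling_4,scaling_8,scaling_16 -X --format=JobName%15,AllocCPUS,Elapsed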

Why are my jobs queued?

To understand why your job is waiting in the queue you can run 

squeue -j <JobID> -o "%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R"

The last column of the output indicates the reason. You can find out more about the squeue output format in man squeue.

The column "NODELIST(REASON)" in the end is job status due to the reason(s), which can be :

 For a more complete list of possible values of REASON, please refer to man squeue under the section JOB REASON CODES.

Limits on resources you can request

Within SLURM there are multiple limits defined on different levels and applied to different objects. Some of the important limits are listed here. 

Number of jobs per user

2000

Job lifetime

Limited to 7 days (168 hours), but you can request an extension by emailing the HPC team (hpc@nyu.edu).

CPU, GPU, RAM

These limits depend on the time you request for the job: the "short queue" (under 48 hours, or 2 days) or the "long queue" (under 168 hours, or 7 days).
These limits may be updated by the HPC team based on cluster usage patterns. To obtain the up-to-date numbers, run the following command:

$ sacctmgr list qos format=maxwall,maxtresperuser%40,name

In the output, look at MaxWall to find the queue you are interested in (short or long), then look under MaxTRESPU to find the limits for CPU, RAM, and GPU.

Here is an example of the output you may get:

    MaxWall                                MaxTRESPU       Name
----------- ---------------------------------------- ----------
                                                         normal
 2-00:00:00                       cpu=3000,mem=6000G      cpu48
 7-00:00:00                       cpu=1000,mem=2000G     cpu168
 2-00:00:00                              gres/gpu=24      gpu48
 7-00:00:00                               gres/gpu=4     gpu168
   04:00:00                        cpu=48,gres/gpu=4   interact
                                         gres/gpu=96     gpuamd
   12:00:00                     cpu=20000,mem=10000G     cpulow

How many CPU cores per GPU

These limits are frequently updated by the HPC team, based on the cluster usage patterns. 
Due to this, the numbers below are not exact and should only be used as general guidelines. 

Here are some of these limits:

gpu type = "V100"
| # gpus | max_cpus | max_memory (GB) |
|--------+----------+-----------------|
|      1 |       20 |             200 |
|      2 |       24 |             300 |
|      3 |       44 |             350 |
|      4 |       48 |             369 |

gpu type = "rtx8000"
| # gpus | max_cpus | max_memory (GB) |
|--------+----------+-----------------|
|      1 |       20 |             200 |
|      2 |       24 |             300 |
|      3 |       44 |             350 |
|      4 |       48 |             369 |

gpu type = "a100"
| # gpus | max_cpus | max_memory (GB) |
|--------+----------+-----------------|
|      1 |       28 |             250 |
|      2 |       32 |             300 |
|      3 |       60 |             400 |
|      4 |       64 |             490 |

gpu type = "mi50"
| # gpus | max_cpus | max_memory (GB) |
|--------+----------+-----------------|
|      1 |       48 |             200 |
|      2 |       72 |             300 |
|      3 |       76 |             350 |
|      4 |       80 |             370 |
|      5 |       84 |             400 |
|      6 |       88 |             430 |
|      7 |       92 |             460 |
|      8 |       96 |             490 |

From this table you can see, for example, that a job asking for 8 V100 GPUs cannot be scheduled. Another example: a request for 2 V100s and 48 cores will also not be granted.

How to find more information on my jobs?

Some other useful SLURM commands that can help to get information about running and pending jobs are

# detailed information for a job:
scontrol show jobid -dd <jobid>

# show status of a currently running job
# (see 'man sstat' for other available JOB STATUS FIELDS)
sstat --format=TresUsageInMax%80,TresUsageInMaxNode%80 -j <JobID> --allsteps

# get stats for completed jobs
# (see 'man sacct' for other JOB ACCOUNTING FIELDS)
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

# the same information for all jobs of a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed

How busy is the cluster?

Please refer to the Systems Status page to see the number of jobs in the queue and other metrics. Often your jobs are queued for a simple reason: the cluster is very busy and there aren't enough resources available.

Best way to submit large number of similar jobs

The correct way to submit such jobs is to use the array job functionality of SLURM. This reduces load on the scheduler system.

Don't make your own loops to do this kind of work.

You can find a detailed description on how to submit such jobs here.
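
As a minimal illustration (not a substitute for the full documentation linked above), an array job that processes 100 input files with a single submission might look like this sketch; process.py and the data layout are hypothetical:

#!/bin/bash
#SBATCH --job-name=my_array         # illustrative job name
#SBATCH --array=1-100               # 100 tasks, one per input file
#SBATCH --cpus-per-task=1
#SBATCH --mem=2GB
#SBATCH --time=01:00:00

module purge
# each task picks its own input based on the array index
srun python process.py data/input_${SLURM_ARRAY_TASK_ID}.txt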

Web scraping, Data mining from websites

If you need to do web scraping, do not do it on Greene. It is not allowed, both for security reasons and because it wastes CPU resources (most of the time such a job just downloads files instead of using the CPU). Please contact us, and we will advise on a better workflow for your project.

Error handling

We recommend using #!/bin/bash -e instead of plain #!/bin/bash, so that the failure of any command within the script will cause your job to stop immediately rather than attempting to continue on with an unexpected environment or erroneous intermediate data. It also ensures that your failed jobs show a status of FAILED in sacct output.
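
For example, in a multi-step batch script the -e flag stops the job at the first failing command (the step commands below are hypothetical):

#!/bin/bash -e
#SBATCH --job-name=pipeline         # illustrative job name
#SBATCH --time=04:00:00

module purge
./step1_preprocess                  # hypothetical commands; if any step fails,
./step2_compute                     # the job stops immediately instead of
./step3_postprocess                 # continuing with incomplete data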

When a batch job is finished it produces an exit code (among other useful data). To view the error code of the job you can use: 

sacct -b -j <JobID>

When reaching out to the HPC team for help with failing jobs, it is useful to include the exit code of the job in question.

Check efficiency of your jobs

Please check on this page

SSH Issues

Some people may experience connection warnings while connecting to Greene, or connections being terminated too soon.

This can be addressed by adding the following to ~/.ssh/config:

# Increase alive interval
host *
  ServerAliveInterval 60
  ForwardAgent yes
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  LogLevel ERROR

More information on SSH can be found on the SSH Tunneling page.