The resources are summarized at the link below, with the latest usage statistics updated every 20 minutes.
Important Notes:
Please take the floor value when converting memory from MB to GB. The CPUs and Memory shown in the table apply to each node in that node range.
These resource views indicate node ranges having specific resources. Use the 'sinfo -O nodehost,features | grep <feature>' command in a shell to determine which specific nodes in the cluster have that feature. Use 'si' to list all nodes in the cluster with their features and current status.
GPU2080 nodes gput[045-052], GPU2V100 nodes gput[057-062], and GPU4V100 nodes gput[053-056] have SSD drives. Please use /tmp ($TMPDIR) as scratch space to take advantage of the SSDs.
RDS is mounted only on a few selected nodes, compt[317-326] and a couple of GPU nodes. Use "-C rds" to land on those nodes and access the mounted files. To make use of all compute nodes, copy your data to /scratch first using the steps below (you can include these commands in your job script; a sketch of such a script follows the steps):
# Create temporary scratch space
mkdir /scratch/users/<CaseID>
# Copy data from RDS to /scratch
ssh dtn2t "cp -r /mnt/rds/<rds name>/<folder1> /scratch/users/<CaseID>"
# Copy data from /scratch back to RDS
ssh dtn2t "cp -r /scratch/users/<CaseID>/<folder1> /mnt/rds/<rds name>/."
To get the Feature or constraint information, use
scontrol show node | egrep "(NodeName|AvailableFeatures)"
Output:
...
NodeName=gput071 Arch=x86_64 CoresPerSocket=24
AvailableFeatures=gpul40s
NodeName=gput072 Arch=x86_64 CoresPerSocket=12
AvailableFeatures=gpu2h100
NodeName=gput073 Arch=x86_64 CoresPerSocket=12
AvailableFeatures=gpu2h100
NodeName=gput074 Arch=x86_64 CoresPerSocket=32
AvailableFeatures=gpu4090
This indicates that there are a few different node features: gpul40s, gpu2h100, and gpu4090.
If you want to request a GPU node with an L40S GPU, 20 processors, and 92gb of memory:
srun -p gpu -c 20 -C gpul40s --mem=92gb --gres=gpu:1 --pty bash
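The same request can be made in a batch script; a minimal sketch, where the walltime and application name are hypothetical:
#!/bin/bash
#SBATCH -p gpu                  # gpu queue
#SBATCH -C gpul40s              # constrain to nodes with the gpul40s feature
#SBATCH -c 20                   # 20 processors
#SBATCH --mem=92gb              # 92gb of memory
#SBATCH --gres=gpu:1            # one GPU
#SBATCH --time=01:00:00         # placeholder walltime
./my_gpu_app                    # hypothetical application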
You need to specify the queue type (e.g. gpu, smp) to use the resources available on the nodes in that queue. For example, to request 32 processors in a node, you need to use the smp nodes in the smp queue (-p smp).
srun -p smp -c 32 --mem=64gb --pty bash
Request a node exclusively (in the default batch queue); --mem=0 needs to be included to request all the memory available on the node.
srun --exclusive --mem=0 --pty bash
This will reserve all the processors and, with --mem=0, all the memory on the node. GPU resources still need to be requested explicitly.
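Once the exclusive session starts, you can confirm what was allocated from within it; the commands below are a quick sketch and their output depends on the node:
nproc                           # processors available to the job
free -g                         # total and free memory on the node, in GB
scontrol show job $SLURM_JOB_ID # full details of the allocation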
Memory per CPU:
Memory per CPU is useful, especially with MPI jobs. Let's request 4gb per CPU for an MPI job using 10 processors.
srun -n 10 --mem-per-cpu=4gb --pty bash
Now, check the memory:
[abc123@smp05t ~]$ ulimit -a
Output:
...
max memory size (kbytes, -m) 41943040
...
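The same memory-per-CPU request works in a batch script for an MPI job; a minimal sketch, where the module and program names are hypothetical:
#!/bin/bash
#SBATCH -n 10                   # 10 MPI tasks
#SBATCH --mem-per-cpu=4gb       # 4gb per CPU
#SBATCH --time=01:00:00         # placeholder walltime
module load openmpi             # hypothetical module name; load your MPI environment
srun ./my_mpi_app               # hypothetical MPI program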
For more information, please visit the HPC Guide to Interactive and Batch Job Submission, which contains a section on memory-intensive jobs.
For detailed information on HPC servers and storage, please visit Servers & Storage.
Request GPU nodes with SSDs to use as /tmp ($TMPDIR) scratch space:
srun -p gpu -C 'gpu2080|gpu2v100|gpu4v100' --gres=gpu:2 --pty bash
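Inside such a job, stage your data to the node-local SSD through $TMPDIR, work there, and copy the results back; a minimal sketch, where the input path and application name are hypothetical:
# Copy input to the node-local SSD
cp -r /scratch/users/<CaseID>/input $TMPDIR/
cd $TMPDIR
./my_gpu_app input              # hypothetical application writing to ./results
# Copy results back before the job ends; node-local scratch is typically not preserved
cp -r results /scratch/users/<CaseID>/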
Request a specific node or a list of nodes
Get the nodes in a node list (e.g. compt362-compt366):
srun --nodelist=compt[362-366] --pty bash
This will assign you one processor on each node in the list, as shown below.
<jobID> batch bash <User> R 0:09 9:59:51 5 5 1922 compt[362-366]
You can use the -n option to request more processors across the listed nodes. For requesting nodes not in a range, use a comma-separated list (e.g. --nodelist=compt362,compt365). You can also provide the list via an input file (e.g. --nodelist ./node-file).
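If you use a node file, it simply lists node names; for example, a hypothetical ./node-file with one node per line could contain:
compt362
compt365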
Exclude a specific node or a list of nodes
Exclude a few GPU nodes that have an older version of the driver:
srun -p gpu -C "gpup100|gpul40s" --gres=gpu:1 --exclude=gput[070-071] --pty bash