Servers

Types of Nodes & Queues to run your job

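Node types are identified by the node name: the alphabetic prefix of the string "<node-type><node-id>" is the node type (for example, compt in compt313). To list every node with its CPU usage and state, type the command below. The second column shows CPUs as allocated/idle/other/total.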

sinfo -Ne -o '%n %C %t'

output:

...

compt313 38/2/0/40 mix

compt314 26/14/0/40 mix

gput061 24/0/0/24 alloc

gput062 24/0/0/24 alloc

smpt08 0/40/0/40 idle

smpt09 0/40/0/40 idle

...
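
The per-node listing above can be condensed into one line per node type. The following is a minimal sketch, assuming the three-column format produced by the command above; the function name summarize_idle_cpus is made up for this illustration and is not part of Slurm.

```shell
# Summarize idle/total CPUs per node type.
# Input (stdin): lines such as "compt313 38/2/0/40 mix",
# where the second field is allocated/idle/other/total.
summarize_idle_cpus() {
    awk '
        # Keep only data lines; the header line does not match the CPU pattern.
        # seen[] guards against a node being listed more than once.
        NF == 3 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/ && !seen[$1]++ {
            type = $1
            sub(/[0-9]+$/, "", type)   # strip the numeric node id
            split($2, cpu, "/")        # cpu[2] = idle, cpu[4] = total
            idle[type]  += cpu[2]
            total[type] += cpu[4]
        }
        END {
            for (t in idle) printf "%s %d/%d\n", t, idle[t], total[t]
        }
    ' | sort
}

# Intended use on the cluster (not run here):
#   sinfo -Ne -o '%n %C %t' | summarize_idle_cpus
```

Each output line has the form "<node-type> <idle CPUs>/<total CPUs>", which gives a quick view of where free cores are.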

Affiliated Queues/Partitions

If you want to request compute nodes in queues other than batch, see HPC resource View. You can list the queues (partitions) using the command:

 sinfo

output:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST

batch*       up 13-08:00:0      1   comp compt309

batch*       up 13-08:00:0      2  drain compt[315-316]

batch*       up 13-08:00:0    102    mix compt[159-160,164,166,168-177,179-180,189-195,199-213,217-219,223-224,228-231,236-253,256,258,260-263,265,270-271,273-280,283-290,292-293,298-299,302,306-307,310-314]

batch*       up 13-08:00:0     28  alloc compt[161-162,165,178,181-188,254-255,257,259,272,291,294-297,300-301,303-305,308]

batch*       up 13-08:00:0     36   idle compt[146-158,196-198,214-216,220-222,225-227,232-235,264,266-269,281-282]

gpu          up 13-08:00:0      1  drain gput027

gpu          up 13-08:00:0     14    mix gput[026,031,033-037,040,047-048,054,056,058-059]

gpu          up 13-08:00:0     13  alloc gput[032,041-046,049,052,055,057,061-062]

...

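In the listing above, the asterisk after batch marks it as the default partition, and each row groups the nodes of one partition that share the same state (for example mix, alloc, idle or drain).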
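
The NODES and STATE columns can be tallied to see how many nodes of each partition are in a given state. This is a rough sketch that assumes the default six-column sinfo layout shown above; nodes_in_state is a hypothetical helper name, and the prefix match on the state is an assumption meant to tolerate any suffix appended to the state name.

```shell
# Count nodes per partition that are in a given state.
# $1    = state to look for (e.g. idle, mix, alloc)
# stdin = default `sinfo` output (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST)
nodes_in_state() {
    awk -v state="$1" '
        $1 != "PARTITION" && index($5, state) == 1 {
            part = $1
            sub(/\*$/, "", part)   # drop the default-partition marker
            n[part] += $4          # NODES column
        }
        END {
            for (p in n) print p, n[p]
        }
    ' | sort
}

# Intended use on the cluster (not run here):
#   sinfo | nodes_in_state idle
```

A partition with no node in the requested state produces no line at all.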

Explore Attributes of the Nodes

Different node types have their own attributes, such as memory and CPU count, and nodes of the same type may still differ from one another. Now that we know which nodes are available, we can inspect the attributes of a node by typing:

scontrol show node compt312

output:

NodeName=compt312 Arch=x86_64 CoresPerSocket=20

   CPUAlloc=32 CPUTot=40 CPULoad=15.10

   AvailableFeatures=icosa192gb

   ActiveFeatures=icosa192gb

   Gres=(null)

   NodeAddr=compt312 NodeHostName=compt312 Version=19.05.4

   OS=Linux 3.10.0-1127.el7.x86_64 #1 SMP Tue Feb 18 16:39:12 EST 2020

   RealMemory=191000 AllocMem=189440 FreeMem=21112 Sockets=2 Boards=1

   State=MIXED ThreadsPerCore=1 TmpDisk=100000 Weight=1 Owner=N/A MCS_label=N/A

   Partitions=batch

   BootTime=2021-01-12T18:03:50 SlurmdStartTime=2021-01-13T14:05:35

   CfgTRES=cpu=40,mem=191000M,billing=40

   AllocTRES=cpu=32,mem=185G

   CapWatts=n/a

   CurrentWatts=0 AveWatts=0

   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Here, the total number of CPUs (CPUTot) is 40, and the configured memory (RealMemory) is 191000 MB (~190 GB). The feature icosa192gb indicates that this node has 2 x 20 cores (Sockets=2, CoresPerSocket=20).
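
To pull a single attribute out of this output in a script, the key=value pairs can be split onto separate lines. The helper below is only a sketch: node_field is a name invented for this example, and it works only for values that contain no spaces (so not for the OS field).

```shell
# Print the value of one key from `scontrol show node <name>` output.
# $1    = key, e.g. CPUTot or RealMemory
# stdin = the scontrol output
node_field() {
    tr ' ' '\n' |
    awk -v key="$1" '
        # Match "key=" at the start of a token and print what follows it.
        !done && index($0, key "=") == 1 {
            print substr($0, length(key) + 2)
            done = 1
        }
    '
}

# Intended use on the cluster (not run here):
#   scontrol show node compt312 | node_field CPUTot
#   scontrol show node compt312 | node_field RealMemory
```

For the node shown above, these two calls would be expected to return the CPUTot and RealMemory values from the listing.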

To get the Feature information, use

scontrol show node | grep ActiveFeatures | sort | uniq

output:

   ActiveFeatures=dodeca96gb

   ActiveFeatures=gpu2080

   ActiveFeatures=gpu2v100

   ActiveFeatures=gpu4v100

   ActiveFeatures=gpuk40

   ActiveFeatures=gpup100

   ActiveFeatures=icosa192gb

   ActiveFeatures=octa64gb
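
Slurm stores features as a comma-separated list, so a node may carry more than one. The variant below splits such lists into individual names; it is a sketch under that assumption, and list_features is a hypothetical function name.

```shell
# List the distinct feature names found in `scontrol show node` output.
# stdin = the scontrol output for one or more nodes
list_features() {
    sed -n 's/^[[:space:]]*ActiveFeatures=//p' |   # keep only the feature values
    tr ',' '\n' |                                  # one feature per line
    sort -u                                        # remove duplicates
}

# Intended use on the cluster (not run here):
#   scontrol show node | list_features
```

A feature name from this list can then be passed to the --constraint option of sbatch or srun to restrict a job to nodes that have it.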

CPU Information

Two sources are useful for learning the specifics of the CPUs in a node:

-- The file /proc/cpuinfo (use 'cat' or 'less' to view the contents)

-- The shell command lscpu, which displays information about the CPU architecture

Both are specific to the node, which means that an active session on the node is required. Examples are provided below for compt188, one of the newer nodes in the batch queue.

$ cat /proc/cpuinfo | grep -i 'model name' | uniq

model name      : Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

$ lscpu

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                40

On-line CPU(s) list:   0-39

Thread(s) per core:    1

Core(s) per socket:    20

Socket(s):             2

NUMA node(s):          2

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 85

Model name:            Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

Stepping:              7

CPU MHz:               2100.000

BogoMIPS:              4200.00

Virtualization:        VT-x

L1d cache:             32K

L1i cache:             32K

L2 cache:              1024K

L3 cache:              28160K

NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38

NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
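
Two things are commonly read off the lscpu output: the number of logical CPUs (threads per core x cores per socket x sockets) and whether a given instruction-set flag is present. The helpers below sketch both, assuming the field labels shown above; logical_cpus and has_cpu_flag are names made up for this example.

```shell
# Compute threads-per-core * cores-per-socket * sockets from lscpu output (stdin).
logical_cpus() {
    awk -F: '
        /^Thread\(s\) per core/ { t = $2 + 0 }
        /^Core\(s\) per socket/ { c = $2 + 0 }
        /^Socket\(s\)/          { s = $2 + 0 }
        END { print t * c * s }
    '
}

# Succeed (exit status 0) if the named flag appears in the Flags line.
# $1    = flag name, matched exactly (avx512 does not match avx512f)
# stdin = lscpu output
has_cpu_flag() {
    awk -F: -v flag="$1" '
        /^Flags/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == flag) found = 1
        }
        END { exit !found }
    '
}

# Intended use in a session on the node (not run here):
#   lscpu | logical_cpus
#   lscpu | has_cpu_flag avx512f && echo "AVX-512 available"
```

For the node above, the product is 1 x 20 x 2 = 40, which matches the CPU(s) line.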