Servers
Types of Nodes & Queues to run your job
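The node type is given by the alphabetic prefix of the node name, which has the form "<node-type><node-id>" (for example, compt313 has type compt and ID 313). To list every node with its CPU usage and state, type the command below. The second column of the output shows CPU counts as allocated/idle/other/total.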
sinfo -Ne -o '%n %C %t'
output:
...
compt313 38/2/0/40 mix
compt314 26/14/0/40 mix
gput061 24/0/0/24 alloc
gput062 24/0/0/24 alloc
smpt08 0/40/0/40 idle
smpt09 0/40/0/40 idle
...
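The node type and the idle CPU count can be pulled out of this listing with standard tools. The following is a minimal sketch, assuming the three-column format produced by the command above; the function name idle_cpus_by_type is hypothetical and not part of Slurm.

```shell
# idle_cpus_by_type (hypothetical helper): read the output of
#   sinfo -Ne -o '%n %C %t'
# on stdin and print the total number of idle CPUs per node type.
# Assumption: the node type is the node name with its trailing digits removed.
idle_cpus_by_type() {
    awk 'NF == 3 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/ && !seen[$1]++ {
        # !seen[$1]++ skips repeats: a node that belongs to several
        # partitions is listed once per partition.
        type = $1
        sub(/[0-9]+$/, "", type)      # compt313 -> compt
        split($2, cpus, "/")          # allocated/idle/other/total
        idle[type] += cpus[2]
    }
    END { for (t in idle) print t, idle[t] }' | sort
}
```

Usage: sinfo -Ne -o '%n %C %t' | idle_cpus_by_type
The header line and any line that does not match the expected format are ignored.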
Affiliated Queues/Partitions
To request compute nodes in a queue other than batch, see HPC resource View. To list the available queues (partitions), use the command:
sinfo
output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
batch* up 13-08:00:0 1 comp compt309
batch* up 13-08:00:0 2 drain compt[315-316]
batch* up 13-08:00:0 102 mix compt[159-160,164,166,168-177,179-180,189-195,199-213,217-219,223-224,228-231,236-253,256,258,260-263,265,270-271,273-280,283-290,292-293,298-299,302,306-307,310-314]
batch* up 13-08:00:0 28 alloc compt[161-162,165,178,181-188,254-255,257,259,272,291,294-297,300-301,303-305,308]
batch* up 13-08:00:0 36 idle compt[146-158,196-198,214-216,220-222,225-227,232-235,264,266-269,281-282]
gpu up 13-08:00:0 1 drain gput027
gpu up 13-08:00:0 14 mix gput[026,031,033-037,040,047-048,054,056,058-059]
gpu up 13-08:00:0 13 alloc gput[032,041-046,049,052,055,057,061-062]
...
In the listing above:
Nodes in the state "comp" are completing a job: compt309
Nodes in the state "drain" are not available: compt[315-316] and gput027
Nodes in the state "mix" are running jobs, but have available processors
Nodes in the state "alloc" are running jobs, and have no more available processors
Nodes in the state "idle" are available
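To keep only the rows that can still accept work, the listing can be filtered on its STATE column. This is a minimal sketch, assuming the default six-column sinfo layout shown above; the function name free_nodes is hypothetical.

```shell
# free_nodes (hypothetical helper): read default `sinfo` output on stdin and
# print partition, state, node count and node list for rows whose STATE
# column is exactly "idle" or "mix".
# Note: this is an exact match, so a state printed with a trailing flag
# character is not selected.
free_nodes() {
    awk '$5 == "idle" || $5 == "mix" { print $1, $5, $4, $6 }'
}
```

Usage: sinfo | free_nodes
sinfo can also do this filtering itself through its --states option, for example sinfo --states=idle,mix.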
Explore Attributes of the Nodes
Each node type has its own attributes, such as memory and number of CPUs, and nodes of the same type may still differ from one another. Now that we know which nodes are available, we can inspect the attributes of a particular node by typing:
scontrol show node compt312
output:
NodeName=compt312 Arch=x86_64 CoresPerSocket=20
CPUAlloc=32 CPUTot=40 CPULoad=15.10
AvailableFeatures=icosa192gb
ActiveFeatures=icosa192gb
Gres=(null)
NodeAddr=compt312 NodeHostName=compt312 Version=19.05.4
OS=Linux 3.10.0-1127.el7.x86_64 #1 SMP Tue Feb 18 16:39:12 EST 2020
RealMemory=191000 AllocMem=189440 FreeMem=21112 Sockets=2 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=100000 Weight=1 Owner=N/A MCS_label=N/A
Partitions=batch
BootTime=2021-01-12T18:03:50 SlurmdStartTime=2021-01-13T14:05:35
CfgTRES=cpu=40,mem=191000M,billing=40
AllocTRES=cpu=32,mem=185G
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Here, the number of processors (ncpus, shown as CPUTot) is 40, and the available memory (availmem, shown as RealMemory) is 191000 MB (~190 GB). The feature icosa192gb indicates that this node has 2 x 20 processors (Sockets=2, CoresPerSocket=20).
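The CPUs and memory still unallocated on a node can be computed from the CPUTot, CPUAlloc, RealMemory and AllocMem fields of this output. The following is a minimal sketch, assuming the key=value format shown above with memory values in MB; the function name node_headroom is hypothetical.

```shell
# node_headroom (hypothetical helper): read `scontrol show node <name>`
# output on stdin and print the CPUs and memory (MB) not yet allocated.
node_headroom() {
    # Put every key=value pair on its own line, then pick the four fields.
    tr ' ' '\n' | awk -F= '
        $1 == "CPUTot"     { cputot = $2 }
        $1 == "CPUAlloc"   { cpualloc = $2 }
        $1 == "RealMemory" { realmem = $2 }
        $1 == "AllocMem"   { allocmem = $2 }
        END {
            printf "free_cpus=%d free_mem_mb=%d\n",
                   cputot - cpualloc, realmem - allocmem
        }'
}
```

Usage: scontrol show node compt312 | node_headroom
For the compt312 output above this gives 8 unallocated CPUs (40 - 32) and 1560 MB of unallocated memory (191000 - 189440).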
To list the features defined across all nodes, use:
scontrol show node | grep ActiveFeatures | sort | uniq
output:
ActiveFeatures=dodeca96gb
ActiveFeatures=gpu2080
ActiveFeatures=gpu2v100
ActiveFeatures=gpu4v100
ActiveFeatures=gpuk40
ActiveFeatures=gpup100
ActiveFeatures=icosa192gb
ActiveFeatures=octa64gb
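It can also be useful to know how many nodes carry each feature. The following is a minimal sketch that counts them, assuming each node reports a single ActiveFeatures=<name> entry as in the output above; the function name nodes_per_feature is hypothetical.

```shell
# nodes_per_feature (hypothetical helper): read `scontrol show node` output
# on stdin and print each feature name followed by the number of nodes
# that report it.
nodes_per_feature() {
    grep -o 'ActiveFeatures=[^ ]*' |   # keep only the ActiveFeatures entries
        cut -d= -f2 |                  # drop the "ActiveFeatures=" prefix
        sort | uniq -c |               # count identical feature names
        awk '{ print $2, $1 }'         # print "feature count"
}
```

Usage: scontrol show node | nodes_per_feature
A feature name from this list can be requested at submission time with the --constraint option of sbatch, for example --constraint=icosa192gb.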
CPU Information
Two sources are useful for learning the specifics of the CPUs in a node:
-- The file /proc/cpuinfo (use 'cat' or 'less' to view the contents)
-- The shell command lscpu, which displays information about the CPU architecture
Both are specific to the node, which means that an active session on the node is required. Examples are provided below for compt188, one of the newer nodes in the batch queue.
$ cat /proc/cpuinfo | grep -i 'model name' | uniq
model name : Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 1
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
Stepping: 7
CPU MHz: 2100.000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
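The lscpu output is long, so it can help to reduce it to the values that matter when choosing a node. The following is a minimal sketch, assuming the "Name: value" layout shown above; the function name cpu_summary is hypothetical, and avx512f is used only as an example of a flag to look for.

```shell
# cpu_summary (hypothetical helper): read `lscpu` output on stdin and print
# the number of logical CPUs (sockets x cores per socket x threads per core)
# and whether the avx512f flag is listed.
cpu_summary() {
    awk -F: '
        BEGIN { avx512 = "no" }
        $1 == "Thread(s) per core" { threads = $2 + 0 }
        $1 == "Core(s) per socket" { cores = $2 + 0 }
        $1 == "Socket(s)"          { sockets = $2 + 0 }
        # Match avx512f as a whole word so that avx512dq etc. do not count.
        $1 == "Flags" && $2 ~ /(^| )avx512f( |$)/ { avx512 = "yes" }
        END {
            printf "logical_cpus=%d avx512f=%s\n",
                   sockets * cores * threads, avx512
        }'
}
```

Usage: lscpu | cpu_summary
For the node above, 2 sockets x 20 cores x 1 thread gives 40, which agrees with the CPU(s) line.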