Configuring Server Policy for TORQUE and PBS Pro
A workload manager such as TORQUE or PBS Pro lets the administrator (root) or an approved user (via sudo) set server policy and manage queues. On a Linux cluster where jobs are submitted through such a scheduler, the administrator can create, modify, or delete queues and their policies, for example by limiting the wall-time or the maximum number of CPU cores a job may use.
Let's do it!
The qmgr command is all you need. qmgr is the queue manager utility used to configure the batch system. Some installations ship no man or info page for it, but qmgr has built-in help: start qmgr interactively and type help at the Qmgr: prompt.
The following session shows how to open qmgr's built-in help.
[rangsiman@nitrogen ~]$ qmgr
Max open servers: 4
Qmgr: hi
qmgr: Illegal operation: hi
Try 'help' if you are having trouble.
Qmgr:
Qmgr: help
General syntax: command [object][@server] [name attribute[.resource] OP value]
To get help on any topic or subtopic, type help <topic>
Help is available on all commands and topics.
Available commands: active, create, delete, set, unset, list, print, quit
Other topics are attributes, operators, names, and values .
Qmgr:
Qmgr: help create
Syntax: create object name[,name...]
Objects can be "queue" or "node"
The create command will create the specified object on the PBS server(s).
For multiple names, use a comma separated list with no intervening whitespace.
Examples:
create queue q1,q2,q3
Qmgr:
Qmgr:
If you mistype a command, qmgr reports an illegal operation and suggests typing help to see the available commands, as the "hi" example at the top of the session shows.
Create a queue
The following series of qmgr commands creates and configures a queue named default, step by step.
qmgr -c "create queue default queue_type=execution"
qmgr -c "set queue default started=true"
qmgr -c "set queue default enabled=true"
qmgr -c "set queue default resources_default.nodes=1"
qmgr -c "set queue default resources_default.walltime=3600"
The queue is now ready to accept jobs. If a job does not request any resources, the queue applies its defaults and runs the job on 1 node with a wall-time of 3600 seconds.
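The same steps can be collected in a small shell function. This is only a sketch: the function name make_queue_commands and its three arguments are made up for illustration, and the function only prints the qmgr commands so you can review them before running anything as root.

```shell
# Hypothetical helper: print the qmgr commands that create an execution queue.
# Usage: make_queue_commands <queue-name> <default-nodes> <default-walltime-seconds>
make_queue_commands() {
    queue="$1"
    nodes="$2"
    walltime="$3"
    printf 'qmgr -c "create queue %s queue_type=execution"\n' "$queue"
    printf 'qmgr -c "set queue %s started=true"\n' "$queue"
    printf 'qmgr -c "set queue %s enabled=true"\n' "$queue"
    printf 'qmgr -c "set queue %s resources_default.nodes=%s"\n' "$queue" "$nodes"
    printf 'qmgr -c "set queue %s resources_default.walltime=%s"\n' "$queue" "$walltime"
}

# Dry run: print the five commands for the queue described above.
make_queue_commands default 1 3600
```

Once the printed commands look right, they can be piped to sh on the server, for example make_queue_commands default 1 3600 | sh.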
Summary of all queues.
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short              --      --    24:00:00   --    6   2 --   E R
long               --      --    336:00:0   --   44   5 --   E R
slong              --      --    720:00:0   --    0   0 --   E R
medium             --      --    72:00:00   --    7   6 --   E R
                                               ----- -----
                                                  57    13
The listing above is the output of the "qstat -q" command, which you may already know. It shows four queues, all enabled and started (State: E R), with 57 jobs running and 13 jobs waiting in the queues.
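If you want the two totals in a script, the Run and Que columns can be summed with awk. The sketch below assumes the 10-field row layout of the listing above (Run in field 6, Que in field 7); count_jobs is a made-up name, and other qstat versions may lay the columns out differently.

```shell
# Hypothetical helper: sum the Run and Que columns of "qstat -q" style output on stdin.
# Only rows with 10 fields and numeric Run/Que values are counted, so the
# header, separator, and total lines are skipped.
count_jobs() {
    awk 'NF == 10 && $6 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ { run += $6; que += $7 }
         END { printf "running=%d queued=%d\n", run, que }'
}

# Sample rows copied from the listing above.
count_jobs <<'EOF'
short              --      --    24:00:00   --    6   2 --   E R
long               --      --    336:00:0   --   44   5 --   E R
slong              --      --    720:00:0   --    0   0 --   E R
medium             --      --    72:00:00   --    7   6 --   E R
EOF
```

For the sample rows this prints running=57 queued=13, matching the totals in the listing. On a live system the input would come from qstat -q | count_jobs.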
See queue information
You can also run a single qmgr command directly from the shell, without entering the interactive prompt.
qmgr -c ".....your action ....."
The most frequently used commands are
active
list
create
set
delete
For example, I use the list command to show the attributes of the queue short.
qmgr -c "list queue short"
Part of the output is shown below.
Queue short
queue_type = Execution
Priority = 50
max_user_queuable = 24
total_jobs = 8
state_count = Transit:0 Queued:2 Held:0 Waiting:0 Running:6 Exiting:0
resources_max.ncpus = 256
resources_max.nodes = 8
resources_max.walltime = 24:00:00
resources_default.walltime = 24:00:00
mtime = Thu Jun 25 10:22:28 2015
resources_assigned.ncpus = 6
resources_assigned.nodect = 6
max_user_run = 24
enabled = True
started = True
It shows that the queue currently holds 8 jobs (6 running and 2 queued). A job may request at most 256 CPU cores and at most 8 nodes, and may run for at most 24 hours, which is also the default wall-time. The max_user_run and max_user_queuable values limit each user to 24 running jobs and 24 queued jobs in this queue.
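To read a single limit from this listing in a script, a small awk filter is enough. The sketch below assumes the "name = value" layout shown above; queue_attr and to_seconds are made-up helper names.

```shell
# Hypothetical helper: print the value of one attribute from
# "list queue" output read on stdin.
queue_attr() {
    awk -v key="$1" '$1 == key && $2 == "=" { sub(/^[^=]*= */, ""); print; exit }'
}

# Hypothetical helper: convert an HH:MM:SS wall-time to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Sample lines copied from the listing above.
limit="$(printf '%s\n' 'Queue short' \
    '    resources_max.ncpus = 256' \
    '    resources_max.walltime = 24:00:00' | queue_attr resources_max.walltime)"
echo "wall-time limit: $limit = $(to_seconds "$limit") seconds"
```

On a live server the input would come from qmgr -c "list queue short" | queue_attr resources_max.walltime.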
Adding or removing compute nodes on the server
The file called nodes in TORQUE_HOME/server_priv/ lists the names of the compute nodes attached to the server, together with the number of physical CPU cores available on each. E.g.
compute-1 np=32
compute-2 np=32
compute-3 np=32
...
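A quick way to check how many cores the server knows about is to total the core counts in this file. The sketch below assumes each entry gives its core count in TORQUE's np=N form; total_cores is a made-up name, and entries without np= are not counted.

```shell
# Hypothetical helper: total the np=N core counts in a nodes file read from stdin.
# Lines starting with "#" are skipped; only entries that carry np=N are counted.
total_cores() {
    awk '$1 !~ /^#/ {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^np=[0-9]+$/) { cores += substr($i, 4); nodes++ }
         }
         END { printf "nodes=%d cores=%d\n", nodes, cores }'
}

# Sample entries in the np=N form.
total_cores <<'EOF'
compute-1 np=32
compute-2 np=32
compute-3 np=32
EOF
```

For the three sample entries this prints nodes=3 cores=96. On the server, redirect the real nodes file into the function instead of the sample.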
Changing maximum CPU cores, memory, nodes, and wall-time
First, make sure the queue is enabled by setting enabled to true.
qmgr -c "set queue short enabled=true"
Set the maximum number of physical CPU cores for a job.
qmgr -c "set queue short resources_max.ncpus=32"
Set the maximum memory for a job. Give the value a unit suffix such as gb, because a bare number is read as bytes.
qmgr -c "set queue short resources_max.mem=256gb"
Set the maximum number of nodes for a job.
qmgr -c "set queue short resources_max.nodect=16"
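Set the maximum wall-time for a job. Use resources_max.walltime for the upper limit; resources_default.walltime only sets the value applied when a job requests none.
qmgr -c "set queue short resources_max.walltime=3600"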
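Wall-time appears in this article both as seconds (3600) and as HH:MM:SS (24:00:00). The sketch below converts seconds to HH:MM:SS and prints the matching qmgr command. It assumes resources_max.walltime, the attribute seen in the list output earlier, is the one that caps a job's wall-time; walltime_limit_cmd is a made-up name, and nothing is executed.

```shell
# Hypothetical helper: print the qmgr command that caps wall-time for a queue.
# Usage: walltime_limit_cmd <queue-name> <seconds>
walltime_limit_cmd() {
    secs="$2"
    # Split the seconds into hours, minutes, and seconds.
    hms="$(printf '%02d:%02d:%02d' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60)))"
    printf 'qmgr -c "set queue %s resources_max.walltime=%s"\n' "$1" "$hms"
}

# Dry run: one hour for the queue short.
walltime_limit_cmd short 3600
```

The call above prints qmgr -c "set queue short resources_max.walltime=01:00:00", which you can review and then run as root.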
Rangsiman Ketkaew