Configuring Server Policy for TORQUE and PBS Pro

A workload manager allows the administrator (root) or an approved user (sudo) to set up server policy and manage queues. On a Linux cluster whose jobs are scheduled by a manager such as TORQUE or PBS Pro, you can create, modify, or delete queue policies, for example to set a wall-time limit or a maximum number of CPU cores.

Let's do it!

To do this, the qmgr command is all you need! qmgr is the queue manager utility used to configure the queuing system. Sadly, there are no man or info pages for this command, but qmgr has a built-in manual. Open its help page by typing help at the qmgr interactive prompt.

The following session shows how to open qmgr's built-in manual.

[rangsiman@nitrogen ~]$ qmgr
Max open servers: 4
Qmgr: hi
qmgr: Illegal operation: hi
Try 'help' if you are having trouble.
Qmgr:
Qmgr: help
General syntax: command [object][@server] [name attribute[.resource] OP value]
To get help on any topic or subtopic, type help <topic>
Help is available on all commands and topics.
Available commands: active, create, delete, set, unset, list, print, quit
Other topics are attributes, operators, names, and values .

Qmgr:
Qmgr: help create
Syntax: create object name[,name...]
Objects can be "queue" or "node"
The create command will create the specified object on the PBS server(s).
For multiple names, use a comma separated list with no intervening whitespace.
Examples:
create queue q1,q2,q3

Qmgr:
Qmgr:

If you make a typo or enter a wrong command in qmgr, it will suggest typing help to get a list of the available commands.


Create a queue

The following series of qmgr commands creates and configures a queue named default step by step.

qmgr -c "create queue default queue_type=execution"
qmgr -c "set queue default started=true"
qmgr -c "set queue default enabled=true"
qmgr -c "set queue default resources_default.nodes=1"
qmgr -c "set queue default resources_default.walltime=3600"

This queue is now ready to accept new jobs. By default, if a job requests no resources, the queue automatically assigns it 1 node and a wall-time of 3600 seconds.
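A job can override these defaults by requesting resources explicitly in its submit script. Below is a minimal sketch of such a script for the default queue; the node count, walltime, and program name (my_program) are illustrative values, not taken from this cluster.

```shell
# Write a minimal PBS submit script that overrides the queue defaults
# (2 nodes, 8 cores each, 2 hours instead of 1 node / 3600 seconds).
cat > job.sh <<'EOF'
#!/bin/bash
#PBS -q default
#PBS -l nodes=2:ppn=8
#PBS -l walltime=02:00:00
cd "$PBS_O_WORKDIR"
./my_program
EOF
cat job.sh
```

The script would then be submitted with qsub job.sh, and the scheduler uses the requested resources instead of the queue's resources_default values.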


Summary of all queues

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short              --      --    24:00:00   --    6   2 --   E R
long               --      --    336:00:0   --   44   5 --   E R
slong              --      --    720:00:0   --    0   0 --   E R
medium             --      --    72:00:00   --    7   6 --   E R
                                               ----- -----
                                                  57    13

The output above comes from the "qstat -q" command. (You might already be familiar with qstat.) It shows that 4 queues are available (State: E R, i.e. enabled and running), with 57 jobs running and 13 jobs waiting in the queues.
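The Run and Que totals at the bottom of the table can be recomputed from the per-queue columns. As a sanity check, here is a small awk one-liner run over a copy of the sample output above:

```shell
# Sum the Run (column 6) and Que (column 7) fields of `qstat -q` output,
# skipping the two header lines.
qstat_q_output='Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short              --      --    24:00:00   --    6   2 --   E R
long               --      --    336:00:0   --   44   5 --   E R
slong              --      --    720:00:0   --    0   0 --   E R
medium             --      --    72:00:00   --    7   6 --   E R'
echo "$qstat_q_output" | awk 'NR>2 {run+=$6; que+=$7} END {print run, que}'
# prints: 57 13
```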


See queue information

You can also configure a queue with a single non-interactive command:

qmgr -c ".....your action ....."

The most commonly used commands are

active
list
create 
set
delete

For example, I use the list command to show the information for queue short.

qmgr -c "list queue short"

A portion of the output is shown below.

Queue short
        queue_type = Execution
        Priority = 50
        max_user_queuable = 24
        total_jobs = 8
        state_count = Transit:0 Queued:2 Held:0 Waiting:0 Running:6 Exiting:0
        resources_max.ncpus = 256
        resources_max.nodes = 8
        resources_max.walltime = 24:00:00
        resources_default.walltime = 24:00:00
        mtime = Thu Jun 25 10:22:28 2015
        resources_assigned.ncpus = 6
        resources_assigned.nodect = 6
        max_user_run = 24
        enabled = True
        started = True

It shows that there are 8 jobs in total, the maximum number of CPUs you can request in a submit script is 256, the maximum number of nodes for this queue is 8, the wall-time limit is 24 hours per job, and each user can run at most 24 jobs at the same time.
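Walltime limits such as 24:00:00 are in HH:MM:SS format, while the qmgr examples earlier in this post use plain seconds; both forms are accepted. A tiny helper for converting between them (the function name walltime_to_seconds is my own, not part of TORQUE):

```shell
# Convert an HH:MM:SS walltime string into seconds using awk.
walltime_to_seconds() {
    echo "$1" | awk -F: '{print $1*3600 + $2*60 + $3}'
}

walltime_to_seconds 24:00:00   # prints 86400
```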


Adding/removing compute nodes in the queue manager

A file called nodes in TORQUE_HOME/server_priv/ lists the names of the compute nodes connected to the server, along with the number of available physical CPU cores on each. E.g.

compute-1   np=32
compute-2   np=32
compute-3   np=32
...
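Adding or removing a node is then just a matter of editing this file (TORQUE also accepts "create node" / "delete node" in qmgr). The sketch below edits a local copy named nodes so it can be tried safely anywhere; on a real server you would edit TORQUE_HOME/server_priv/nodes and restart pbs_server for the change to take effect.

```shell
# Work on a local copy of the nodes file; entries follow "hostname np=cores".
nodes_file=nodes
printf 'compute-1   np=32\ncompute-2   np=32\n' > "$nodes_file"

# Add a new compute node with 32 cores.
echo 'compute-3   np=32' >> "$nodes_file"

# Remove compute-2 from the cluster.
sed -i '/^compute-2 /d' "$nodes_file"

cat "$nodes_file"
```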


Changing maximum CPU cores, memory, nodes, and wall-time

First, if the queue is disabled, enable it by setting enabled to true.

qmgr -c "set queue short enabled=true"

Set the maximum number of physical CPU cores for a job.

qmgr -c "set queue short resources_max.ncpus=32" 

Set the maximum memory for a job.

qmgr -c "set queue short resources_max.mem=256gb"

Set the maximum total number of nodes.

qmgr -c "set queue short resources_max.nodect=16" 

Set maximum wall-time.

qmgr -c "set queue short resources_max.walltime=3600"
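When several limits change at once, it can be convenient to wrap them in a small shell function. The sketch below only prints the qmgr commands (a dry run, so it works without a PBS server); drop the echo in front of qmgr to actually apply them. The function name set_queue_limits is my own.

```shell
# Print (dry run) the qmgr commands that set CPU, node, and walltime limits
# for a queue. Remove `echo` to execute against a live pbs_server.
set_queue_limits() {
    queue=$1 ncpus=$2 nodect=$3 walltime=$4
    echo qmgr -c "set queue $queue resources_max.ncpus=$ncpus"
    echo qmgr -c "set queue $queue resources_max.nodect=$nodect"
    echo qmgr -c "set queue $queue resources_max.walltime=$walltime"
}

set_queue_limits short 32 16 24:00:00
```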


Rangsiman Ketkaew