Grid Engine for Job Scheduler (Open GE and SGE)

Commonly deployed implementations of the grid engine job scheduler:

  • Sun Grid Engine (SGE)
  • Open Grid Engine


Options for controlling the job

Grid engine (GE) uses the #$ prefix to mark lines in the job script that pass additional options to the submission command. The most commonly used options are listed below; a minimal script header using them follows the list.

  • -S <shell-platform>
    • For example /bin/sh
  • -pe <parallel environment> [<number of cores>]
    • Required for running parallel jobs. Specifies the parallel environment and the number of cores to allocate.
  • -cwd
    • Uses the directory where the job was submitted as the working directory. If -cwd is not set, the home directory is used.
  • -C <prefix-string>
    • Defines the prefix string (default #$) that marks directive lines in the job script.
  • -A <login-name>
    • Defines the user account of the job owner. If not defined, it falls back to the user who submitted the job.
  • -j y
    • Merges the standard output and any error messages into one file, typically named <job-name>.o<job-id>.
  • -m aes
    • Types of email notification. Sun Grid Engine will notify the job owner by email if the job is aborted (a), ends (e), or is suspended (s).
  • -M <email-address>
    • The email address to which the notification is sent.
  • -p 0
    • The priority level of the submitted job. Jobs with a higher priority are dispatched to a node first by the grid engine.
  • -r
    • Forces the grid engine to restart the job if the system crashes or is rebooted.
  • -N <job-name>
    • Defines a short name for the job to identify it besides the job ID. If omitted, the job name defaults to the name of the shell script.
  • -o <outputfile>
    • Names the output file. If omitted, the output filename defaults to <job-name>.o<job-id>.
  • -e <errorfile>
    • Names the error file. If omitted, the error filename defaults to <job-name>.e<job-id>.
  • -v <environment>
    • Normally, environment variables defined in your .bash_profile or related files are not exported to the node where the job runs. With this option the grid engine sets the given environment variable before starting the job.
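
As a quick illustration, a minimal job script header using these options could look like the sketch below. The core count, parallel environment name, job name, and email address are placeholders, not values taken from a real system.

#!/bin/bash
#$ -S /bin/sh                # interpret the script with /bin/sh
#$ -cwd                      # run in the submission directory
#$ -j y                      # merge standard output and error messages
#$ -pe smp 4                 # request 4 cores in the smp parallel environment
#$ -N example-job            # short job name (placeholder)
#$ -M user@example.com       # notification address (placeholder)
#$ -m aes                    # mail on abort, end, suspend

echo "Running on $(hostname) with $NSLOTS slots"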

Check the parallel environments (PE) of your SGE system.

The Sun Grid Engine queue configuration command, qconf, allows the administrator to add, delete, and modify the grid engine configuration.

Show all parallel environments:

$ qconf -spl
impi16ppn
impi28ppn
mpi
mpi-rr
mpi10ppn
mpi12ppn
mpi14ppn
mpi16ppn
mpi18ppn
mpi20ppn
mpi22ppn
mpi24ppn
mpi26ppn
mpi28ppn
mpi2ppn
mpi4ppn
mpi8ppn
mpich
mpifill
mpirr
mvapich
orte
smp

Show the details of a parallel environment, for example smp:

$ qconf -sp smp
pe_name            smp
slots              496
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
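
Once you know which parallel environment to use, request it at submission time with the -pe option described above. A hypothetical example, assuming a job script named my-job.sh that should run on 8 cores in the smp environment:

$ qsub -pe smp 8 my-job.sh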

Example of an SGE script for running an ORCA calculation

#!/bin/bash

#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -m be
#$ -M rangsiman1993@gmail.com
#$ -pe mpifill 28
#$ -l h_vmem=2G
#$ -l hostname=compute-0-0
#$ -V
#$ -o orca-log
#$ -e orca-error

export job="water-dlpno-ccsd-t"

module purge
module load gcc-7.2.0

# Setting OpenMPI
export PATH="/home/rangsiman/.openmpi/bin/":$PATH
export LD_LIBRARY_PATH="/home/rangsiman/.openmpi/lib/":$LD_LIBRARY_PATH
export OMP_NUM_THREADS=1

# Setting ORCA directory path
orcadir="/home/rangsiman/orca_4_1_0_linux_x86-64_openmpi313"
export PATH="$orcadir":$PATH
#ORCA=`which orca`

# Setting communication protocol
export RSH_COMMAND="/usr/bin/ssh -x"

# Create a local scratch folder for the user on the compute node
# if /lustre/$USER/scratch does not already exist.
if [ ! -d /lustre/$USER/scratch ]
then
  mkdir -p /lustre/$USER/scratch
fi
tdir=$(mktemp -d /lustre/$USER/scratch/orcajob__$JOB_ID-XXXX)

# Copy only the necessary stuff in submit directory to scratch directory.
# Add more here if needed.
cp $SGE_O_WORKDIR/${job}.inp $tdir/
cp $SGE_O_WORKDIR/*.gbw $tdir/
cp $SGE_O_WORKDIR/*.xyz $tdir/

# Creating nodefile in scratch
cat $PE_HOSTFILE > $tdir/${job}.nodes

# cd to scratch
cd $tdir

# Copy job and node info to beginning of outputfile
echo "Job execution start: $(date)" >> $SGE_O_WORKDIR/${job}.out
echo "Shared library path: $LD_LIBRARY_PATH" >> $SGE_O_WORKDIR/${job}.out
echo "SGE Job ID is      : ${JOB_ID}" >> $SGE_O_WORKDIR/${job}.out
echo "SGE Job name is    : ${JOB_NAME}" >> $SGE_O_WORKDIR/${job}.out
echo "" >> $SGE_O_WORKDIR/${job}.out

cat $PE_HOSTFILE >> $SGE_O_WORKDIR/${job}.out

# Start ORCA job. ORCA is started using full pathname (necessary for parallel execution).
# Output file is written directly to submit directory on frontnode.
$orcadir/orca $tdir/${job}.inp >> $SGE_O_WORKDIR/${job}.out

# ORCA has finished here. Now copy important stuff back (xyz files, GBW files etc.).
# Add more here if needed.
cp $tdir/*.gbw $SGE_O_WORKDIR
cp $tdir/*.xyz $SGE_O_WORKDIR
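
Note that the scratch directory created above is never removed by the script. A minimal cleanup sketch, assuming everything you need has already been copied back to the submit directory, could be appended at the end:

# Optional: remove the scratch directory once the results have been copied back.
cd $SGE_O_WORKDIR
rm -rf $tdir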

How to submit your job.

Simply use the qsub command:

$ qsub your-sge-script.sh


Rangsiman Ketkaew