Grid Engine for Job Scheduler (Open GE and SGE)

Commonly deployed implementations of the grid engine job scheduler:

  • Sun Grid Engine (SGE)
  • Open Grid Engine


Options for controlling the job

Grid engine (GE) uses the #$ prefix to mark lines in the job script that pass additional options to the submission command. The most commonly used options are listed below; a minimal script header using them follows the list.

  • -S <shell-platform>
    • For example /bin/sh
  • -pe <parallel environment> [<number of cores>]
    • Required for running parallel jobs. Specifies the parallel environment and the number of cores to allocate.
  • -cwd
    • Uses the directory where the job was submitted as the working directory. If -cwd is not set, the home directory is used.
  • -C <prefix-string>
    • Defines the prefix string (default #$) that marks directive lines in the job script.
  • -A <login-name>
    • Defines the user account of the job owner. If not defined, it falls back to the user who submitted the job.
  • -j y
    • Merges the standard output and any error messages into one file, typically named <job-name>.o<job-id>.
  • -m aes
    • Types of email notification. Sun Grid Engine will notify the job owner by email if the job is aborted (a), ends (e), or is suspended (s).
  • -M <email-address>
    • The email address to which the notification is sent.
  • -p 0
    • The priority level of the submitted job. Jobs with a higher priority are dispatched to a node first by the grid engine.
  • -r
    • Forces the grid engine to restart the job if the system crashes or is rebooted.
  • -N <job-name>
    • Defines a short name for the job to identify it besides the job ID. If omitted, the job name defaults to the name of the shell script.
  • -o <outputfile>
    • Names the output file. If omitted, the output filename defaults to <job-name>.o<job-id>.
  • -e <errorfile>
    • Names the error file. If omitted, the error filename defaults to <job-name>.e<job-id>.
  • -v <environment>
    • Normally, environment variables defined in your .bash_profile or related files are not exported to the node where the job runs. With this option the grid engine sets the given environment variable before starting the job.
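
As a quick illustration, a minimal job script header using these options could look like the sketch below. The core count, parallel environment name, job name, and email address are placeholders, not values taken from a real system.

#!/bin/bash
#$ -S /bin/sh                # interpret the script with /bin/sh
#$ -cwd                      # run in the submission directory
#$ -j y                      # merge standard output and error messages
#$ -pe smp 4                 # request 4 cores in the smp parallel environment
#$ -N example-job            # short job name (placeholder)
#$ -M user@example.com       # notification address (placeholder)
#$ -m aes                    # mail on abort, end, suspend

echo "Running on $(hostname) with $NSLOTS slots"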

Check the parallel environments (PE) of your SGE system.

The Sun Grid Engine queue configuration command, qconf, allows the administrator to add, delete, and modify the grid engine configuration.

Show all parallel environments:

$ qconf -spl
impi16ppn
impi28ppn
mpi
mpi-rr
mpi10ppn
mpi12ppn
mpi14ppn
mpi16ppn
mpi18ppn
mpi20ppn
mpi22ppn
mpi24ppn
mpi26ppn
mpi28ppn
mpi2ppn
mpi4ppn
mpi8ppn
mpich
mpifill
mpirr
mvapich
orte
smp

Show the details of a parallel environment, for example smp:

$ qconf -sp smp
pe_name            smp
slots              496
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
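
Once you know which parallel environment to use, request it at submission time with the -pe option described above. A hypothetical example, assuming a job script named my-job.sh that should run on 8 cores in the smp environment:

$ qsub -pe smp 8 my-job.sh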

Example of an SGE script for running an ORCA calculation

#!/bin/bash

#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -m be
#$ -M rangsiman1993@gmail.com
#$ -pe mpifill 28
#$ -l h_vmem=2G
#$ -l hostname=compute-0-0
#$ -V
#$ -o orca-log
#$ -e orca-error

export job="water-dlpno-ccsd-t"

module purge
module load gcc-7.2.0

# Setting OpenMPI
export PATH="/home/rangsiman/.openmpi/bin/":$PATH
export LD_LIBRARY_PATH="/home/rangsiman/.openmpi/lib/":$LD_LIBRARY_PATH
export OMP_NUM_THREADS=1

# Setting ORCA directory path
orcadir="/home/rangsiman/orca_4_1_0_linux_x86-64_openmpi313"
export PATH="$orcadir":$PATH
#ORCA=`which orca`

# Setting communication protocol
export RSH_COMMAND="/usr/bin/ssh -x"

# Create a local scratch folder for the user on the compute node
# if /lustre/$USER/scratch does not already exist.
if [ ! -d /lustre/$USER/scratch ]
then
  mkdir -p /lustre/$USER/scratch
fi
tdir=$(mktemp -d /lustre/$USER/scratch/orcajob__$JOB_ID-XXXX)

# Copy only the necessary stuff in submit directory to scratch directory.
# Add more here if needed.
cp $SGE_O_WORKDIR/${job}.inp $tdir/
cp $SGE_O_WORKDIR/*.gbw $tdir/
cp $SGE_O_WORKDIR/*.xyz $tdir/

# Creating nodefile in scratch
cat $PE_HOSTFILE > $tdir/${job}.nodes

# cd to scratch
cd $tdir

# Copy job and node info to beginning of outputfile
echo "Job execution start: $(date)" >> $SGE_O_WORKDIR/${job}.out
echo "Shared library path: $LD_LIBRARY_PATH" >> $SGE_O_WORKDIR/${job}.out
echo "SGE Job ID is      : ${JOB_ID}" >> $SGE_O_WORKDIR/${job}.out
echo "SGE Job name is    : ${JOB_NAME}" >> $SGE_O_WORKDIR/${job}.out
echo "" >> $SGE_O_WORKDIR/${job}.out

cat $PE_HOSTFILE >> $SGE_O_WORKDIR/${job}.out

# Start ORCA job. ORCA is started using full pathname (necessary for parallel execution).
# Output file is written directly to submit directory on frontnode.
$orcadir/orca $tdir/${job}.inp >> $SGE_O_WORKDIR/${job}.out

# ORCA has finished here. Now copy important stuff back (xyz files, GBW files etc.).
# Add more here if needed.
cp $tdir/*.gbw $SGE_O_WORKDIR
cp $tdir/*.xyz $SGE_O_WORKDIR
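
Note that the scratch directory created above is never removed by the script. A minimal cleanup sketch, assuming everything you need has already been copied back to the submit directory, could be appended at the end:

# Optional: remove the scratch directory once the results have been copied back.
cd $SGE_O_WORKDIR
rm -rf $tdir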

How to submit your job.

Simply use the qsub command:

$ qsub your-sge-script.sh


Rangsiman Ketkaew