Page authors

  • Jonathan Glines
    June 13, 2012
Software

TORQUE

TORQUE is the software used for scheduling compute jobs on our clusters.

The following is an example TORQUE script that runs an MPI program on 5 nodes with 16 processes per node, for 80 processes in total. Submit the script with the qsub command; see the qsub man page for the options that can appear in a TORQUE script. A program must use MPI in order to run in parallel across multiple nodes of the cluster.

#!/bin/sh
#
#This is an example script for using NAMD with torque
#
#These commands set up the Grid Environment for your job:
#PBS -N TestNamdJob
#PBS -l nodes=5:ppn=16
#PBS -q default
#PBS -M doejane@isu.edu
#PBS -m abe

PROGRAM="/home/doejane/myProgram"
JOB_DIR="/home/doejane/myJobDir"

MPIEXEC="/opt/open-mpi/tcp-gnu41/bin/mpiexec"

# Stop early if the job directory is missing instead of running in the wrong place
cd "$JOB_DIR" || exit 1
JOB="$MPIEXEC -n 80 $PROGRAM"

echo "starting job in 10 seconds: $JOB"

sleep 10

$JOB
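
The script above hard-codes 80 processes to match nodes=5:ppn=16. As a rough sketch, the count could instead be derived inside the job from the node file that TORQUE exposes through the PBS_NODEFILE variable, which normally lists one line per allocated process slot. The helper name count_procs is hypothetical and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: count the non-empty lines of a node file.
# Assumption: each line corresponds to one allocated process slot.
count_procs() {
    grep -c . "$1"
}

# PBS_NODEFILE is only set inside a running TORQUE job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE:-}" ]; then
    NP=$(count_procs "$PBS_NODEFILE")
    echo "TORQUE allocated $NP process slots"
fi
```

With this in place, the value of NP could be passed to mpiexec in place of the literal 80, so the script keeps working if the nodes or ppn request changes.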

MCNPX

The MCNPX software can be used to perform Monte Carlo physics simulations on our clusters. Because MCNPX is export controlled, you must provide proof of access and identification to the CoSE IT staff before you can use it. Contact us at hpcchelp@isu.edu for more information.

Running an MCNPX Job

If you edited your MCNPX files on a Windows computer, they might have DOS line endings. MCNPX on the cluster needs UNIX line endings. Your MCNPX files can be converted to UNIX line endings with the following command:

$ dos2unix file01 file02 file03
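
If you are unsure whether a file needs converting, a check along the following lines may help. This is only a sketch: the function name has_dos_endings is hypothetical, and it simply looks for a carriage return at the end of any line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any line of the file ends with a
# carriage return, which indicates DOS (CRLF) line endings.
has_dos_endings() {
    cr=$(printf '\r')
    grep -q "$cr\$" "$1"
}

# Report every file given on the command line that still needs dos2unix.
for f in "$@"; do
    if has_dos_endings "$f"; then
        echo "$f: DOS line endings found"
    fi
done
```

Saved as a script, it would be run with the input files as arguments, for example file01 file02 file03.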

Move or remove any existing data files from old jobs (the files whose names end with o, r, and m), because these will break MCNPX. Then run the following command to queue your jobs:

$ /opt/mcnpx/qmcnpx file01 file02 file03
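
Leftover data files can be spotted before queueing with a small check like the one below. It is a sketch that assumes the old files are named after the input file with o, r, or m appended (for example file01o); adjust it if your files are named differently. The function name find_stale_outputs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print any existing <input>o, <input>r, <input>m files.
# Assumption: old data files are the input name plus a one-letter suffix.
find_stale_outputs() {
    for input in "$@"; do
        for suffix in o r m; do
            if [ -e "$input$suffix" ]; then
                echo "$input$suffix"
            fi
        done
    done
}

stale=$(find_stale_outputs "$@")
if [ -n "$stale" ]; then
    echo "Move or remove these files before queueing:"
    echo "$stale"
fi
```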

Run the qstat command to check your running jobs. To kill jobs that have hung or were started by mistake, run the qdel command followed by the ID of the job you want to kill.

MCNPX has a habit of getting stuck in a loop when a job breaks or wasn't started properly. If your job isn't updating its output files and appears to be frozen, please kill it so that it doesn't waste cluster resources.
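
One way to notice a frozen job is to check when its output file last changed. The sketch below is only an illustration: the helper name is_stale and the 30 minute threshold are assumptions, and it relies on the -mmin option of find, which GNU and BSD find provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if FILE was last modified more than
# MINUTES minutes ago. Usage: is_stale FILE MINUTES
is_stale() {
    [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

# Check every output file given on the command line.
for out in "$@"; do
    if is_stale "$out" 30; then
        echo "$out has not changed in over 30 minutes; check the job with qstat and consider killing it with qdel"
    fi
done
```

A long-running step can legitimately go a while without writing output, so treat the message as a prompt to investigate rather than proof that the job has hung.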