May 2011 Upgrade

Update in Jan 2014: This is an old document. Currently, the login node is hpclogin.case.edu.

Information Technology Services (ITS) significantly updated its High Performance Computing (HPC) cluster infrastructure to provide more online storage, faster computing, and increased energy efficiency. The HPC cluster is a collection of servers functioning as a single computing system; it gives users the ability to run parallel computational tasks requiring hundreds of simultaneous processors in support of their research projects. High-speed parallel disk storage was increased from 20 Terabytes to 70 Terabytes, and the upgrade more than doubled overall computational performance, to 12 trillion floating-point operations per second, while requiring only a modest 12% increase in energy use for power and cooling.

Researchers in over 31 departments use the HPC cluster, with the largest use coming from Biochemistry, Physiology and Biophysics, Biomedical Engineering, Physics, and Chemistry. Dr. Calvetti and Dr. Turc in the Department of Mathematics require students in Applied Math classes to use the HPC cluster to learn how to do scientific computing. Dr. Mihos in the Department of Astronomy uses the HPC resource to simulate the evolution of galaxies and galaxy clusters. Dr. Shoham’s research in the Department of Biochemistry uses the HPC cluster for drug discovery and drug screening to combat infectious diseases, heart disease, and cancer. Dr. Sichun Yang recently chose to join the Center for Proteomics and Bioinformatics in part because he was able to use the HPC resource on his first day at CWRU and avoid months of non-productive time setting up his own computational facility for his research.

Find out more about the HPC resource by visiting the links on our website.



Upgraded Cluster Architecture

What are the differences you should be aware of?

Login to HPCC
    Old Cluster:  ssh -X caseID@hpcc-login.case.edu
    New Cluster:  ssh -X caseID@arc-login.case.edu

Auto-Completion
    Old Cluster:  module load <module-name>
    New Cluster:  module load <...type a letter and see the list>
                  (eliminates the need for the command: module avail)

Temporary Directory
    Old Cluster:  $TMPDIR=~/<jobid>.master2.priv.cwru.edu.OU
                  (files are deleted as soon as the outputs are available in your working directory)
    New Cluster:  $TMPDIR=/tmp/pbstmp.<jobid>.mast.priv.cwru.edu
                  (same as the Old Cluster)
                  $PFSDIR=/scratch/pbsjobs/pbstmp.<jobid>.mast.priv.cwru.edu
                  (files remain in $PFSDIR for 14 days)

Huge Scratch Space
    Old Cluster:  not available
    New Cluster:  provides a huge scratch space ($PFSDIR) that can accommodate even large output files without compromising speed

View Temporary Files While a Job Is Active
    Old Cluster:  vi ~/<jobid>.master2.priv.cwru.edu.OU
    New Cluster:  qpeek <jobid>

PBS Script

Old Cluster:

    # copy to local hard drive
    pbsdcp -s apoa1.namd apoa1.pdb apoa1.psf *.xplor $TMPDIR
    cd $TMPDIR

    # retrieve results
    pbsdcp -g '*' $PBS_O_WORKDIR
    cd $PBS_O_WORKDIR

New Cluster (Option 1):

    # copy to local hard drive
    pbsdcp -s inputfile1 inputfile2 accessoryfile* $TMPDIR
    cd $TMPDIR

    # retrieve results
    pbsdcp -g '*' $PBS_O_WORKDIR
    cd $PBS_O_WORKDIR

New Cluster (Option 2):

    # copy to the parallel file system’s scratch directory
    cp inputfile1 inputfile2 accessoryfile* $PFSDIR
    cd $PFSDIR

    # retrieve results
    cp * $PBS_O_WORKDIR
    cd $PBS_O_WORKDIR
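
Putting Option 2 together with standard PBS directives, a minimal sketch of a complete job script might look like the following. The job name, resource requests, module name, and program command are placeholders for illustration, not values prescribed by this document.

#!/bin/bash
#PBS -N myjob                 # job name (placeholder)
#PBS -l nodes=1:ppn=4         # example resource request; adjust to your job
#PBS -l walltime=01:00:00     # example walltime; adjust to your job
#PBS -j oe                    # merge standard output and standard error

# load the software you need (module name is a placeholder)
module load mymodule

# copy input files from the submission directory to the parallel scratch space
cd $PBS_O_WORKDIR
cp inputfile1 inputfile2 accessoryfile* $PFSDIR
cd $PFSDIR

# run your program (placeholder command)
./myprogram inputfile1 > outputfile

# retrieve results
cp * $PBS_O_WORKDIR
cd $PBS_O_WORKDIR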





Temporary output files in detail:

In the old cluster, users could access the temporary output files (*.OU and *.ER) in their own home directories. With this upgrade, we are moving away from that setup because it required continuous writing of the temporary output files back and forth between the compute nodes and the home directory. While these files can be tiny, they can also be large, so they are now stored locally on the compute nodes.

In the new cluster, users can instead run qpeek <jobid> to view these local temporary output files (*.OU and *.ER).
$ qpeek 2967
comp073                         <- host node
Mon May 16 13:00:05 EDT 2011    <- job start time

Only the person who is running the job can run this command successfully.
If the job has already finished, qpeek will return an error.

If you need the output page by page: qpeek <jobid> | more
If you want to save the output so far, redirect it to a file: qpeek <jobid> > mytempfile
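
For example, a typical monitoring session while a job is running might look like this (the job ID 2967 is taken from the example above; qstat is the standard PBS command for checking job status):

qstat 2967               # confirm the job is still running
qpeek 2967 | more        # page through the output produced so far
qpeek 2967 > mytempfile  # save a snapshot of the output to a file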

Your other files will be located in $PFSDIR or $TMPDIR, depending on where you copied them and where the job ran. If you did not copy your files to a different location, they will be in your working directory, $PBS_O_WORKDIR.

As a refresher, these are the locations of the temporary directories:
$PFSDIR=/scratch/pbsjobs/pbstmp.<jobid>.mast    accessible from any node
$TMPDIR=/tmp/pbstmp.<jobid>.mast                on the first compute node where the job is running
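
Because files in $PFSDIR are kept for 14 days, you can still recover results by hand if a job ended before copying them back (for example, if it ran out of walltime). A rough sketch; the job ID and destination directory are placeholders, and the actual directory name is whatever $PFSDIR was for that job:

cd /scratch/pbsjobs/pbstmp.2967.mast   # that job's $PFSDIR (job ID is a placeholder)
ls -l                                  # see what the job left behind
mkdir -p $HOME/myresults               # destination directory is a placeholder
cp outputfile* $HOME/myresults/        # copy what you need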
