HPC Cluster Good Citizenship

The CWRU HPC cluster is designed to support the active work of dozens, even hundreds, of researchers at a time.


Key operations that promote or inhibit efficient resource usage can occur on the login nodes, on the file systems, and on the internal network.


The approach that we encourage is two-fold.  

Login nodes caution

Running jobs

Please DO NOT run your jobs on the login nodes. The login nodes serve dozens of people at a time, and computational work by one person takes resources away from everyone else. See Job Scheduling for more information.

Compiling on a head node (limit '-j <>')

Compiling packages on the login nodes is sometimes necessary, as these nodes tend to have a more complete development environment. Lengthy builds should be done on compute nodes. For packages that support parallel builds, limit the number of processes to 4, e.g. make --jobs=4
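The limit above can be wrapped in a small helper so that build commands never request more than 4 processes. This is a minimal sketch, not a site-provided tool: the function name build_jobs is hypothetical, and it assumes the GNU coreutils nproc command is available (falling back to 1 if it is not).

```shell
# Hypothetical helper: choose a parallel build width no larger than 4.
# The cap of 4 comes from the guideline above; build_jobs is an invented name.
build_jobs() {
    cap=4
    # nproc (GNU coreutils) reports usable cores; fall back to 1 if missing.
    cores=$(nproc 2>/dev/null || echo 1)
    if [ "$cores" -lt "$cap" ]; then
        echo "$cores"
    else
        echo "$cap"
    fi
}

# Usage, from a source tree that has a Makefile:
#   make --jobs="$(build_jobs)"
```

Computing the width instead of hard-coding it keeps the build within the limit even on a node that exposes fewer than 4 cores.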

Use caution polling the system

Understanding the state of jobs and the available cluster resources is an important aspect of managing your work. Sometimes you will set up scripts to monitor these conditions. When you do, poll sparingly, since every query adds load to the scheduler.
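A monitoring script of this kind might look like the sketch below, which runs a status command a fixed number of times with a pause between checks. The function name poll_jobs and the 60-second interval in the usage line are assumptions; this page does not state a recommended interval. The default command, squeue -u "$USER", lists the caller's own jobs.

```shell
# Hypothetical helper: run a status command a bounded number of times,
# sleeping between checks so the scheduler is not queried in a tight loop.
# Arguments: INTERVAL_SECONDS MAX_CHECKS [COMMAND ...]
poll_jobs() {
    interval=$1
    max_checks=$2
    shift 2
    # Default to listing the caller's own jobs when no command is given.
    if [ "$#" -eq 0 ]; then
        set -- squeue -u "$USER"
    fi
    i=0
    while [ "$i" -lt "$max_checks" ]; do
        "$@"
        i=$((i + 1))
        # Sleep between checks, but not after the last one.
        if [ "$i" -lt "$max_checks" ]; then
            sleep "$interval"
        fi
    done
}

# Usage (assumed values): check once a minute, at most 30 times.
#   poll_jobs 60 30
```

Bounding the number of checks means a forgotten script stops on its own instead of polling indefinitely.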

Minimize Stress on the File Systems

Efficient run-time data access:  /scratch

At the beginning of a job, stage data to the $PFSDIR directory created by the Slurm scheduler. This is particularly important for data stored outside of HPC Storage: the RS and RDS file systems have slower disk access than HPC storage, and network file access adds further delay. Jobs requiring significant file access compound these inefficiencies, so staging data to HPC storage on /scratch is the best way to achieve efficient data handling.
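The staging step can be sketched as a small function called at the top of a batch script. The function name stage_in, the input path, and the program name in the usage comment are hypothetical; only $PFSDIR comes from this page.

```shell
# Hypothetical helper: copy a job's input directory into $PFSDIR.
# Argument: SRC_DIR (directory holding the job's input files)
stage_in() {
    src=$1
    # Fail early if the scheduler did not provide a scratch directory.
    if [ -z "${PFSDIR:-}" ] || [ ! -d "$PFSDIR" ]; then
        echo "PFSDIR is not set or is not a directory" >&2
        return 1
    fi
    # "src/." copies the directory's contents, including dotfiles.
    cp -r "$src"/. "$PFSDIR"/
}

# Usage inside a Slurm batch script (paths and program are placeholders):
#   stage_in /path/to/inputs
#   cd "$PFSDIR"
#   ./my_program
```

Running the program from inside $PFSDIR keeps its reads and writes on scratch storage for the duration of the job. Any results that need to be kept would still have to be copied back to permanent storage before the job ends.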

Avoid many small files per directory

All of the general inefficiencies are made worse when there are many small files in a directory. When you are able to control the file hierarchy, keep the number of files per directory below approximately 10,000.
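One way to check a directory against this guideline is sketched below. The function names count_files and check_dir and the FILE_LIMIT variable are hypothetical. The sketch assumes a find that supports -maxdepth (GNU and BSD find both do) and file names that contain no newlines.

```shell
# Threshold from the guideline above; override by setting FILE_LIMIT.
FILE_LIMIT=${FILE_LIMIT:-10000}

# Count regular files directly inside a directory (not in subdirectories).
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Report the count and return 1 when the directory exceeds the limit.
check_dir() {
    n=$(count_files "$1")
    if [ "$n" -gt "$FILE_LIMIT" ]; then
        echo "$1: $n files (over $FILE_LIMIT)"
        return 1
    fi
    echo "$1: $n files"
}

# Usage (placeholder path):
#   check_dir /path/to/output
```

A directory that exceeds the limit can usually be split into subdirectories, for example by run or by date, so that each one stays under the threshold.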

Minimize Stress on the Internal Network

File transfer guidelines

Job submission tips

Thank you for reviewing these resource 'pain points' and for recognizing that the decisions of even a single person can adversely affect many others and keep them from accomplishing their research goals.