HPC Cluster Good Citizenship
The CWRU HPC cluster is designed to support the active work of dozens, even hundreds, of researchers at a time.
Key operations that promote or inhibit efficient resource usage can occur:
within the compute nodes (CPU/GPU/memory operations);
when data input/output occurs (file system operations); and
in highly shared resources such as the login nodes or the Slurm scheduler node.
The approach we encourage is two-fold:
Understand your own workflow, including how your software uses memory, CPUs, GPUs, and file transfers.
Develop operational practices that work with the HPC architecture.
Login nodes caution
Running jobs
Please DO NOT use the login nodes to run your jobs. The login nodes serve dozens of people at a time, and one person's computational work would consume resources that everyone shares. See Job Scheduling for more information.
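As a minimal sketch of submitting work through Slurm instead (the module name, script name, and resource values are placeholders for illustration):

    #!/bin/bash
    #SBATCH --job-name=my_analysis
    #SBATCH --time=01:00:00
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G

    module load python         # placeholder module
    python my_analysis.py      # placeholder program

Submit it from a login node with: sbatch my_analysis.slurm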
Compiling on a head node (limit '-j <N>')
Compiling packages on the login nodes is sometimes necessary, as these nodes tend to have a more complete development environment; lengthy builds, however, should be done on compute nodes. For packages that support parallel builds, limit the number of processes to 4: e.g. make --jobs=4
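For a lengthy build, one approach is to request a short interactive session on a compute node and cap the build parallelism there (the time value is illustrative):

    srun --cpus-per-task=4 --time=01:00:00 --pty bash
    make --jobs=4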
Use caution polling the system
Understanding the state of your jobs and the available cluster resources is an important aspect of managing your work, and you may set up scripts to monitor these conditions. Poll at modest intervals: commands such as squeue and sinfo query the Slurm scheduler node, and rapid-fire polling from scripts adds load that affects everyone.
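A sketch of polite polling (the job ID is a placeholder); once a minute is usually plenty of resolution:

    # Wait for a job to finish without hammering the scheduler.
    while squeue -j "$JOBID" 2>/dev/null | grep -q "$JOBID"; do
        sleep 60    # modest interval; avoid tight loops
    done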
Minimize Stress on the File Systems
Efficient run-time data access: /scratch
At the beginning of a job, stage data to the $PFSDIR directory created by the Slurm scheduler. This is particularly important for data stored outside of HPC Storage: the RS and RDS file systems have slower disk access than HPC storage, and network file access adds further delay. Jobs requiring significant file access compound these inherent inefficiencies; staging data to HPC storage on /scratch is the best way to achieve efficient data handling.
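A sketch of the staging pattern inside a batch script (the program and directory names are hypothetical):

    # Stage input to fast scratch, compute there, copy results back once.
    cp -r "$HOME"/project/input "$PFSDIR"/
    cd "$PFSDIR"
    ./run_analysis input/ output/     # hypothetical program
    cp -r output "$SLURM_SUBMIT_DIR"/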
Avoid many small files per directory
All of the general inefficiencies are made worse when a directory contains many small files. When you are able to control the file hierarchy, keep the number of files per directory below approximately 10,000.
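A quick way to check how many entries a directory holds (the path is a placeholder):

    find /path/to/dir -maxdepth 1 | wc -l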
Minimize Stress on the internal Network
Limit I/O-intensive sessions: avoid large volumes of reads and writes to disk, and avoid rapidly opening or closing many files.
Avoid opening and closing files repeatedly in tight loops. Every open/close operation requires interaction with the metadata service (MDS) of the parallel file system (Panasas or Qumulo), and overloading the MDS affects other users on the system. If possible, open files once at the beginning of your program or workflow, then close them at the end.
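The same pattern shows up even in shell scripts; a small sketch of the difference:

    # Poor: the >> reopens results.txt on every iteration,
    # costing one metadata operation per line written.
    for f in data/*.txt; do
        wc -l "$f" >> results.txt
    done

    # Better: redirecting the whole loop opens the file once.
    for f in data/*.txt; do
        wc -l "$f"
    done > results.txt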
Don't get greedy. If you know or suspect your workflow is I/O intensive, don't submit a pile of simultaneous jobs. Writing restart/snapshot files can stress the file system, so avoid doing it too frequently. Also, use the HDF5 or NetCDF libraries to generate a single restart file in parallel, rather than writing a separate file from each process.
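One way to keep simultaneous I/O in check is Slurm's job-array throttle, which caps how many array tasks run at once (the script name is a placeholder):

    # 100 tasks total, but never more than 10 running at a time.
    sbatch --array=1-100%10 my_analysis.slurm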
File transfer guidelines
Avoid too many simultaneous file transfers. Consider how many CPU cores are available on the transfer node (e.g. hpctransfer, dtn[1-3]) and be respectful of the multi-user environment.
Avoid recursive file transfers, i.e. transfers that descend through a directory tree. Instead, pack the tree into a single archive using tar, with compression via gzip or bzip2.
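A sketch of the archive-then-transfer approach (the username, host, and paths are illustrative):

    # Bundle the tree into one compressed file, then move it
    # as a single transfer and unpack on the far side.
    tar -czf project.tar.gz project/
    scp project.tar.gz user@hpctransfer:/path/to/dest/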
Job submission tips
Test your submission scripts. Begin with jobs of short duration to verify that each step of the job behaves properly and concludes successfully.
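Command-line options to sbatch override the directives inside the script, which makes short shakedown runs easy (the script name is a placeholder):

    # Run the same script with a 10-minute cap for testing.
    sbatch --time=00:10:00 my_analysis.slurm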
Respect memory limits and other system constraints. Some code, especially code downloaded from GitHub, is developed for the sole use of the machine it runs on: it may read system information to determine the total resources on the computer, and may attempt to use them all. Learn to constrain the job to the resources that you request through Slurm.
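For example, threaded programs often size themselves with nproc, which reports every core on the node; a sketch of sizing to the Slurm allocation instead:

    # Use the allocation, not the whole node. SLURM_CPUS_PER_TASK
    # is set by Slurm when --cpus-per-task is requested.
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}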
Request only the resources you need. This does not mean over-constraining your work; rather, learn what resources your jobs actually require, allow a modest buffer in memory, and limit the number of CPU cores to the point where the performance/efficiency tradeoff is still useful.
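Where the seff utility is available (it ships with many Slurm deployments), comparing requested and consumed resources after a job finishes is one way to calibrate future requests (the job ID is a placeholder):

    # Report CPU and memory efficiency for a completed job.
    seff 1234567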
Thanks for reviewing these resource 'pain points', and for recognizing that the decisions of even a single person can adversely impact many others and keep everyone from accomplishing their research goals.