Coeus HPC Cluster
Getting Started
Gaining Access and Connecting
Request an account
The Coeus cluster is a grant-funded resource provided through the Portland Institute for Computational Science and Portland State University.
First, make sure your compute jobs are either capable of parallelism or require significant HPC resources (as opposed to the general Linux compute servers). To request cluster access, use this form.
Connecting and Logging in
Command Line Interface
If you are not accustomed to using a Linux command line interface (CLI), we recommend familiarizing yourself with introductory material such as the book The Linux Command Line, https://sourceforge.net/projects/linuxcommand/files/TLCL/19.01/TLCL-19.01.pdf/download
or this overview of basic commands, http://www.pcworld.com/article/214370/12_commands_every_linux_newbie_should_learn.html. The ability to navigate and manage files at the Linux command line is important for working effectively.
Secure Shell (SSH) client
To connect to these servers you will need to use Secure Shell (ssh), run through a terminal emulation client application. If you use Linux or OSX, a terminal application is included with the operating system. Windows users will need to download a client such as PuTTY. Secure Shell is a standard, encrypted means of connecting to remote servers. The OIT FAQ on Secure Shell (SSH) explains how to connect on Windows and OSX. These clients give you access to the Linux Command Line Interface (CLI).
> ssh odinID@login1.coeus.rc.pdx.edu
> ssh odinID@login2.coeus.rc.pdx.edu
File Transfer with sFTP or SCP
To move files to OIT-RC Linux systems you will have to use a secure File Transfer protocol such as sFTP, scp, or rsync. There are many free graphical client programs such as WinSCP (compatible with PuTTY), Fugu for OSX, and CyberDuck and FileZilla for OSX and Windows. scp and sFTP can be used from the linux and OSX CLI as well.
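For example, from a Linux or OSX terminal you can use scp or rsync directly (odinID and the file names here are placeholders; substitute your own):

```shell
# copy a local file to your Coeus home directory
scp mydata.tar.gz odinID@login1.coeus.rc.pdx.edu:~/

# copy a results directory back from the cluster, recursively
scp -r odinID@login1.coeus.rc.pdx.edu:~/results ./results

# rsync skips files that are already up to date, which helps with large transfers
rsync -av mydata/ odinID@login1.coeus.rc.pdx.edu:~/mydata/
```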
X Server
If you require a graphical interface (for example, to run MatLab with the graphic interface) you will need an X server. There are excellent free X servers, such as XQuartz for OSX and Xming and MobaXterm for Windows. Linux distributions will have native support, but you may need to install the proper packages and enable and configure the X Window System.
To log in to the Linux systems with X forwarding enabled, add the "-X" option to an ssh command. To test this on Coeus, once logged in, type "xclock" to open a clock in a graphical window.
> ssh odinID@login1.coeus.rc.pdx.edu -X
> xclock
Remote access to login nodes
Direct ssh access to the login nodes is limited to the on-campus PSU IP range (this does not include the guest wireless network).
From off campus, use the campus VPN. This is considered the most secure method of off-campus access. OIT provides FAQs for installing and configuring the campus VPN. This requires using 2-factor authentication.
Your First Login
Automatic Environment Setup
IMPORTANT: When you first log in to the Coeus cluster, your home directory will be generated automatically. You will be guided through creating a proper environment for compiling and running a parallel program; answer "yes" at the prompts. If you answer "no", or if you are never prompted, the setup script needs to be run again. To trigger it, run
> touch ~/.actrun
then log out and back in.
This setup creates an SSH key to connect with cluster nodes for passwordless communications, adds /act/bin to the PATH variable, and adds the module command to the user's environment.
Operating Environment
Coeus Home Directory (homedir - /home/odinid)
Your Coeus home directory is separate from the general research home directory (used for other systems in the PSU research computing infrastructure). Separate home directories are used because different computational systems often require different local system settings. Your Coeus home directory will have the configuration files noted in the previous section, as well as any cluster-specific, custom settings you add. For more information on /home/ and other file systems on Coeus, refer to the section on Filesystems and data storage below.
Login nodes
These are the servers where users interact with the file system, scheduler, and other tools. The Coeus login nodes are named:
login1.coeus.rc.pdx.edu
login2.coeus.rc.pdx.edu
Important! Do not run long computational jobs on the login servers. These are for logging in, accessing your home directory, accessing file systems, writing and editing files, compressing and uncompressing data sets, compiling software, scheduling computational jobs, testing software, etc. Computational jobs should be run on compute nodes, through the SLURM job scheduler. Long computational processes running on login nodes, and any unscheduled jobs, are liable to be terminated without notification.
Modules
This cluster uses Linux environment modules to allow users to quickly update their environment, including execution paths, library paths, and manual paths, for specific software packages. This allows users to enable and disable software as needed. For example, the Coeus cluster has module environments created for each available MPI implementation (openmpi, mpich, mvapich).
Basic module usage
To obtain a complete list of all modules currently available on the system
> module avail
To load a module, e.g. GCC 6.3.0 compilers
> module load gcc-6.3.0
To load a module, e.g. MVAPICH2 2.2 compiled with GCC 6.3.0 (this will automatically load the gcc-6.3.0 module)
> module load mvapich2-2.2/gcc-6.3.0
To obtain a complete list of currently loaded modules
> module list
Currently Loaded Modulefiles:
1) gcc-6.3.0 2) mvapich2-2.2/gcc-6.3.0
To unload a module, e.g. MVAPICH2 2.2 compiled with GCC 6.3.0 (this will automatically unload the gcc-6.3.0 module, too)
> module unload mvapich2-2.2/gcc-6.3.0
NERSC has an excellent Modules usage reference
Software
Modules load the selected software on each of these systems, mounted in the /vol/apps/hpc volume, where there is a broad range of available software. Software in this volume includes:
GCC 6.3.0 with earlier versions available
Python 3.6 and 2.7 with typical libraries such as numpy and scipy
Blast
Matlab, R, SAS
Latest versions of HDF5, NetCDF4, zlib, cmake
File Systems and Data Storage
Coeus Home Directory. /home/odinid
Your home directory is on a shared filesystem that is mounted on all cluster nodes. This should be used to store your batch scripts, system configurations, locally compiled software, libraries, and config/settings files. Home directories are backed up to tape nightly. Be advised that running calculations against data in your home directory will be much slower; use it to store backups of your data, and do the computation on scratch storage.
Scratch storage. /scratch
Data for your computational work should be put in scratch. You can create your own personal and group project folders here. This shared filesystem is mounted on all cluster nodes. This is a large volume intended for temporary storage of data used in computational processes. This volume is not backed up and all files stored here are considered to be temporary.
Scratch is managed with a modified First In, First Out policy: the largest consumers of storage are prioritized for deletion, and the oldest files are removed first. Once this volume reaches a certain threshold, you may be asked to remove directories/files. If usage passes a critical threshold, system administrators reserve the right to remove all files.
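A typical workflow, then, is to stage input data onto scratch, run the computation there, and copy results back to backed-up storage. A sketch, where odinID and the project/file names are placeholders:

```shell
# stage input data onto the fast scratch filesystem
mkdir -p /scratch/odinID/myproject
cp ~/inputs.tar.gz /scratch/odinID/myproject/
cd /scratch/odinID/myproject && tar xzf inputs.tar.gz

# ... run your SLURM jobs against /scratch/odinID/myproject ...

# afterwards, copy results back to storage that is backed up
cp -r /scratch/odinID/myproject/results ~/results
```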
Other Volumes
Applications Volume. /vol/apps/
Common applications are stored in /vol/apps/hpc/stow. This is mounted on all cluster nodes. This is the same applications volume as other OIT-RC systems. It includes commonly used software such as R and Matlab, as well as a variety of other tools for bioinformatics, genetics, and GIS, all of which are loaded using modules. This is a read-only volume.
GCC compiler versions can be found in /vol/apps/gcc/ and Python versions in /vol/apps/python/.
Research shares. /vol/share/sharename
Research storage shares are common to all OIT-RC systems. These are only mounted on this cluster's login nodes, in order to facilitate copying of data to the /scratch volume. /vol/share is a good place to move data that should be backed up, for example resultant data from computational runs. Do not run computational jobs against data stored on /vol/share. This volume is backed up. (PSU access only)
Workspace scratch storage. /vol/workspace
This scratch volume is common to multiple OIT-RC computational systems. It is only mounted on Coeus login nodes, in order to facilitate copying of data to the /scratch volume. Computational work is not allowed on login nodes; do not run computational processes against data stored on /vol/workspace. This volume is not backed up and all files stored here are considered to be temporary.
Running Parallel Programs
SLURM Workload Manager
We use the SLURM Workload Manager for job control and management. There are a number of user commands for the scheduler; for getting started, the most salient are sbatch, squeue, scancel, sinfo, and srun. A sample submit script and the use of some of these commands is included in the section "Compiling A Simple MPI Program" below. For more information, visit the SLURM Quick Start User Guide, a good, more detailed introduction to SLURM.
sbatch - Command to submit a job script to the scheduler for execution. This script typically contains one or more srun commands to launch parallel tasks.
squeue - This reports the state of jobs or job steps. This is useful for checking what is in the current job queue, especially if you are going to submit a larger job using many nodes.
scancel - Allows you to cancel a pending or running job.
sinfo - This reports the state of partitions and nodes managed by Slurm. There are a number of filtering, sorting, and formatting options.
srun - This command is used to submit a job for execution or initiate job steps in real time. Typically this will be included in an sbatch script.
For more on SLURM parallelism, visit here.
For more on the SLURM Scheduler, refer to this page.
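A few common invocations of these commands (the job ID below is a placeholder; these flags are standard SLURM options):

```shell
# show only your own jobs
squeue -u $USER

# show estimated start times for pending jobs
squeue --start -u $USER

# cancel a specific job by ID
scancel 12345

# cancel all of your jobs
scancel -u $USER
```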
Partitions
There are many ways of dividing up and managing a cluster. Partitions are a means of dividing hardware and nodes into useful groupings. These hardware groups can have very different parameters assigned to them. Currently Coeus is divided into three general CPU node partitions, one aggregate CPU partition, an Intel Phi processor partition, a large memory partition (with GPUs), and a GPU partition. Note that these partitions and parameters may change in the future as demand requires.
short - jobs are limited to 4 hours.
medium - this is the default partition. If you don’t specify a partition, your job will run here. Jobs are limited to 7 days.
long - allows long running jobs up to 20 days.
interactive - allows interactive jobs. This can be useful for remote visualization tasks and interactive applications. Jobs can be up to 2 days.
himem - large memory nodes with Tesla V100 GPUs. Jobs can be up to 20 days.
phi - phi processor nodes. Jobs can be up to 20 days.
gpu - nodes with A40 or RTX A5000 GPUs. Jobs can be up to 20 days.
The sinfo command will display an overview of partitions.
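For example, to get an interactive shell on a compute node in the interactive partition, srun can allocate resources and attach a terminal (the resource values here are illustrative; adjust to your needs):

```shell
srun --partition interactive --nodes 1 --ntasks 1 --time 2:00:00 --pty bash
```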
Compiling A Simple MPI Program
This is an example session where a simple MPI "Hello World" program is compiled and run. It assumes the program file is named mpi_hello.c, uses the openmpi MPI library, the submission script is named submit_mpi_hello.sh, the job is submitted to the "short" partition, and the output goes to a file named mpi_hello.txt.
The program file - mpi_hello.c.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, length;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);   // note that argc and argv are passed by address
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);
    printf("Hello MPI: processor %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}
To compile the program mpi_hello (assuming you have created the sample program)
$ module load openmpi-3.0.1/gcc-9.2.0
$ mpicc -o mpi_hello mpi_hello.c
Scheduler submission script - submit_mpi_hello.sh
#!/bin/bash
#SBATCH --job-name mpi_hello
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 2
#SBATCH --partition short
#SBATCH --output mpi_hello.txt
module load openmpi-3.0.1/gcc-9.2.0
mpiexec ./mpi_hello
# run sleep for 20 sec. so we can test the 'squeue' command
srun sleep 20
Submit the program mpi_hello to the SLURM scheduler (assuming you have created the sample program and submit script)
$ sbatch submit_mpi_hello.sh
The “squeue” command should now show a running job.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
348 short mpi_hell will R 0:14 2 compute[127-128]
After this runs, listing the directory contents should show the C source file, the compiled program, the submission script, and the output file.
$ ls
mpi_hello mpi_hello.c mpi_hello.txt submit_mpi_hello.sh
The output file will show the nodes and cores that it ran on.
$ cat mpi_hello.txt
Hello MPI: processor 0 of 4 on compute127.cluster
Hello MPI: processor 1 of 4 on compute127.cluster
Hello MPI: processor 2 of 4 on compute128.cluster
Hello MPI: processor 3 of 4 on compute128.cluster
If a job runs the same calculation many times on different inputs, it is highly recommended to use a job array.
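A minimal job-array sketch, assuming input files named input_1.dat through input_10.dat and a program called my_program (both are placeholders). SLURM runs one copy of the script per array index, with the index available in SLURM_ARRAY_TASK_ID:

```shell
#!/bin/bash
#SBATCH --job-name array_example
#SBATCH --partition short
#SBATCH --array 1-10
#SBATCH --output array_%A_%a.txt

# SLURM_ARRAY_TASK_ID takes a different value (1..10) in each array task
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```

The %A and %a patterns in the output filename expand to the array job ID and the task index, so each task writes its own log file.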
For more examples of SBATCH scripts, please refer to the SLURM Scheduler page.
Coeus Priority Access
In addition to the Free access tier, there is now a Priority access tier that makes it possible for researchers to reserve dedicated compute time for their funded research needs. Details are available in OIT's description of the High Performance Computing (HPC) Clusters service, including a link to the HPC Priority Access request form, where researchers can engage with OIT to assess their HPC requirements in order to include funding for Priority access in future research grant proposals.
High Priority Partitions
After your request has been processed, you will be able to submit jobs to the higher priority partitions. Jobs submitted to these partitions will preempt jobs in the regular tier. More details on the node specifications can be found here. Maximum job runtime on these partitions is 20 days. Send a request to help-rc@pdx.edu if you need to extend the runtime of your job beyond the maximum time limit.
priority_access - 130 compute nodes
priority_access_himem - 2 himem nodes
priority_access_gpu - 10 GPU nodes
Submit a High Priority Job
To submit a job to a high priority partition, pass the corresponding SLURM parameter. For regular compute nodes:
--partition priority_access
for himem nodes:
--partition priority_access_himem
or for gpu jobs:
--partition priority_access_gpu
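For GPU jobs you will typically also need to request the GPU itself with a gres parameter. A hedged sketch of a submission script header (the GPU count, module name, and program name are assumptions; check module avail and the cluster documentation for the exact names on Coeus):

```shell
#!/bin/bash
#SBATCH --job-name gpu_job
#SBATCH --partition priority_access_gpu
#SBATCH --gres gpu:1
#SBATCH --output gpu_job.txt

module load cuda   # module name is an assumption; verify with 'module avail'
srun ./my_gpu_program
```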