The Coeus cluster is a grant-funded resource provided through the Portland Institute for Computational Science and Portland State University.
First, make sure your compute jobs either are capable of parallelism or require significant HPC resources (as opposed to the general Linux compute servers). To request cluster access, use this form.
If you are not accustomed to using a Linux command line interface (CLI), we recommend familiarizing yourself with introductory material such as this book, https://sourceforge.net/projects/linuxcommand/files/TLCL/19.01/TLCL-19.01.pdf/download
or http://www.pcworld.com/article/214370/12_commands_every_linux_newbie_should_learn.html. Being able to navigate and manage files at the Linux command line is essential for working effectively on the cluster.
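For example, the following basic commands (with placeholder directory and file names) cover most day-to-day navigation and file management:
> pwd                     # print the current working directory
> ls -l                   # list files in the current directory with details
> cd my_project           # change into a directory
> mkdir results           # create a new directory
> less output.txt         # page through a text file (press q to quit)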
To connect to these servers you will need to use Secure Shell (ssh), run through a terminal emulation client application. If you use Linux or OSX, terminal applications are included with the operating system. Windows users will need to download a client such as PuTTY. Secure Shell is a standard, encrypted means of connecting to remote servers. The OIT FAQ on Secure Shell (SSH) explains how to connect on Windows and OSX. These clients give you access to the Linux Command Line Interface (CLI).
> ssh odinID@login1.coeus.rc.pdx.edu
> ssh odinID@login2.coeus.rc.pdx.edu
To move files to OIT-RC Linux systems you will have to use a secure file transfer protocol such as sFTP, scp, or rsync. There are many free graphical client programs, such as WinSCP (compatible with PuTTY), Fugu for OSX, and CyberDuck and FileZilla for OSX and Windows. scp and sFTP can also be used from the Linux and OSX CLI.
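For example, from a Linux or OSX terminal, the following commands copy a file and synchronize a directory to your Coeus home directory (the file and directory names are placeholders):
> scp mydata.tar.gz odinID@login1.coeus.rc.pdx.edu:~/
> rsync -avz my_project/ odinID@login1.coeus.rc.pdx.edu:my_project/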
If you require a graphical interface (for example, to run MatLab with the graphic interface) you will need an X server. There are excellent free X servers, such as XQuartz for OSX and Xming and MobaXterm for Windows. Linux distributions will have native support, but you may need to install the proper packages and enable and configure the X Window System.
To log in to the Linux systems with X forwarding enabled for the default Linux X server, add the "-XC" option to the end of an ssh command. To test this on Coeus, once logged in, type "xclock" to open a clock in a graphical window.
> ssh odinID@login1.coeus.rc.pdx.edu -XC
> xclock
Direct ssh access to the login nodes is limited to the on-campus PSU IP range (this does not include the guest wireless network).
Use the campus VPN. This is considered the most secure method of off-campus access. OIT provides FAQs for installing and configuring the campus VPN. This requires 2-factor authentication.
Your Coeus home directory is separate from the general research home directory (used for other systems in the PSU research computing infrastructure). Separate home directories are used because different computational systems often require different local system settings. Your Coeus home directory will have the configuration files noted in the previous section, as well as any cluster-specific, custom settings you add. For more information on /home/ and other file systems on Coeus, refer to the section on Filesystems and data storage below.
These are the servers where users interact with the file system, the scheduler, and other tools. The Coeus login nodes are named:
login1.coeus.rc.pdx.edu
login2.coeus.rc.pdx.edu
Important! Do not run long computational jobs on the login servers. These are for logging in, accessing your home directory, accessing file systems, writing and editing files, compressing and uncompressing data sets, compiling software, scheduling computational jobs, testing software, etc. Computational jobs must be run on the compute nodes through the SLURM job scheduler. Long computational processes running on the login nodes, and any unscheduled jobs, are liable to be terminated without notification.
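For example, compressing a data set on a login node before transferring it, and extracting it again later, might look like this (the directory and archive names are placeholders):
> tar -czf my_dataset.tar.gz my_dataset/       # create a gzip-compressed archive
> tar -xzf my_dataset.tar.gz                   # extract the archive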
This cluster uses Linux environment modules to allow users to quickly update their environment, including execution paths, library paths, and manual paths, for specific software packages. This allows users to enable and disable software as needed. For example, the Coeus cluster has module environments created for each available MPI implementation (openmpi, mpich, mvapich).
To obtain a complete list of all modules currently available on the system
> module avail
To load a module, e.g. GCC 13.2.0 compilers
> module load gcc/13.2.0
To load a module, e.g. OpenMPI 4.1.4 compiled with GCC 13.2.0 (this will automatically load the gcc/13.2.0 module)
> module load openmpi/4.1.4-gcc-13.2.0
To obtain a complete list of currently loaded modules
> module list
Currently Loaded Modulefiles:
1) gcc/13.2.0 2) openmpi/4.1.4-gcc-13.2.0
To unload a module, e.g. OpenMPI 4.1.4 compiled with GCC 13.2.0 (this will automatically unload the gcc/13.2.0 module, too)
> module unload openmpi/4.1.4-gcc-13.2.0
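Two other commonly useful subcommands are module show, which displays what a module will change in your environment, and module purge, which unloads all currently loaded modules:
> module show openmpi/4.1.4-gcc-13.2.0
> module purge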
NERSC has an excellent Modules usage reference
Modules load the selected software, which is mounted on each of these systems in the /software volume, where a broad range of software is available. Some of the software in this volume includes:
GCC 13.2.0 with earlier versions available
Python 3.6 and 2.7, with typical libraries such as numpy and scipy
Blast
Matlab, R, SAS
Latest versions of HDF5, NetCDF4, zlib, cmake
Your home directory is on a shared filesystem that is mounted on all cluster nodes. This should be used to store your batch scripts, system configurations, locally compiled software, libraries, and config/settings files. Home directories are backed up to tape on a nightly basis. Be advised that running calculations against data in your home directory will be much slower; use it to store backups of your data, and do the computation on scratch storage.
Data for your computational work should be put in scratch. You can create your own personal and group project folders here. This shared filesystem is mounted on all cluster nodes. This is a large volume intended for temporary storage of data used in computational processes. This volume is not backed up and all files stored here are considered to be temporary.
Scratch is managed with a modified First In, First Out (FIFO) policy. The largest consumers of storage are prioritized for deletion, and the oldest files are removed first. Once this volume reaches a certain threshold, you may be asked to remove directories/files. If usage passes a critical threshold, system administrators reserve the right to remove all files.
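As a sketch of a typical workflow (the /scratch/odinID path and the directory names below are assumptions; check the actual scratch location on Coeus), stage your input data into a personal scratch directory and run the job against that copy:
> mkdir -p /scratch/odinID/my_project          # personal scratch directory (path is an assumption)
> cp -r ~/my_project/inputs /scratch/odinID/my_project/
After the job finishes, copy the results you want to keep back to a backed-up location:
> cp -r /scratch/odinID/my_project/results ~/my_project/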
Research storage shares are common to all OIT-RC systems. These are mounted only on this cluster's login nodes, in order to facilitate copying of data to the /scratch volumes. /vol/share is a good place to move data that should be backed up, for example resultant data from computational runs. Do not run computational jobs against data stored on /vol/share. This volume is backed up. (PSU access only)
We use the SLURM Workload Manager for job control and management. There are a number of user commands for the scheduler; for getting started, the most salient are sbatch, squeue, scancel, sinfo, and srun, described below, with a few common invocations shown after the list. A sample submit script and the use of some of these commands are included in the section below, "Compiling A Simple MPI Program." For more information, visit the SLURM Quick Start User Guide, which is a good, more detailed introduction to SLURM.
sbatch - Command to submit a job script to the scheduler for execution. This script typically contains one or more srun commands to launch parallel tasks.
squeue - This reports the state of jobs or job steps. This is useful for checking what's in the current job queue, especially if you're going to submit a larger job using many nodes.
scancel - Allows you to cancel a pending or running job.
sinfo - This reports the state of partitions and nodes managed by Slurm. There are a number of filtering, sorting, and formatting options.
srun - This command is used to submit a job for execution or initiate job steps in real time. Typically this will be included in an sbatch script.
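A few common invocations of these commands (the script name and job ID below are only examples):
> sbatch my_job.sh           # submit a batch script
> squeue -u odinID           # show only your own jobs
> scancel 12345              # cancel the job with ID 12345
> sinfo                      # show the state of partitions and nodes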
For more on SLURM parallelism, visit here.
For more on the SLURM Scheduler, refer to this page.
There are many ways of dividing up and managing a cluster. Partitions are a means of dividing hardware and nodes into useful groupings. These hardware groups can have very different parameters assigned to them. Currently Coeus is divided into three general CPU node partitions, one aggregate CPU partition, an Intel Phi processor partition, a large memory partition (with GPUs), and a GPU partition. Note that these partitions and parameters may change in the future as demand requires.
short - jobs are limited to 4 hours.
medium - this is the default partition. If you don’t specify a partition, your job will run here. Jobs are limited to 7 days.
long - allows long running jobs up to 20 days.
interactive - allows interactive jobs. This can be useful for remote visualization tasks and interactive applications. Jobs can be up to 2 days.
himem - large memory nodes with Tesla V100 GPUs. Jobs can be up to 20 days.
phi - phi processor nodes. Jobs can be up to 20 days.
gpu - nodes with A40 or RTX A5000 GPUs. Jobs can be up to 20 days.
The sinfo command will display an overview of partitions.
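To run a job in a specific partition, pass the partition name to the scheduler, either on the sbatch command line or in the batch script itself (the script name below is a placeholder). For example, to use the long partition:
> sbatch --partition long my_job.sh
or, equivalently, inside the batch script:
#SBATCH --partition long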
This is an example session where a simple MPI "Hello World" program is compiled and run. It assumes the program file is named mpi_hello.c, the OpenMPI library is used, the submission script is named submit_mpi_hello.sh, the job is submitted to the "short" partition, and the output goes to a file named mpi_hello.txt.
The program file - mpi_hello.c.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    char name[80];
    int length;

    MPI_Init(&argc, &argv); // note that argc and argv are passed by address

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);
    printf("Hello MPI: processor %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
To compile the program mpi_hello (assuming you have created the sample program)
$ module load openmpi/4.1.4-gcc-13.2.0
$ mpicc -o mpi_hello mpi_hello.c
Scheduler submission script - submit_mpi_hello.sh.
#!/bin/bash
#SBATCH --job-name mpi_hello
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 2
#SBATCH --partition short
#SBATCH --output mpi_hello.txt
module load openmpi/4.1.4-gcc-13.2.0
mpiexec ./mpi_hello
# run sleep for 20 sec. so we can test the 'squeue' command
srun sleep 20
Submit the program mpi_hello to the SLURM scheduler (assuming you have created the sample program and submit script)
$ sbatch submit_mpi_hello.sh
The “squeue” command should now show a running job.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
348 short mpi_hell will R 0:14 2 compute[127-128]
After this runs, listing the directory contents should show the C source file, the compiled program, the submission script, and the output file.
$ ls
mpi_hello mpi_hello.c submit_mpi_hello.sh mpi_hello.txt
The output file will show the nodes and cores that it ran on.
$ cat mpi_hello.txt
Hello MPI: processor 0 of 4 on compute127.cluster
Hello MPI: processor 1 of 4 on compute127.cluster
Hello MPI: processor 2 of 4 on compute128.cluster
Hello MPI: processor 3 of 4 on compute128.cluster
If a job does a lot of the same thing, such as running the same calculation on different inputs, it is highly recommended to use a job array, as sketched below.
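A minimal sketch of a job array script follows. The program name process_input and the input files input_1.txt through input_10.txt are hypothetical placeholders; SLURM sets the SLURM_ARRAY_TASK_ID environment variable to a different value for each array task, and the %A and %a patterns in the output name expand to the array job ID and the task ID.
#!/bin/bash
#SBATCH --job-name array_example
#SBATCH --partition short
#SBATCH --array 1-10
#SBATCH --output array_example_%A_%a.txt

# each array task processes a different input file (placeholder program and file names)
./process_input input_${SLURM_ARRAY_TASK_ID}.txt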
For more examples of SBATCH scripts, please refer to the SLURM Scheduler page.
In addition to the Free access tier, there is now a Priority access tier making it possible for researchers to reserve dedicated computer time for their funded research needs. Details are available in OIT’s description of the High Performance Computing (HPC) Clusters service, including a link to the HPC Priority Access request form where researchers can engage with OIT to assess their HPC requirements in order to include funding for Priority access in future research grant proposals.
After your request has been processed, you will be able to submit jobs to the higher priority partitions. Jobs submitted to these partitions will preempt jobs in the regular tier. More details on the node specifications can be found here. The maximum job runtime on these partitions is 20 days. Send a request to help-rc@pdx.edu if you need to extend the runtime of your job beyond the maximum time limit.
priority_access - 130 compute nodes
priority_access_himem - 2 himem nodes
priority_access_gpu - 10 GPU nodes
To submit a job to a high priority partition, the following SLURM parameter has to be passed. For regular compute nodes:
--partition priority_access
for himem nodes:
--partition priority_access_himem
or for GPU jobs:
--partition priority_access_gpu
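For example, assuming your group has been granted priority access, the earlier submit_mpi_hello.sh script would only need its partition line changed to run in the priority compute partition:
#SBATCH --partition priority_access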