Quickstart

I am a faculty member seeking to utilize the HPC in my research

HPC Resources

Any CWRU faculty can request an HPC account as the Principal Investigator (PI). We offer a free tier of services for evaluation of how well our infrastructure supports your research, with a paid tier of services available to expand your resource allocation in blocks called "membership units".

Requesting a Free Tier Account

Faculty may use the Amara portal (any browser with Single Sign On + Duo authentication) to request a free tier account that provides access to 48 CPUs/1 GPU and a maximum job walltime of 36 hours. Once an account is established, the Amara portal also provides features to manage the list of lab members that may utilize the resources associated with the HPC account. Faculty may also designate one or more senior personnel as managers with permission to add or remove members from their HPC account.

Increased Resources and Membership

If you would like to expand your resources beyond the free tier, you can purchase additional membership units through the iLab interface using a CWRU speedtype. Membership units apply for 1 year, and each unit provides access to 128 CPUs/6 GPUs. Purchasing one or more units also increases the job walltime to 13.5 days and expands the group quota on the home file system.

Accessing the Cluster

After provisioning, faculty access the cluster using the same methods as the lab members. You can review these methods in the researcher section of the quickstart.

Storage Services

We offer services for storage that are integrated with the HPC infrastructure, and for storage that can be used independently and mounted on machines in your lab. You can find more information about the different services in the Storage section.

I am a student using the HPC as part of an academic course

Synopsis

You will connect to the academic cluster Markov to run computations as part of your coursework. Below we summarize different methods for establishing a connection to the cluster, and for running computations. The methods you need to use will depend on the course objectives and the materials that the instructor has prepared.

Connecting to Markov

Before you can run computations, you will need to establish a connection, or "session" with the cluster that verifies your identity.

Web Based (recommended)

This connection is provided by a project called Open OnDemand and will let you launch terminals, applications like Jupyter or RStudio and even full desktop environments. You will connect to https://ondemand-markov.case.edu and authenticate with your network ID and password.

The site does not require the use of VPN and should be reachable from campus wireless and from off campus.
The site is integrated with DUO and you may need to confirm the connection through one of the available DUO methods.

SSH

This connection is text based without additional setup. You can connect directly to a login node using your operating systems ssh client. Linux and Mac OS both include ssh clients. For Windows, we generally recommend the use of the PuTTY ssh client[1], though newer versions of Windows do have a native client available[2].

The server name you will use is markov.case.edu, e.g.

ssh markov.case.edu

This method requires the use of either a hardwired campus connection or the VPN
Graphical applications will not work unless you establish X forwarding. Additional instructions for X forwarding are available here.

Running Computations

Depending on the course you are taking, you may have a significantly customized environment for running your computations, such as custom applications in OnDemand. In this case the course materials will include the directions for how to launch the applications. If your course is launching computations directly on the cluster using the command line tools, the following commands will be useful.

Terminology

Job - A computation you want to run on the cluster

Queue - The list of jobs running and waiting to run on the cluster.

Scheduler - The software the manages jobs and the queue.

Compute node - A computer designated to running computational tasks

Login Node - A node designated for managing user connections and submitting jobs

Where to Run Commands

If you have connected through the recommended OnDemand method above, you will see an option at the top "Clusters", under which you would click the option "_markov shell". This will start a terminal on one of the login nodes, e.g. hpc-login, hpc-login2 or markov01.

If you connected directly through ssh, your terminal will already have a shell running on one of the login nodes, e.g. hpc-login, hpc-login2 or markov01.

DO NOT run computational tasks directly on the login nodes. These nodes are where you will run the job management commands below to submit new jobs to the cluster.

Start a Batch Job

Batch jobs run on the cluster without needing user input. They are the most efficient way to run computations, and part of this is because they are well defined. All steps of the computation are included in a script which is then submitted to the scheduler using the sbatch command.

Example job script named "myscript.slurm" that just sleeps for 30 seconds:

#!/bin/bash

#SBATCH -c 2 # 2 CPUs

#SBATCH -A <PI> -p markov_cpu # PI account and partition

#SBATCH --mem=8G # 8 GB of RAM

#SBATCH --time=1 # Runtime of 1 minutes

sleep 30

For GPU partion, use -p markov_gpu. Check Markov resource view for details.

Submit command:

[stm@class-login ~]$ sbatch -p markov_cpu myscript.slurm

Submitted batch job 19169393

Start an Interactive Job

Interactive jobs are good for debugging. They establish a terminal running on the computational resources so that you can test commands and scripts interactively, much the same as if you were testing on your own local computer.

[stm@class-login ~]$ salloc -A sxg125_csds438 -p markov_cpu --time=30 -c 2 --mem=8G srun --pty /bin/bash

salloc: Granted job allocation 19169394

salloc: Nodes classt01 are ready for job

[stm@classt01 ~]$ exit

exit

salloc: Relinquishing job allocation 19169394

salloc: Job allocation 19169394 has been revoked.

[stm@class-login ~]$

Note how the prompt changed from stm@class-login to stm@classt01. This reflects that your commands are running on the compute node rather than the login node. Typing exit or Ctrl+D will end the interactive job and the prompt will return to the login node.

View My Jobs

You list you jobs in the queue with the squeue command with the --me option:

[stm@class-login ~]$ squeue --me

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

19169392 markov_cpu sys-dash stm R 0:17 1 classct003

Cancel My Job

You will use the scancel command with the job id as an argument:

[stm@class-login ~]$ scancel 19169392

Using Software Modules

Please review Module System Basics and Module System In-depth.

I am a researcher using the HPC as part of a lab or research group

Synopsis

You will connect to the research cluster Pioneer to run computations. Below we summarize different methods for establishing a connection to the cluster, and for running computations. The methods you need to use will depend on the research methods and software requirements. It is usually best to collaborate with another lab member who is also an HPC user as most labs have established procedures for how best to complete work utilizing the HPC.

Connecting to Pioneer

Before you can run computations, you will need to establish a connection, or "session" with the cluster that verifies your identity.

Web Based (recommended)

This connection is provided by a project called Open OnDemand and will let you launch terminals, applications like Jupyter or RStudio and even full desktop environments. You will connect to https://ondemand-pioneer.case.edu and authenticate with your network ID and password.

The site does not require the use of VPN and should be reachable from campus wireless and from off campus.
The site is integrated with DUO and you may need to confirm the connection through one of the available DUO methods.

SSH

This connection is text based without additional setup. You can connect directly to a login node using your operating systems ssh client. Linux and Mac OS both include ssh clients. For Windows, we generally recommend the use of the PuTTY ssh client[1], though newer versions of Windows do have a native client available[2].

The server name you will use is pioneer.case.edu, e.g.

ssh pioneer.case.edu

This method requires the use of either a hardwired campus connection or the VPN
Graphical applications will not work unless you establish X forwarding. Additional instructions for X forwarding are available here.

Running Computations

Depending on your lab, you may have a significantly customized environment for running your computations, such as custom applications in OnDemand. In this case the lab will usually provide directions for how to launch the applications. If your lab is launching computations directly on the cluster using the command line tools, the following commands will be useful.

Terminology

Job - A computation you want to run on the cluster

Queue - The list of jobs running and waiting to run on the cluster.

Scheduler - The software the manages jobs and the queue.

Compute node - A computer designated to running computational tasks

Login Node - A node designated for managing user connections and submitting jobs

Where to Run Commands

If you have connected through the recommended OnDemand method above, you will see an option at the top "Clusters", under which you would click the option "_Pioneer shell". This will start a terminal on one of the login nodes, hpc7 or hpc8.

If you connected directly through ssh, your terminal will already have a shell running on one of the login nodes, hpc7 or hpc8.

DO NOT run computational tasks directly on the login nodes. These nodes are where you will run the job management commands below to submit new jobs to the cluster.

View My Jobs

You list you jobs in the queue with the squeue command with the --me option:

[stm@hpc7 ~]$ squeue --me

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

19169392 classc sys-dash stm R 0:17 1 compt300

Cancel My Job

You will use the scancel command with the job id as an argument:

[stm@hpc7 ~]$ scancel 19169392

Start a Batch Job

Batch jobs run on the cluster without needing user input. They are the most efficient way to run computations, and part of this is because they are well defined. All steps of the computation are included in a script which is then submitted to the scheduler using the sbatch command.

Example job script named "myscript.slurm" that just sleeps for 30 seconds:

#!/bin/bash

#SBATCH -c 2 # 2 CPUs

#SBATCH --mem=8G # 8 GB of RAM

#SBATCH --time=1 # Runtime of 1 minutes

sleep 30

Submit command:

[stm@hpc7 ~]$ sbatch myscript.slurm

Submitted batch job 19169393

Start an Interactive Job

Interactive jobs are good for debugging. They establish a terminal running on the computational resources so that you can test commands and scripts interactively, much the same as if you were testing on your own local computer.

[stm@hpc7 ~]$ salloc --time=30 -c 2 --mem=8G srun --pty /bin/bash

salloc: Granted job allocation 19169394

salloc: Nodes compt327 are ready for job

[stm@compt327 ~]$ exit

exit

salloc: Relinquishing job allocation 19169394

salloc: Job allocation 19169394 has been revoked.

[stm@hpc7 ~]$

Note how the prompt changed from stm@hpc7 to stm@compt327. This reflects that your commands are running on the compute node rather than the login node. Typing exit or Ctrl+D will end the interactive job and the prompt will return to the login node.