Hardware

Computational Resources

HPC at CWRU has been expanded over time with servers purchased from various sources of funding. In most cases, running on a mix of server generations does not affect the performance or results of the computational work conducted on the cluster. However, some benchmarking work requires that only certain compute nodes be used, for consistency. With that in mind, we describe the overall resources we have and how to use nodes that share similar features.

Pioneer is a RHEL8-based cluster aimed at supporting university research projects. It currently holds about 50% of the total computing resources and will be expanded over the next few months. The compute nodes are mostly Dell PowerEdge servers used for both high-throughput and parallel jobs. A mix of GPU nodes, with up to 4 GPU cards per node, is available to accelerate computation.
Rider is a RHEL7-based cluster that will be phased out by Spring 2024.
Markov is a special cluster aimed at university academic courses, especially Data Science projects involving big data, machine learning, and AI applications. Its 23 CPU compute nodes handle CPU-based computation, and its 20 GPU nodes provide resources for GPU-accelerated problem solving.
AISC (Artificial Intelligence Supercomputer) is an Nvidia DGX A100 cluster funded by an NSF MRI grant that provides a powerful Artificial Intelligence/Machine Learning computational platform. It consists of 4 DGX A100 systems with 8 GPU cards each, and each GPU has 80 GB of memory.
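The AISC figures above multiply out as follows. This only restates the numbers given on this page; the aggregate is summed across all GPUs and is not memory that a single GPU (or a single job on one GPU) can address.

```python
# Figures taken from the AISC description above.
DGX_NODES = 4        # DGX A100 systems
GPUS_PER_NODE = 8    # GPU cards per DGX A100
GPU_MEMORY_GB = 80   # memory per GPU card

total_gpus = DGX_NODES * GPUS_PER_NODE
total_gpu_memory_gb = total_gpus * GPU_MEMORY_GB

print(f"AISC GPUs: {total_gpus}")                         # AISC GPUs: 32
print(f"Aggregate GPU memory: {total_gpu_memory_gb} GB")  # Aggregate GPU memory: 2560 GB
```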

Nodes Available

Pioneer: 129 nodes (16 GPU nodes)
Rider: 106 nodes (21 GPU nodes)
Markov: 43 nodes (20 GPU nodes), for academic course support
AISC: 4 nodes (4 GPU nodes)
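The node counts above can be tallied with a short script. It assumes, as the Markov description (23 CPU nodes + 20 GPU nodes = 43) suggests, that each cluster's total already includes its GPU nodes.

```python
# (total nodes, GPU nodes) per cluster, as listed above.
clusters = {
    "Pioneer": (129, 16),
    "Rider": (106, 21),
    "Markov": (43, 20),
    "AISC": (4, 4),
}

# Per-cluster breakdown; CPU-only nodes are the total minus the GPU nodes.
for name, (total, gpu) in clusters.items():
    print(f"{name:8} total={total:<4} gpu={gpu:<3} cpu_only={total - gpu}")

all_nodes = sum(total for total, _ in clusters.values())
all_gpu = sum(gpu for _, gpu in clusters.values())
print(f"All clusters: {all_nodes} nodes, {all_gpu} GPU nodes")  # All clusters: 282 nodes, 61 GPU nodes
```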

HPC Resource View

This page gives an overview of the node partitions and features we have, including the number of CPUs, memory size, quantity, and compute node names for each. Information for both the Rider (research) and Markov (course) clusters is included.

HPC Servers

This page explains in detail how to find the classification and type of a specific node, including its current status on the cluster. Knowing the node types and their attributes helps you request appropriate resources for your jobs. If certain node types are heavily used, requesting a less in-demand node type often gets your job started sooner, so you get your computational results sooner as well.
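Finding node types and their attributes can be scripted. The sketch below assumes the cluster is scheduled by Slurm (this page does not say so) and calls `sinfo` in node-oriented mode to print each node's name, feature list, CPU count, and memory in MB. The helper name `group_nodes_by_features` is hypothetical, and the feature strings you see depend on how the cluster is configured.

```python
import shutil
import subprocess
from collections import defaultdict

# Assumes a Slurm cluster: one line per node (repeated once per partition the
# node belongs to) showing node name, feature list, CPUs, and memory in MB.
SINFO_CMD = ["sinfo", "--Node", "--noheader", "--format=%N %f %c %m"]


def group_nodes_by_features(sinfo_text):
    """Hypothetical helper: group node names by (features, cpus, memory_mb)."""
    groups = defaultdict(set)  # a set drops nodes listed once per partition
    for line in sinfo_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        node, features, cpus, memory_mb = fields
        if not (cpus.isdigit() and memory_mb.isdigit()):
            continue  # skip lines whose CPU or memory field is not a plain number
        groups[(features, int(cpus), int(memory_mb))].add(node)
    return groups


def main():
    result = subprocess.run(SINFO_CMD, capture_output=True, text=True, check=True)
    groups = group_nodes_by_features(result.stdout)
    for (features, cpus, memory_mb), nodes in sorted(groups.items()):
        print(f"{features:30} cpus={cpus:<4} mem={memory_mb:<8} MB nodes={len(nodes)}")


if __name__ == "__main__":
    # Only query Slurm when sinfo is on PATH (e.g. on a cluster login node).
    if shutil.which("sinfo") is None:
        print("sinfo not found; run this on a cluster login node")
    else:
        main()
```

Once a suitable feature is identified, Slurm's `--constraint` option (for example `sbatch --constraint=<feature> job.sh`) restricts a job to nodes that advertise that feature, which keeps benchmarking runs on a consistent node type.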

Web Portal for Compute Resources: Ganglia
