Scientific Computing Environment

Scientific Computing - Introduction

The HPC cluster maintained at CWRU is modest in size compared with other HPC and supercomputer resources and with commercial providers. While it serves many research uses well, it is not large enough to handle larger computational tasks, known as burstable high-performance computing (BHPC), that may require several hundred or several thousand simultaneous processors. The current on-premise HPC resource has therefore been augmented by providing knowledge and educational training on resources that are available commercially or through supercomputing centers.

The [U]Tech Research Computing (RC) team has partnered with regional and national, federally or state-funded resources such as XSEDE, OSC, and PSC that provide HPC capacity for BHPC. If we cannot facilitate the use of allocation credits at the supercomputing centers, we will offer assistance in setting up resources in AWS or POD based on business process and need.

Transitioning from a local infrastructure to the use of cloud services - particularly the public cloud - is becoming commonplace in higher education. Thus far, however, this trend has not been fully tested for research computing, which has unique needs and issues, including long-term use and manipulation of data, sensitive data, restricted-use data, modeling, simulation, network speed and latency, packet loss, and more.

The tight coupling of software to specific underlying hardware for optimal performance has created portability issues. Virtual machines (VMs) can encapsulate the computing environment to make applications and workflows portable across varying infrastructure, but container technologies such as Singularity go further: they can leverage HPC interconnects, resource managers, file systems, GPUs, accelerators, and the like in a secure manner without resource contention, and they can provide domain-specific computing environments. Containers can reduce the need for VMs by letting users choose their own platform, and complex software with strict platform requirements can be easily installed in a container image. Hence, the appetite for containers in scientific computing is burgeoning.

Identified Resources For Scientific Computing

On-Premise HPC Resource

The modestly sized [U]Tech High Performance Computing cluster fulfills most of the computational needs of researchers and students. It comprises different types of nodes, including high-memory SMP nodes with up to 1 TB of memory and GPU nodes for accelerating deep learning, analytics, and engineering applications.
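As an illustration only, assuming a SLURM scheduler and a GPU partition (the partition and resource values below are placeholders, not the cluster's actual configuration), an interactive session on a GPU node could be requested with:

srun --partition=gpu --gres=gpu:1 --cpus-per-task=4 --mem=16G --pty /bin/bash   # request 1 GPU, 4 cores, and 16 GB of memory for an interactive shell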

Regional and National Federal or State funded computing resources

OSC: The Ohio Supercomputer Center is a statewide resource that provides supercomputing services and computational science expertise to Ohio university researchers as well as Ohio industries.

Any faculty member or research scientist at an academic institution in Ohio is eligible for an academic account at OSC. These researchers/educators may request accounts for their students and collaborators. Commercial accounts are also available. More information about applying for both academic and commercial accounts at OSC can be found at https://www.osc.edu/supercomputing/support/account.

[U]Tech Research Computing can provide free consultation and contacts to request OSC access. A start-up allocation is typically provided at 50,000 SUs, with minimal information verification. Further allocations require a usage proposal and are granted based on successful utilization of previous allocations, demonstrated through publications and peer recognition.

NSF Computing Ecosystem: https://www.xsede.org/ecosystem/resources

The OSG Consortium: https://osg-htc.org/ 

National Research Platform:  https://nationalresearchplatform.org/ 

JetStream2:  https://jetstream-cloud.org/about/index.html 

Science Gateways: https://sciencegateways.org/about 

CYVERSE: https://www.cyverse.org/about 

Internet2: https://internet2.edu/ 

XSEDE/ACCESS (eXtreme Science & Engineering Discovery Environment, now ACCESS): XSEDE, now ACCESS, integrates resources and services - supercomputers, collections of data, and new tools - makes them easier to use, and helps more people use them.

Submit an allocation request via the XSEDE portal. XSEDE resources vary from time to time and currently consist of the following clusters:

To apply for an account, you will need the following information:

PSC (Pittsburgh Supercomputing Center): PSC is supported by several federal agencies, the Commonwealth of Pennsylvania, and private industry, and is a leading partner in XSEDE (Extreme Science and Engineering Discovery Environment), the National Science Foundation cyberinfrastructure program.

PSC provides university, government and industrial researchers with access to several of the most powerful systems for high-performance computing, communications and data storage available to scientists and engineers nationwide for unclassified research. PSC advances the state of the art in high-performance computing, communications and data analytics and offers a flexible environment for solving the largest and most challenging problems in computational science.

DOE National Lab HPC Centers and Systems


Commercial Computing Resources

Amazon Web Services (AWS) - Amazon Elastic Compute Cloud (EC2): EC2 provides an alternative resource for running cluster jobs as Infrastructure-as-a-Service (IaaS). EC2 allows users to rent virtual computers on which to run their own applications. It allows scalable deployment of applications by providing a web service through which a user can boot an Amazon Machine Image to create a virtual machine, which Amazon calls an "instance", containing any software desired. A user can create, launch, and terminate server instances as needed, paying by the hour for active servers, hence the term "elastic". EC2 also gives users control over the geographical location of instances, which allows for latency optimization and high levels of redundancy.

Cluster Compute, Cluster GPU, and High Memory Cluster instances have been specifically engineered to provide high-performance network capability and can be programmatically launched into clusters, allowing applications to get the low-latency network performance required for tightly coupled, node-to-node communication. For more information, visit the [U]Tech Specific Guide on AWS Resources for Researchers.
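For example, assuming the AWS command-line interface is installed and configured, and using placeholder AMI, key-pair, and instance IDs, a single EC2 instance can be launched and later terminated from the command line:

aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type c5.large --key-name my-key --count 1   # launch one instance from a placeholder AMI
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0   # terminate it when done (the instance ID is returned by the previous call)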

Google Compute Engine: It delivers virtual machines running in Google's innovative data centers and worldwide fiber network. Compute Engine's tooling and workflow support enable scaling from single instances to global, load-balanced cloud computing. It is flexible; for example, a custom instance with 4 vCPUs and 5 GB of memory can be created with a single command:

gcloud compute instances create my-vm --custom-cpu 4 --custom-memory 5GB   # 4 vCPUs and 5 GB of memory

POD (Penguin On Demand): Penguin Computing on Demand (POD) provides HPC resources on a pay-as-you-go basis. You don't need to own a powerful cluster to run your jobs, even at large scale. POD's compute environment was designed specifically for high-performance computing and features typical HPC components such as low-latency interconnects and GPUs. For optimum performance, all jobs are managed by industry-leading HPC schedulers that support the job submission semantics of the open-source scheduler TORQUE. All jobs are executed directly on POD HPC servers without a virtualization layer in the middle, providing an environment similar to the CWRU High Performance Computing (HPC) cluster. Visit the POD website for details.

POD offers easy registration. It requires a valid credit card, but the card is not charged when using the free usage tier, i.e., submitting jobs to the FREE queue (#PBS -q FREE), which includes resources such as an Intel 2.9 GHz Westmere node (1248 GB, 24 cores) for 5 minutes. For more pricing and queue information, visit POD Cloud Rates and Services.
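A minimal TORQUE-style job script for the FREE queue might look like the following sketch (the executable name is a placeholder):

#!/bin/bash
#PBS -q FREE                 # submit to the free-tier queue
#PBS -l nodes=1:ppn=4        # one node, four cores
#PBS -l walltime=00:05:00    # the free tier is limited to 5 minutes
cd $PBS_O_WORKDIR            # run from the submission directory
./my_application             # placeholder for the actual executable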

Cloud Computing Tools

These tools make it easier to burst to the cloud.

CfnCluster (Cloud Formation Cluster): It is a framework that deploys and maintains high performance computing clusters on Amazon Web Services (AWS).

Developed by AWS, CfnCluster facilitates both quick-start proofs of concept (POCs) and production deployments. CfnCluster supports many different types of clustered applications and can easily be extended to support different frameworks. Its command-line interface leverages AWS CloudFormation templates and other AWS cloud services, as sketched below.
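As a sketch of the typical workflow (the cluster name is arbitrary), the command-line interface drives CloudFormation behind the scenes:

pip install cfncluster          # install the command-line interface
cfncluster configure            # interactive setup of AWS region, key pair, VPC, etc.
cfncluster create mycluster     # launch the cluster via a CloudFormation template
cfncluster delete mycluster     # tear everything down when finished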

StarCluster: StarCluster is an open source cluster-computing toolkit for Amazon’s Elastic Compute Cloud (EC2) released under the LGPL license.

StarCluster has been designed to automate and simplify the process of building, configuring, and managing clusters of virtual machines on Amazon’s EC2 cloud. StarCluster allows anyone to easily create a cluster computing environment in the cloud suited for distributed and parallel computing applications and systems.

StarCluster provides an AMI (Amazon Machine Image) based on Ubuntu 11.10 with several bundled applications such as OpenMPI, ATLAS/NumPy/SciPy, Hadoop, Open Grid Scheduler, and Condor. An AMI provides the information required to launch an instance, which is a virtual server in the cloud.
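A typical session, sketched here with a placeholder cluster name defined in the StarCluster configuration file, looks like:

pip install StarCluster             # install the toolkit
starcluster start mycluster         # provision the EC2 cluster defined in ~/.starcluster/config
starcluster sshmaster mycluster     # log in to the master node and submit work
starcluster terminate mycluster     # shut down and remove the cluster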

CloudyCluster: It eases the burden of creating an HPC or big data infrastructure in the cloud. It gives you the power to spin up an advanced computing infrastructure complete with many open computational applications pre-installed. You can even pause, resume, compute some more, and delete when you are finished. With CCQ (CloudyCluster Queue) you can submit a job and have the AWS infrastructure created for you on demand, for only as long as your jobs need it.
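If CCQ's ccqsub utility is available on the login instance (the script name below is a placeholder), submission looks much like an ordinary scheduler submission:

ccqsub my_job.sh    # CCQ provisions the required instances, runs the job, then releases them
ccqstat             # check the status of submitted jobs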

Bright Computing: Bright Computing provides comprehensive software solutions for provisioning and managing HPC clusters, Hadoop clusters, and OpenStack private clouds. Its software and services are sold around the world, both directly and through an extensive network of reseller partners.

Bright Cluster Manager provides a unified, enterprise-grade solution for provisioning, scheduling, monitoring, and management of HPC and big data systems in the data center and in the cloud. Its dynamic cloud provisioning optimizes cloud utilization by automatically creating servers when they are needed and releasing them when they are not. Bright OpenStack provides a complete cloud solution that is easy to deploy and manage.

VMware vSphere: It is the industry-leading virtualization platform that empowers users to virtualize any application with confidence, redefines availability, and simplifies the virtual data center. The result is a highly available, resilient, on-demand infrastructure that is an ideal foundation for any cloud environment. It can drive down data center costs, increase system and application uptime, and drastically simplify the way IT runs the data center. vSphere is purpose-built for the next generation of applications and serves as the core foundational building block for the Software-Defined Data Center.

vSphere accelerates the shift to cloud computing for existing data centers and also underpins compatible public cloud offerings, forming the foundation for the industry's only hybrid cloud model.

Container Solutions

Container technologies such as Singularity can leverage HPC interconnects, resource managers, file systems, GPUs, accelerators, and other host resources in a secure manner without resource contention. They can provide domain-specific computing environments and reduce the need for VMs by letting users choose their own platform, and complex software with strict platform requirements can be easily installed in a container image.

Singularity: Singularity enables users to have full control of their environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data. This means that you don’t have to ask your cluster admin to install anything for you - you can put it in a Singularity container and run. Did you already invest in Docker? The Singularity software can import your Docker images without having Docker installed or being a superuser. Need to share your code? Put it in a Singularity container and your collaborator won’t have to go through the pain of installing missing dependencies. Do you need to run a different operating system entirely? You can “swap out” the operating system on your host for a different one within a Singularity container. As the user, you are in control of the extent to which your container interacts with its host.

It’s pretty simple. You can make and customize containers locally, and then run them on your shared resource. As of version 2.3, you can even pull Docker image layers into a new Singularity image without sudo permissions. Singularity also allows you to leverage the resources of whatever host you are on, including HPC interconnects, resource managers, file systems, GPUs and/or accelerators, etc. Singularity achieves this through several key facets of its design.
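A minimal workflow, assuming Singularity is available on both the workstation and the cluster (the image and script names are placeholders, and the generated image filename varies by Singularity version), might look like:

singularity pull docker://python:3.6                      # build a local Singularity image from a Docker Hub image
singularity exec python-3.6.simg python my_script.py      # run a program inside the container
singularity shell python-3.6.simg                         # or open an interactive shell inside the container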

Shifter: Shifter enables container images for HPC. In a nutshell, Shifter allows an HPC system to let end users run a Docker image efficiently and safely. Shifter consists of a few moving parts (a brief usage sketch follows the list):

1) a utility, typically run on the compute node, that creates the runtime environment for the application;

2) an image gateway service that pulls images from a registry and repacks them in a format suitable for the HPC system (typically SquashFS); and

3) example scripts/plugins to integrate Shifter with various batch scheduler systems.
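A brief usage sketch, assuming Shifter and its image gateway are installed on the system (the image tag is a placeholder):

shifterimg pull docker:ubuntu:16.04                                        # ask the image gateway to import a Docker image
shifter --image=docker:ubuntu:16.04 /bin/bash -c "cat /etc/os-release"     # run a command inside that image on a compute node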

Virtual Machines vs. Containers

In brief, a VM virtualizes an entire machine and runs its own guest operating system on a hypervisor, whereas a container shares the host kernel and packages only the application and its dependencies. Containers are therefore much lighter weight to start and better able to use host resources such as HPC interconnects and GPUs directly.

Scientific Computing Needs @ CWRU

Researchers at CWRU have occasionally requested external resources, though most of their scientific computing needs have been met by the on-premise HPC resource, which is not only readily available and cheaper (with free guest accounts) but also has short or no queue times compared to external resources.

Researchers mostly use OSC and XSEDE resources, and a few commercial AWS EC2 instances, for their burstable high-performance computing needs. Portability issues with a few specialized, complex scientific software packages with numerous dependencies have also generated interest in container solutions like Singularity, which has been made available in [U]Tech HPC. Occasionally, HPC users have also requested VMs integrated with HPC for web server and database services.

Here are a few of the needs expressed by CWRU researchers:

In the post-project meeting (Research Computing Resources and Services for Data Science) with the SDLE team, Roger French expressed the need to be offered Linux OS environments other than the RHEL that HPC provides.

Ming Chun Huang’s team, focused on mobile health (mHealth), uses Android-based smartphones, smartwatches, IoT devices, and similar hardware as data sources and plans to stream data from those devices in real time to a program (using Torch) running in the HPC cluster.

Mark Turner, co-director of Red Hen Lab, requested a container solution in HPC to allow his group to install statistical and deep learning packages for multimodal communication. The audio pipeline currently running on [U]Tech’s High Performance Computing (HPC) cluster is also being used for Google Summer of Code grants. In one of his emails, Mark Turner also requested:

"I need MySQL because I am testing and writing a python wrapper for others' code that requires MySQL, so I cannot be sure about the details. But if I understand the documentation correctly, MySQL is used to save acoustic fingerprinting of audio files, and 1885 MB wav file amounts to 355 MB fingerprinting database. And the Red Hen Lab might use it to process hundreds of GB of news audios everyday".

Scientific Computing Offering @ CWRU

The hybrid on-premise [U]Tech HPC cluster consists of SMP and GPU nodes on top of other general-purpose computing resources to fulfill almost all scientific computing needs of researchers. A small GPU cluster and dedicated storage may be created to process and manage the big data output of CCMSB’s Cryo-EM (electron microscopy). In the field of data analysis, CWRU RCCI also offers a general-purpose Hadoop cluster as well as a couple of specialized Hadoop clusters, one for Roger French’s SDLE (Solar Durability and Lifetime Extension) Lab and one for Will Bush’s Aneris project.


Besides on-premise HPC resources, [U]Tech Research Computing provides the following Scientific Computing Services:

The Campus Champions on the [U]Tech RCCI team assist local researchers in quickly getting start-up allocations of computing time on regional and national federal or state funded computing resources, in addition to providing information about those resources. The Glenn Starkman and Phoebe Stewart groups are recent users of XSEDE (PSC Bridges).

[U]Tech RCCI staff can help set up cloud computing services through AWS, StarCluster, CfnCluster, POD, and Bright Computing. If someone is interested in trying this out, we would be happy to help. Here is the URL for information on using HPC at AWS: https://aws.amazon.com/hpc/

Rong Xu and Satya Sahoo are using AWS for biomedical big data, and Roger French is using it mostly for storage. Rakesh Niraj, Associate Professor of Marketing, Weatherhead, is using an AWS Deep Learning AMI with a connected GPU for analytics. Mingguo Hong, Associate Professor, EECS, has set up infrastructure on AWS to collect building sensor data as part of a collaboration with Pacific Northwest National Laboratory (PNNL). Researchers at the Institute for Computational Biology also have an AWS account and are using AWS for research.

The HPCVM server is designated for established VMs through VMware vSphere. Chris Fietkiewicz and his student are testing OSC Open OnDemand. [U]Tech is enhancing its VM infrastructure to effectively support researchers via Project 3514, “VM Infrastructure for Research”, which can also provide service for HPC users through its integration with HPC and the Science DMZ.

Singularity has been installed in HPC, along with a Guide to Singularity, allowing researchers to use the complex and powerful GPU-capable deep learning package TensorFlow. It provides both Python 2 and Python 3 modules. OpenSees, a package for earthquake simulation, has also been installed as a special request from an HPC user. Singularity is being used by many users from the SDLE Lab, Red Hen Lab, Center for Computational Imaging and Personalized Diagnostics, Biomedical Imaging Lab, Religious Studies, Hinczewski Theoretical Biophysics Lab, Mechatronics and Robot Design, Radiology, MetroHealth Medical Center, and more.
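As an illustrative sketch only (the module and image names are placeholders that depend on the local installation), running the GPU-enabled TensorFlow container on an HPC GPU node might look like:

module load singularity                                       # load the Singularity module on the cluster
singularity exec --nv tensorflow-gpu.simg python3 train.py    # --nv exposes the host's NVIDIA GPU driver and libraries to the container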

Conclusion

Performance may not be optimal in the cloud, especially when high-performance network capability is required for node-to-node communication and for cluster GPU instances. Also, many Independent Software Vendor (ISV) codes are not licensed or developed in a way that is cloud-enabled. One important feature of cloud computing through AWS is the ability to auto-scale resources (nodes/processors, memory, etc.) as needed to meet varying demand. The concept of hybrid computing allows bursting jobs from an on-premise HPC cluster to a cloud provider based on resource needs and duration. If data must be regularly moved out of the cloud, an on-premise resource may be the best solution, as cloud services charge by the GB of data moved.
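As an illustration of auto scaling (all names below are placeholders, and the referenced launch configuration must already exist), an Auto Scaling group that grows from zero to ten instances on demand could be created with:

aws autoscaling create-auto-scaling-group --auto-scaling-group-name hpc-burst --launch-configuration-name hpc-node-lc --min-size 0 --max-size 10 --availability-zones us-east-2a   # scale between 0 and 10 instances as load requires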

Amazon-supplied AMIs for commercial OSs (Windows, RHEL, SLES) impose an additional cost for the OS license. "Buy your own" licensing is possible but requires building an AMI from scratch, which is technically very challenging compared with "normal" VM environments due to the complete lack of an interactive console (i.e., the system must boot to a fully functional OS on the very first boot; no bootable installers or "finish configuring on first boot" scenarios).