Sustainable HPC State of the Practice Workshop 2024
Sustainably supporting science through committed community action
Sustainable High Performance Computing State of the Practice Workshop (Sustainable HPC SOP Workshop) in conjunction with Cluster 2024.
September 24, 2024
Kobe, Japan
KEYNOTE: Frontiers in Sustainability
Jim Rogers from Oak Ridge National Laboratory (ORNL) will be the keynote speaker. He is the Computing and Facilities Director for the National Center for Computational Sciences (NCCS) at ORNL. He has thirty years of experience in high performance computing (HPC) and has provided strategic planning, technology insertion, and integration support for multiple computing centers, including the ORNL Leadership Computing Facility (OLCF), the U.S. Air Force, NOAA's National Climate Computing Research Center (NCRC), the U.S. Army Corps of Engineers Engineer Research and Development Center (ERDC), the Aeronautical Systems Center, the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, NASA Ames Research Center, the Defense Intelligence Agency, and the Alabama Supercomputer Center. He has primary responsibility for the strategy, acquisition, delivery, integration, and transition to production of high performance computing, storage, networking, and analysis systems, as well as the physical facilities that house these systems. He manages recurring operational activities for both the NCCS systems and the supporting facility and infrastructure. These activities extend across multiple Federal customers.
AGENDA:
9:30 - 10:30 Opening Statement and Keynote
10:30 - 10:45 Break
10:45 - 12:15 Paper Presentations
Microgrid Integration with High Performance Computing Systems for Microreactor Operation
How-to Guide for Transitioning from Air to Liquid-cooled High Performance Computing Systems*
12:15 - 13:15 Lunch
13:15 - 14:45 Paper Presentations
Evolving Large Scale HPC Monitoring & Analysis to Track Modern Dynamic Environments
16 Years of SPEC Power: An Analysis of x86 Energy Efficiency Trends*
14:45 - 15:00 Break
15:00 - 16:30 Paper Presentations
Power-Efficiency Variation on A64FX Supercomputers and its Application to System Operation
PowerSched - managing power consumption in overprovisioned systems
Advanced Visualization of Power, Temperature, and Energy Metrics in HPE Cray EX Systems*
16:30 - 16:45 Break
16:45 - 18:15 Paper Presentation, Panel and Closure
Optimizing Idle Power of HPC Systems: Practical Insights and Methods
Discussion
Closing Remarks
*All paper presentations are 30 minutes long, except the three marked with an asterisk, which are 20 minutes long.
=====================================================
ABSTRACT:
The demand for ever more capable high performance computing (HPC) is driving significant changes in system design and manufacturing. Manufacturers are turning to heterogeneous systems that integrate an increasing number of dies, combining multi-core CPUs with accelerators, traditional memory architectures with high-bandwidth memory, modern interconnects, and massive storage servers. These designs consume substantially more energy and require commensurate methods for managing heat in increasingly dense packages.
To effectively manage and operate these systems, HPC and Data Center (DC) practitioners must balance and coordinate constraints across many domains: environment, utilities, data center, HPC system hardware and software, and end-user applications.
While the capital costs of acquiring these systems have long been recognized, the energy needed to power and cool them has also become a first-order constraint. Increasingly, other constraints related to the overall sustainability of these systems are being examined as well, among them the associated greenhouse gas (GHG) emissions and water consumption.
While the community has significantly improved the operational efficiency of data centers, notably through direct liquid cooling of specific system components, a broader scope of environmental impacts across the life cycle of our facilities and systems must be considered. This includes the full life cycle costs of these HPC systems, from system design and manufacturing through daily operations and reuse to eventual decommissioning and recycling. Only by analyzing and optimizing these elements can we understand and manage the full life cycle cost and carbon footprint of these systems.
This workshop seeks to leverage the experiences of early adopters and innovators in operational practices and technologies that can improve energy and power management capabilities, reduce GHG emissions, and provide careful stewardship of natural resources such as water. It will explore operational and technological innovations that span the full stack of HPC computational systems as well as the building infrastructure.
As part of this peer-reviewed workshop, we solicit papers that capture best practices, policies, procedures, and technologies. The vision is to help the broader community benefit from these experiences. The papers are intended to identify use cases, lessons learned, and best practices in design, commissioning, and operations. Solicited papers will generally be descriptive, supported by concrete, reproducible, empirical data gathered through surveys, case studies, and research for practice.