EE HPC SOP 2021 Technical Program

KEYNOTE:

Energy-Efficient HPC in the Exascale Era: What's Next?, Woong Shin

Keynote Presentation and Recording

Abstract: Ever since the pursuit of exascale HPC began, the projected power and energy consumption has been considered one of the major challenges. Due to the physical limits of silicon technology, exascale would have required an infeasible amount of power and energy to build and operate, and we had work to do. Fast forward to 2021: the first generation of exascale HPC systems is on the verge of being deployed, flipping their switches and transitioning into operation. Starting from where we were before, this will be a huge achievement from a collective effort throughout the whole HPC community. But what's next? Now that the design-time challenges have been tamed, are we done? What challenges have we left as a legacy as we go into operating these systems?

As the opening talk of this state-of-the-practice workshop, we will explore such questions to see what HPC energy efficiency looks like on the verge of a new era. In particular, by sharing multi-year experience in deploying, operating, and analyzing data from the operational data analytics system that supports cooling operations of Summit, the 200 PF pre-exascale supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), we aim to peek into what HPC energy efficiency will require in the exascale era and discuss trends, challenges, and opportunities.

ACCEPTED PAPERS:

Evaluation of SPEC CPU and SPEC OMP on A64FX, Yuetsu Kodama, Masaaki Kondo and Mitsuhisa Sato

Paper, Workshop Presentation and Recording

Abstract—We evaluated the A64FX processor used in the supercomputer Fugaku using the SPEC CPU and SPEC OMP benchmark suites. We found that the performance of the A64FX processor, which has 48 cores, was lower than that of a dual-socket Xeon system with 24 cores per socket in SPEC CPU int and fp. In SPEC OMP, due to the effect of the Xeon's Hyper-Threading, the A64FX performance was lower than that of a single-socket Xeon with 28 cores. However, in several benchmarks in SPEC CPU fp and SPEC OMP, the A64FX performance was higher due to its high memory bandwidth. In addition, by comparing performance and power using the power control mechanism of the A64FX, we confirmed that power can be reduced without affecting performance when not all cores are in use.

Energy Efficiency Aspects of the AMD Zen 2 Architecture, Robert Schoene, Thomas Ilsche, Mario Bielert, Markus Velten, Markus Schmidl and Daniel Hackenberg

Paper, Workshop Presentation

Abstract—In High Performance Computing, systems are evaluated based on their computational throughput. However, performance in contemporary server processors is primarily limited by power and thermal constraints. Ensuring operation within a given power envelope requires a wide range of sophisticated control mechanisms. While some of these are handled transparently by hardware control loops, others are controlled by the operating system. A lack of publicly disclosed implementation details further complicates this topic. However, understanding these mechanisms is a prerequisite for any effort to exploit the full computing capability and to minimize the energy consumption of today's server systems. This paper highlights the various energy efficiency aspects of the AMD Zen 2 microarchitecture to facilitate system understanding and optimization. Key findings include qualitative and quantitative descriptions regarding core frequency transition delays, workload-based frequency limitations, and effects of I/O die P-states on memory performance, as well as a discussion of the built-in power monitoring capabilities and their limitations. Moreover, we present specifics and caveats of idle states and wakeup times, as well as the impact of idling and inactive hardware threads and cores on the performance of active resources such as other cores.
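The built-in power monitoring the abstract mentions is exposed on Zen 2 through RAPL-style energy counter MSRs. As a hedged illustration (not code from the paper), the sketch below decodes such raw counter readings into joules; the register addresses (unit register 0xC0010299, package energy counter 0xC001029B) are taken from public documentation, and the example values are illustrative.

```python
# Sketch: converting AMD RAPL-style raw energy counter readings to joules.
# Register addresses are from public documentation; reading them on real
# hardware requires root and the msr kernel module. The paper itself
# discusses the limitations of these counters in detail.

AMD_MSR_RAPL_PWR_UNIT = 0xC0010299  # bits 12:8 hold the energy status unit (ESU)
AMD_MSR_PKG_ENERGY = 0xC001029B     # 32-bit wrapping raw energy counter

def energy_unit_joules(unit_msr_value: int) -> float:
    """Energy resolution in joules: 1 / 2^ESU."""
    esu = (unit_msr_value >> 8) & 0x1F
    return 1.0 / (1 << esu)

def energy_delta_joules(raw_before: int, raw_after: int, unit_msr_value: int) -> float:
    """Energy consumed between two counter samples, handling 32-bit wraparound."""
    delta = (raw_after - raw_before) & 0xFFFFFFFF
    return delta * energy_unit_joules(unit_msr_value)

# Example with a typical ESU of 16, i.e. a resolution of 1/65536 J per count:
unit = 16 << 8
print(energy_unit_joules(unit))
print(energy_delta_joules(0xFFFFFF00, 0x00000100, unit))  # wraparound-safe delta
```

Because the raw counter is only 32 bits wide, it wraps frequently under load; the masked subtraction above is the standard way to make deltas wraparound-safe.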

Explicit uncore frequency scaling for energy optimisation policies with EAR in Intel architectures, Julita Corbalan, Oriol Vidal, Lluis Alonso and Jordi Aneas

Paper, Workshop Presentation and Recording

Abstract—EAR is an energy management framework which offers three main services: energy accounting, energy control, and energy optimisation. The latter is done through the EAR runtime library (EARL). EARL is a dynamic, transparent, and lightweight runtime library that provides energy optimisation and control. It implements energy optimisation policies that select the optimal CPU frequency based on runtime application characteristics and policy settings. Since EARL defines a policy API and a plugin mechanism, different policies can be easily evaluated.

In this paper we propose and evaluate the use of explicit Uncore Frequency Scaling (explicit UFS) in Intel architectures to increase energy-saving opportunities in cases where the hardware cannot select the optimal IMC frequency. We extended the minimum energy-to-solution policy to select both CPU and IMC frequencies, and we evaluated it with several kernels and six real applications. Results showed an average energy saving of 9% with an average time penalty of 3%. In some use cases, explicit UFS yielded up to 8% extra energy savings compared with hardware-controlled UFS.
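On Intel server parts, explicit uncore frequency scaling of the kind the abstract describes is typically done by writing min/max ratio limits to the UNCORE_RATIO_LIMIT MSR (0x620), where each ratio step corresponds to 100 MHz. The sketch below (an illustration, not EAR's actual implementation) shows the bit-level encoding; actually writing the MSR requires root and the msr kernel module.

```python
# Sketch: encoding uncore frequency limits for Intel's UNCORE_RATIO_LIMIT
# MSR (0x620). Bits 6:0 hold the max ratio, bits 14:8 the min ratio, with
# 100 MHz per ratio step. Illustrative only; EAR wraps this differently.

UNCORE_RATIO_LIMIT_MSR = 0x620

def encode_uncore_limits(min_mhz: int, max_mhz: int) -> int:
    """Pack min/max uncore frequencies (MHz) into the MSR bit layout."""
    max_ratio = max_mhz // 100
    min_ratio = min_mhz // 100
    return (min_ratio << 8) | max_ratio

def decode_uncore_limits(msr_value: int) -> tuple:
    """Inverse of encode_uncore_limits, returning (min_mhz, max_mhz)."""
    max_ratio = msr_value & 0x7F
    min_ratio = (msr_value >> 8) & 0x7F
    return (min_ratio * 100, max_ratio * 100)

# Pinning the uncore to a fixed 1.8 GHz by setting min == max:
value = encode_uncore_limits(1800, 1800)
assert decode_uncore_limits(value) == (1800, 1800)
```

Setting min equal to max effectively pins the uncore frequency, which is how an optimisation policy can force a specific IMC operating point instead of leaving the choice to the hardware.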

FIRESTARTER 2 - Dynamic Code Generation for Processor Stress Tests, Robert Schoene, Markus Schmidl, Mario Bielert and Daniel Hackenberg

Paper, Workshop Presentation

Abstract—Processor stress tests aim to maximize processor power consumption by executing highly demanding workloads. They are typically used to test the cooling and electrical infrastructure of compute nodes or larger systems in labs or data centers. While multiple such tools already exist, they have to be re-evaluated and updated regularly to match developments in computer architecture. This paper presents the first major update of FIRESTARTER, an open-source tool specifically designed to create near-peak power consumption. The main new features concern the online generation of workloads and automatic self-tuning for specific hardware configurations. We further apply these new features to an AMD Rome system and demonstrate the optimization process. Our analysis shows how accesses to the different levels of the memory hierarchy contribute to the overall power consumption. Finally, we demonstrate how the auto-tuning algorithm copes with different processor configurations and how these influence the effectiveness of the created workload.
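The self-tuning idea can be illustrated with a toy search over mixes of instruction groups that maximizes a measured objective. Everything below is a stand-in: the group names, the hill-climbing search, and especially mock_power (a synthetic model used so the behaviour can be shown offline) are illustrative, not FIRESTARTER's actual configuration or measurement path, which tunes real payloads against hardware measurements with a more capable evolutionary search.

```python
import random

# Illustrative instruction groups a candidate payload mixes together:
GROUPS = ["REG", "L1_LS", "L2_LS", "L3_LS", "RAM_LS"]

def mock_power(mix):
    """Stand-in for a real power measurement: a smooth synthetic objective
    with a known optimum, so the search behaviour can be demonstrated."""
    target = {"REG": 5, "L1_LS": 3, "L2_LS": 2, "L3_LS": 1, "RAM_LS": 1}
    return -sum((mix[g] - target[g]) ** 2 for g in GROUPS)

def tune(measure, iterations=500, seed=0):
    """Simple hill climb over integer group weights: mutate one group's
    weight at a time and keep the candidate only if it measures better."""
    rng = random.Random(seed)
    best = {g: 1 for g in GROUPS}
    best_score = measure(best)
    for _ in range(iterations):
        cand = dict(best)
        g = rng.choice(GROUPS)
        cand[g] = max(0, cand[g] + rng.choice((-1, 1)))
        score = measure(cand)
        if score > best_score:
            best, best_score = cand, score
    return best

print(tune(mock_power))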

Cooling the Data Center: Design of a Mechanical Controls Owner Project Requirements (OPR) Template, Stefan Robila, David Grant, Chris DePrater, Vali Sorell, Terry Rodgers, Dave Martinez and Shlomo Novotny

Paper and Workshop Presentation

Abstract—As the power demands of supercomputers continue to grow, so do the demands on the mechanical cooling systems that support the infrastructure and building in which the supercomputers reside. Planning for the cooling systems and their mechanical controls is an intrinsic part of any new supercomputer installation. To support the design and commissioning of the mechanical control systems, the Energy Efficient High Performance Computing Working Group (EE HPC WG) Cooling Controls team is developing a template for an OPR (Owner Project Requirements) document. The design of the template, while pursued by a small team, leveraged the expertise of the broad membership of the EE HPC WG through surveys and feedback sessions. As a result, the OPR template includes not only a suggested structure and a checklist of topics that a site might consider including in the document, but also many real-world examples of how the topics were addressed in previous projects. Finally, the Mechanical Controls OPR template is being developed in parallel with other templates focused on other aspects of a supercomputer installation. These templates are intended to improve the efficiency and comprehensiveness of the programming (pre-design) phase of project execution and provide the engineering and design team with better clarity on the facility infrastructure's capabilities, expandability, and performance requirements. The intended result is improved construction documents (basis of design, drawings, and specifications) that support the project goals and objectives, including the reliability, resiliency, and energy efficiency expectations of the HPC facility.

A Conceptual Framework for HPC Operational Data Analytics, Alessio Netti, Woong Shin, Michael Ott, Torsten Wilde and Natalie Bates

Paper and Workshop Presentation

Abstract—This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types, which are descriptive, diagnostic, predictive, and prescriptive, while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.
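The two models combined in the abstract form a 4x4 grid, and the paper's survey classification amounts to bucketing each ODA work into one cell. The sketch below makes that structure concrete; the axis labels come from the abstract, while the example entries are hypothetical and for illustration only.

```python
# Sketch of the paper's two-axis ODA framework as a data structure:
# analytics capabilities crossed with the four pillars of energy-efficient
# HPC operations. Axis labels are from the abstract; entries are made up.

CAPABILITIES = ["descriptive", "diagnostic", "predictive", "prescriptive"]
PILLARS = ["facility", "system hardware", "system software", "applications"]

def classify(works):
    """Bucket (name, capability, pillar) tuples into the 4x4 grid."""
    grid = {(c, p): [] for c in CAPABILITIES for p in PILLARS}
    for name, capability, pillar in works:
        grid[(capability, pillar)].append(name)
    return grid

# Hypothetical survey entries, for illustration only:
survey = [
    ("cooling-dashboard", "descriptive", "facility"),
    ("node-failure-model", "predictive", "system hardware"),
    ("job-power-capping", "prescriptive", "system software"),
]
grid = classify(survey)
print(grid[("predictive", "system hardware")])  # ['node-failure-model']
```

Laying the survey out this way makes gaps visible at a glance: any empty cell marks a capability-pillar combination where little ODA work exists, which is one way such a framework demonstrates its usefulness.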