The EE HPC WG Workshop 2021 took place on Monday, December 6th and Tuesday, December 7th.
Presentation materials and recordings for the entire workshop are available below.
==================================
Dec 06, 7AM-9AM PST (UTC -8)
EE HPC WG Overview and Update: An introduction to the EE HPC WG and some current Team Updates. Natalie Bates (EE HPC WG)
Recording of Sid Jana and Natalie Bates
Electrical System Provisioning and Powering HPC Systems: This panel will discuss the challenges of electrical system provisioning from a facilities and HPC system perspective. The panel will also explore other electrical system topics that are relevant to current and future HPC data centers. Moderator, Joe Prisco (IBM) and Panelists, David Mohr (NVIDIA), Grant Stewart (LANL), Herbert Huber (LRZ), Ethan Thomason (Bureau Veritas), Christian Blug (Siemens), Brandon Hong (LLNL)
Sustainability of HPC Computer Halls: Power consumption and cooling of computer halls have been discussed for a long time and remain the most important factors in overall sustainability, which describes the total impact of a computer hall on the environment and society. Moderator, Gert Svensson (KTH, PDC) and Speakers Cate Berard (DOE), Esa Heiskanen (CSC), and John Elliott (LBNL)
Sustainability Approaches in the Operation of High Performance Computing Cooling Plants, John Elliott
Recording of John Elliott
Managing Climate Change Impacts of Data Centers and the Cloud, Cate Berard
Recording of Cate Berard
Sustainability of the Pre-Exascale System LUMI, Esa Heiskanen
Recording of Esa Heiskanen
Dec 07, 7AM-9AM PST (UTC -8)
Liquid Cooling: Direct-liquid cooling has become the de facto standard for cooling large-scale HPC systems as it enables higher power density as well as better energy efficiency. The amount of heat captured directly in water and the water temperatures continue to increase in the latest deployments, helping further with energy-efficient cooling. This session will discuss current trends, activities, and lessons learned in direct-liquid cooling. Moderator, Michael Ott (LRZ) and Speakers Jim Rogers (ORNL), David Smith (Sandia NL) and Daniele Cesarini (CINECA)
Bologna TECHNOPOLO CINECA INFN, Daniele Cesarini
Recording of Daniele Cesarini
Sandia National Laboratories HPC Data Center, David Smith
Recording of David Smith
Cooling at Exascale - A New Frontier, Jim Rogers
Recording of Jim Rogers
System-wide Power Monitoring Challenge: Monitoring the power consumption of large HPC systems accurately presents various challenges that can differ from installation to installation, but the resulting information can also be used in diverse ways to benefit the operation. The speakers of this session will offer their unique perspectives on these challenges and opportunities. Moderator, Thomas Ilsche (TU Dresden) and Speakers Woong Shin (ORNL), Joe Prisco (IBM) and Yusuke Doi (Preferred Networks).
Power Monitoring and Power Quality Metering, Joe Prisco
Recording of Joe Prisco
Measurement and Efficiency: MN-3 Case in Green500, Yusuke Doi
Recording of Yusuke Doi
Power Monitoring (and Data Analytics) @Oak Ridge Leadership Computing Facility, Woong Shin
Recording of Woong Shin
==================================
Joe Prisco is a Senior Technical Staff Member, IBM Systems. Joe is Chair of the IBM Development Power Council and technical owner of the physical planning information used to lay out, design, and build data centers for IT equipment. Joe is the power profile chief test engineer responsible for compliance with worldwide energy laws, standards, and programs like ENERGY STAR. Joe is a rack power distribution architect and has an extensive background in electrical power. Joe is active on many committees, including IEC SC77A/WG1 (low frequency EMC powerline emissions), NFPA 70 electrical code making panel 12, ASHRAE TC 9.9 and SSPC 90.4.
David Mohr is a Distinguished Engineer with NVIDIA Corporation where he leads Datacenter Power Architecture Advancements. He provides insights on power distribution, monitoring, protection, conversion, energy storage and compliance. Prior to joining NVIDIA, David held several leadership positions, some notable ones being Director of Engineering at Hewlett Packard Enterprise and Principal Engineer at AWS where he focused on datacenter power products.
Ethan Thomason has over thirty-five years of experience in the operation, design and commissioning of critical facilities. Ethan is a founding partner of Bureau Veritas Primary Integration and is responsible for the Technical Leadership of the company.
His background includes 9 years in the U.S. Navy’s Nuclear Power Program with service as a prototype staff instructor and submarine service. He has an additional 25 years of experience in the power protection industry including positions as a Controls Engineer at KW Controls/Piller, Inc and as a Sr. Commissioning/Testing Engineer at EYP-MCF. Ethan has worked at both new construction and active mission critical sites and has been responsible for installation, maintenance, and testing of the full range of data center electrical and mechanical equipment. Ethan has also worked on electrical issues in the Fab and other technical manufacturing environments. He also trains Operating staff and completes reliability and forensic analysis of Mission Critical Facilities.
Dr. Christian Blug is the Department Head Power Quality & Earthing Studies (PQE) and a Senior Key Expert System integration of dispersed generation at Siemens AG. His areas of expertise are: Power Quality (Power Quality Analytics, Disturbance analysis), Data Analytics and Artificial intelligence, Protection coordination, Integration of dispersed generation into transmission and distribution grids, Optimization (Optimal voltage control strategies in distribution grids, Optimal power flow control in transmission grids) and Strategic planning of distribution networks. His career at Siemens spans twenty years and has shown growth in technical and managerial leadership in this specific field of focus. He graduated from the Institute of Electrical Engineering, Saarland University, Saarbruecken for both his undergraduate and doctoral studies.
Dr. Herbert Huber is the Head of the High Performance Systems Division at Leibniz Supercomputing Centre in Garching, Germany. He obtained a PhD in physics from the Ludwig-Maximilians-University of Munich (LMU) in 1998. He joined BAdW-LRZ in 1997 and currently leads the “High-Performance Systems” department of BAdW-LRZ. The focus of his work and his research interests are methods to improve the power and cooling efficiency of HPC systems and their operating infrastructure. Due to his commitment, BAdW-LRZ has been procuring and operating direct-hot-water cooled high-performance and ultra-high-performance computers since 2010 and is therefore one of the pioneers of warm direct liquid cooling, which enables year-round operation of the systems with compressor-less cooling (free cooling).
Grant Stewart is a registered professional civil engineer (BSCE & MSCE, 35 years of experience) and currently serves as Utilities Project Director for Los Alamos National Laboratory. He is a long-time friend of LANL’s Advanced Simulations & Computing Program, working at the interface between utility-scale power and growing HPC facilities. For LANL, he develops new sources of power supply and capital investment projects in utilities to serve the growing mission. He is also active in the EE HPC WG, having served on several teams; his present focus is developing owner’s requirements guidance for electrical systems development and commissioning HPC facilities.
Gert Svensson is the Deputy Director of the PDC Center for High Performance Computing at the KTH Royal Institute of Technology in Sweden. Gert has worked with high-performance computing at PDC since 1990. He has initiated and led various European research projects in different areas of HPC. Gert has also been involved with and directed numerous HPC procurements. He is also responsible for the infrastructure at PDC and is especially interested in energy efficiency and heat re-use for supercomputing systems. Since 2017 Gert has been an active member of the Energy Efficient HPC Working Group Team on Energy Efficient Procurement Considerations. Gert obtained an M.Sc. in Physics from KTH in 1981. After a period in the telecommunication industry, he returned to the Computer System laboratory at KTH. His early research interests were parallel and concurrent programming and open-source software.
Cate Berard is the Team Lead for Sustainability in the Department of Energy (DOE) Office of Sustainable Environmental Stewardship. Cate’s Team supports Departmental implementation of federal sustainability requirements, including: sustainable site operations; energy and water efficiency; sustainable acquisition; environmental management systems; and high performance and sustainable buildings. The Team focuses on providing technical assistance, training, and recognition to DOE sites. Cate directly supports Departmental policies and programs related to electronics stewardship and data center efficiency and optimization. Cate participates in a variety of IT-related standard development activities through IEEE, NSF and UL. She also co-chairs the inter-agency Federal Electronics Stewardship Working Group. Cate holds a B.S. from James Madison University and an M.S. from Johns Hopkins University.
John Elliott is Chief Sustainability Officer at Lawrence Berkeley National Laboratory. With broad and detailed experience across a wide range of sustainability topics, he is responsible for directing and implementing the Lab’s sustainability strategy. He was previously Director, Energy and Sustainability at UC Merced and has done prior work in energy efficiency program design, strategy consulting to utilities, advancing efficiency and renewables with Native American tribes, and leading a professional services team implementing energy software solutions. He holds a master’s degree in Energy and Resources from UC Berkeley and a bachelor’s in civil and environmental engineering from Stanford University.
Esa Heiskanen is a data center operations specialist at CSC - IT Center for Science. He first joined CSC in 2014 at the national supercomputing center located in Kajaani. For the past several years he has been working on the LUMI data center project and data center construction work. His main responsibilities have been DC electrical, cooling and waste reuse solutions as part of the data center facilities team.
Michael Ott received his PhD in computer science from Technische Universität München in 2010 for his work in high performance bioinformatics. Before he joined the Leibniz Supercomputing Center (LRZ) in 2012 he was a postdoc with the Commonwealth Scientific and Industrial Research Organisation in Canberra, Australia and an IT consultant in the automotive and the financial sector in Munich, Germany. He is now a senior researcher in the “High-Performance Systems” division of LRZ. Michael leads the Operational Data Analytics Team in the Energy Efficient HPC Working Group and the Energy Efficiency Working Group in the ETP4HPC. His research focuses on energy-efficiency and scalable monitoring, but he still keeps up an interest in bioinformatics, computer architecture, and parallel programming.
Daniele Cesarini (Ph.D.) is an HPC Specialist at the HPC department of CINECA, where his work focuses on the evaluation of next-generation HPC architectures to define the roadmap of CINECA’s computing infrastructures. He is a Steering Board member of ETP4HPC and he participates in several European research projects including the European Processor Initiative (EPI-SGA1) and the Resource Management for the Exascale Era (REGALE), where he leads the development roadmap of the energy-efficient HPC tools.
He graduated in Computer Engineering from the University of Bologna (Italy) in 2014, where he also earned his Ph.D. in Electronics, Telecommunications, and Information Technologies Engineering in 2019. His range of expertise includes the development of SW-HW co-design strategies to support energy-efficient HPC systems. He also leads the energy-efficient HPC activities of CINECA, where his work is focused on improving the power efficiency of CINECA's data centers.
David Smith is currently on the Data Center Facilities and Infrastructure Team at Sandia National Laboratories. He has worked at Sandia National Laboratories for 12 years after having spent 5 years as an intern for the HPC group.
Jim Rogers is the Computing and Facilities Director for the National Center for Computational Science at the Oak Ridge National Lab. Mr. Rogers has thirty years of experience in high-performance computing (HPC) and has provided strategic planning, technology insertion, and integration support for multiple computing centers, including the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (OLCF), the U.S. Air Force, NOAA's National Climate Computing Research Center (NCRC), U.S. Army Corps of Engineers Engineer Research and Development Center (ERDC), the Aeronautical Systems Center, the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, NASA Ames Research Center, the Defense Intelligence Agency, and the Alabama Supercomputer Center.
He has primary responsibility for the strategy, acquisition, delivery, integration, and transition to production for high performance computing, storage, networking, and analysis systems as well as the physical facilities that house these systems. He manages recurring operational activities for both the NCCS systems and the supporting facility/infrastructure. These activities extend across multiple Federal customers.
Thomas Ilsche (Ph.D.) received his doctorate in computer science from TU Dresden in 2020. He is working as a research scientist at the Center for Information Services and High Performance Computing at TU Dresden. His research interests include energy measurement, measurement data processing infrastructures, and performance analysis and optimization for High Performance Computing.
Yusuke Doi is a Corporate Officer and VP of Computing Infrastructure at Preferred Networks (PFN), overseeing the company's computing infrastructure including development of the MN-Core processor.
Woong Shin (Ph.D.) is an HPC systems engineer and a researcher in the Analytics & AI Methods at Scale (AAIMS) Group at Oak Ridge National Laboratory (ORNL). He is involved in R&D and engineering activities around designing and improving system software & system architectures for HPC systems. Currently he is a technical lead in developing and maintaining operational data analytics systems that provide near-real time and long term insights for the Oak Ridge Leadership Computing Facility. With this role, he has been actively participating in the operational data analytics team in the EE HPC WG.
Woong started his career as a software engineer in the enterprise sector, working for Samsung & TmaxSoft (South Korea) on developing monitoring systems and business intelligence systems. Later in his career, he pursued academic training in system software, distributed systems, and computer architecture, specializing in NVRAM-based storage systems. He joined ORNL in 2017. He received his Ph.D. degree in electrical engineering and computer science (M.S. and Ph.D. integrated course) in 2017 from Seoul National University, South Korea. He earned his B.S. in computer science in 2003 from Korea University, South Korea.
Siddhartha Jana (Ph.D.) is a research scientist at Intel Corporation and the conferences co-lead within the EE HPC WG (Energy Efficient HPC Working Group). He holds a doctorate from the University of Houston in energy efficiency and distributed memory programming models. At Intel, his research projects are driven towards leveraging hardware features to explore energy efficiency within the HPC software stack. His other research interests include programming models, High Performance Computing, compiler design and analyses, runtime systems, communication libraries, and distributed computing. As part of his research, he has collaborated with a number of organizations across academia, government, and industry including Total, Oak Ridge National Laboratory, Technische Universität Dresden, Intel, Los Alamos National Laboratory and Cray Inc. Wearing two hats, Intel and EE HPC WG, Sid is actively collaborating on HPC PowerStack, a community-wide effort to design a unified HPC system stack that will facilitate building system-wide power efficiency solutions for future large-scale machines.
Torsten Wilde (Ph.D.) is a system architect for Exascale monitoring and system power and energy management at Hewlett Packard Enterprise (HPE). His research activities are related to high volume, high frequency data collection and analytics for improved IT operations as well as dynamic power management. Torsten has published more than two dozen research papers mainly related to power and energy usage and improvement in High Performance Computing. Torsten is the lead architect for HPE's Exascale monitoring framework prototype developed as part of the ECP (Exascale Computing Project) funded PathForward project. Torsten received his MSc in parallel and scientific computation from the University of Liverpool, UK, and his MSc in Computer Engineering from the University of Applied Sciences in Berlin, Germany. He received his Ph.D. in computer science from the Technical University of Munich, Germany, in 2018.
Natalie Bates has led the Energy Efficient High Performance Computing Working Group (EE HPC WG) since its inception in 2010. The purpose of the WG is to drive implementation of energy efficient design in HPC. Today, there are ~800 members from 25+ countries. Natalie has been the technical and executive leader for this ‘open source’ working group that disseminates best practices, shares information (peer to peer exchange), and takes collective action. The EE HPC WG has collaborated and negotiated with industry standards committees and major HPC organizations as well as influenced HPC system development. Prior to leading the EE HPC WG, Natalie's career spanned twenty years with Intel Corporation where she was a senior manager of highly complex programs taking new products to market, delivering multi-component and multi-partner platforms, and negotiating strategic technical industry initiatives.