Workshop Program

AGENDA

First day 25.06. (7:30AM – 10:00AM Pacific Time)

Introduction, Sid Jana (Recording)

Conversation on HPC, AI and sustainability: Dan Reed, Dave Patterson and Nic Dubé (Recording)

Water management for system and facility cooling: Otto Van Geet, Chris DePrater, Michael Kercher and Torsten Wilde (Recording)

NCAR No-Blowdown Condensing Water System, Michael Kercher (Presentation)
PGW25 Cooling Fluid Figure of Merit, Torsten Wilde (Presentation)

AI and HPC operations and facility infrastructure: Al Geist, Jason Hick, Ryousei Takano and Sadaf Alam (Recording)

AI and HPC operations and facility infrastructure, Sadaf Alam (Presentation)
Sustainable AI Computing Infrastructure at Scale: Insights from ABCI, Ryousei Takano (Presentation)
AI & HPC Facility Trends, Jason Hick (Presentation)

Second day 26.06. (8:00AM – 10:00AM Pacific Time)

Software driven power and energy management: Siddhartha Jana, Gülçin Gedik, Michael Ott, Ralf Schneider, Solomon Bekele and Kazunori Mikami (Recording)

Managing power consumption and performance at the user level: Some study for executing high performance and energy efficient jobs on supercomputer Fugaku, Kazunori Mikami

Alternative power sources for data centers: Shaohui Liu, Sean Jones and Matthew Anderson (Recording)

Battery Storage Applications at Data Centers, Sean Jones (Presentation)

Closure and actions (Recording)

Encouraging Energy Efficiency and Sustainability in HPC, Natalie Bates

Dan Reed is Presidential Professor Emeritus and formerly served as Provost at the University of Utah where he was Presidential Professor of Computational Science and Professor of Computer Science and Electrical & Computer Engineering. Previously, Reed helped shape Microsoft's long-term vision for technology innovations in cloud computing and the company's policy engagement with governments and institutions worldwide as the company’s Corporate Vice President for Technology Policy and Extreme Computing. Reed has served on the U.S. President’s Council of Advisors on Science and Technology, the President’s Information Technology Advisory Committee, the National Academies of Science Board on Global Science and Technology, the International Telecommunications Union CTO Council, and the ICANN Generic Names Supporting Organization Council. He has chaired the Department of Energy’s Advanced Scientific Computing Advisory Committee and the National Science Board, which oversees NSF. Reed is a Fellow of ACM, IEEE, and AAAS.

David Patterson has been a Google distinguished engineer since 2016 and is a UC Berkeley Pardee professor emeritus, the RIOS Laboratory Director, and the RISC-V International Vice-Chair. His most influential Berkeley projects were likely RISC and RAID. He received service awards for his roles as ACM President, Berkeley CS Division Chair, and CRA Chair, as well as awards for his teaching. The most prominent of his seven co-authored books is Computer Architecture: A Quantitative Approach. He and his co-author John Hennessy shared the 2017 ACM A.M. Turing Award, the 2021 BBVA Foundation Frontiers of Knowledge Award, and the 2022 NAE Charles Stark Draper Prize for Engineering. The Turing Award is often referred to as the “Nobel Prize of Computing” and the Draper Prize is considered a “Nobel Prize of Engineering.” David received BA, MS, and PhD degrees from UCLA.

Nic Dubé is Senior Vice-President, System Engineering, Arm. Nic leads a newly formed engineering group driving future system architecture for datacenter deployments at scale. The rapidly growing team covers system hardware, system software, system interconnect, system management, system storage, datacenter design, and performance engineering. He previously served as Senior Vice-President for HPC & AI Cloud Services at Hewlett Packard Enterprise (HPE). In this role, he led the organization responsible for delivering strong scaling applications and complex workflows, on demand. As Senior Fellow and Chief Technologist for HPC at HPE, Nic notably led the team that delivered ORNL's Frontier, which was the first system in history to break the exascale barrier. He also previously served as the technical lead for the Advanced Development Group, architected HPE’s exascale PathForward program and spearheaded Arm enablement in HPC. He received his Ph.D. degree from Laval University, Canada.

Otto Van Geet is a Principal Engineer at NREL. Van Geet has been involved in the design, construction, and operation of energy-efficient research facilities such as laboratories and data centers, office and general use facilities, and low-energy-use campus and community design. Van Geet was one of the founding members of the Labs21 (Smart Labs) program and provides technical guidance for the program. His experience also includes renewables screening and assessment, PV system design for on- and off-grid applications, energy audits, and minimizing energy use. B.S., Mechanical Engineering, University of New Mexico, A.A.S., State University of New York at Canton, Registered Professional Engineer (PE), Certified Data Center Energy Practitioner (DCEP), Certified Energy Manager (CEM), LEED Accredited Professional (LEED AP), Project Management Professional (PMP)

Chris DePrater is a system engineer at Lawrence Livermore National Laboratory (LLNL) with over a decade of experience in High Performance Computing (HPC). He specializes in building controls and mechanical systems and is an active member of the Energy Efficient HPC Working Group (EEHPCWG). Chris has a certification in Industrial Maintenance Air Conditioning and a bachelor’s degree from DeVry University in Electronic Engineering. He has been involved in the planning and siting of multiple TOP500 HPC systems, including Sequoia and Sierra, and is currently working towards Exascale at LLNL. Chris values teamwork and collaboration with peers in and out of the workplace. Outside of work Chris is a devoted father of 7 and enjoys the outdoors.

Michael Kercher is Operations Section Facility Manager for the National Center for Atmospheric Research's Wyoming Supercomputing Center. He has a solid IT background and 18+ years of experience in a wide range of electrical disciplines. He has been working at NCAR since 2011, starting as a Lead Electrician, then Plant Manager, and moving to Assistant Operations Manager before assuming his current responsibilities. Michael has a Master Electrician License in the State of Wyoming, and also an MBA with a focus on Data Analytics.

Dr. Torsten Wilde is a Distinguished Technologist and HPC system software architect at Hewlett Packard Enterprise (HPE). He is working on monitoring and management for Exascale systems and is leading HPE’s efforts on dynamic power and energy management for HPC systems. Before joining HPE in 2018, Torsten worked as a senior Research Scientist at the Leibniz Supercomputing Centre (LRZ, Munich, Germany) and as a business workflow system analyst at Internap Holding (INAP, an internet service provider, Norcross, GA, USA). His career began in 2000 as a Computer Scientist in the Computer Science Research Group at Oak Ridge National Laboratory (ORNL, Oak Ridge, TN, USA). Torsten received an MSc in parallel and scientific computation from the University of Liverpool, UK, and an MSc in Computer Engineering from the University of Applied Sciences in Berlin, Germany. He received his doctorate in computer science from the Technical University of Munich, Germany, in 2018.

Al Geist is a Corporate Research Fellow at the Department of Energy (DOE) Oak Ridge National Laboratory. Al is in the Leadership Computing Facility and is the Project Director for Frontier, the world’s first Exascale computer. He is presently leading the “Computer Design and Build” activities for the system coming after Frontier, called Discovery. Al was the Chief Technology Officer and a part of the Senior Leadership Team for the DOE Exascale Computing Project. He was also the Chief Technology Officer of the Oak Ridge Leadership Computing Facility for 20 years until 2023 and was the author of the Project Execution Plan and Acquisition Plan documents for the Jaguar, Titan, Summit, and Frontier system acquisitions. In his 40 years at ORNL, he has published two books and over 200 papers in areas including heterogeneous distributed computing, numerical linear algebra, parallel computing, collaboration technologies, solar energy, materials science, biology, and solid-state physics. For 20 years Al led the 25-member Computer Science Research Group. He is one of the original developers of PVM (Parallel Virtual Machine), which became a worldwide de facto standard for heterogeneous distributed computing, and was actively involved in the design of the Message Passing Interface (MPI) standard.

Ryousei Takano is a principal research manager of the Institute of Advanced Industrial Science and Technology (AIST), Japan. Ryousei was instrumental in deploying and managing the AI Bridging Cloud Infrastructure supercomputer which merged accelerated computing platforms with cloud infrastructure software to forge a shared utility for researchers in Japan to do AI experiments. He received his Ph.D. from the Tokyo University of Agriculture and Technology in 2008. He joined AXE, Inc. in 2003 and then, in 2008, moved to AIST. His research interests include operating systems and distributed parallel computing.

Dr. Sadaf Alam is chief technology officer (CTO) for Bristol Centre for Supercomputing (BriCS), home to Isambard 3 and Isambard-AI, part of the national AI Research Resource (AIRR). She is also director of strategy and academia in the Advanced Computing Research Centre (ACRC). Across both roles, she is responsible for digital transformation of research computing and data services. Prior to joining Bristol, Alam was the CTO at CSCS, the Swiss National Supercomputing Centre. She was chief architect for two generations of the Piz Daint innovative flagship supercomputing facilities and the MeteoSwiss operational weather forecasting platforms. From 2004 to 2009, Alam was a computer scientist at Oak Ridge National Laboratory (ORNL) and a staff scientist at the ORNL Leadership Computing Facility (OLCF). She studied computer science at the University of Edinburgh, UK, where she received her PhD.

Jason Hick currently serves as project director for Los Alamos National Lab’s Future Supercomputing Infrastructure effort, focusing on creating LANL’s next generation facilities for HPC and AI. He has worked in HPC for the U.S. Department of Energy in various technical, line, and program manager positions for the past 25 years.

Kazunori Mikami is Senior Technical Staff in the Operations and Computer Technologies Division at R-CCS, RIKEN. He has worked with various types of supercomputers, mostly from the application side. Since 2022, he has driven the user support ticket system and integrated it with generative AI in R-CCS. From 2013 to 2021, he was involved with the Flagship 2020 project, a co-design effort for the Fugaku system. From 1985 to 2013, he was affiliated with Cray Research in Japan and was the Vice President and director of Applications, where his focus was on application performance improvement for vector supercomputers and MPPs.

Michael Ott is a senior research engineer in the Future Computing group at Leibniz Supercomputing Centre (LRZ). He leads LRZ's research activities on energy efficiency and is the technical lead of the Operational Data Analytics team in the Energy Efficient HPC Working Group (EE HPC WG). Before Michael joined LRZ in 2012, he was a postdoc with the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Canberra, Australia. Michael received his PhD in computer science from the Technical University of Munich (TUM) in 2010 for his work in high-performance bioinformatics. His research focuses on energy-efficient HPC operations, scalable monitoring, and operational data analytics, but he still keeps up an interest in bioinformatics, computer architecture, and parallel programming.

Dr.-Ing. Ralf Schneider joined the High Performance Computing Center Stuttgart (HLRS) after having received his Diploma in Aerospace Engineering in 2007. He received his PhD from the University of Stuttgart in 2016. Since 2019 he has been responsible for requirement definition and planning of the new HLRS data centre, which is currently under construction, and has also been part of the HPC system procurement team.

Solomon Bekele is a Postdoctoral Appointee at the Argonne Leadership Computing Facility, where he specializes in performance–energy trade-offs in heterogeneous high-performance computing (HPC) systems. His research focuses on developing tracing frameworks for heterogeneous architectures and designing strategies to improve energy efficiency at exascale. He holds a Ph.D. in Computer Engineering from IIT Delhi, where he investigated resource contention-aware performance–energy trade-offs in chip multiprocessors.

Gabriel Hautreux is in charge of the HPC and AI department at CINES, one of the three French national centers, and managed the deployment of the leading-edge cluster Adastra, #11 on the TOP500 and #3 on the GREEN500 in November 2022. He previously worked as an HPC application expert, in charge of the French High Level Support Team (HLST), and enabled dozens of scientific teams to prepare their applications for upcoming systems in GENCI’s “Technological Watch Group” between 2016 and 2019. His current activities aim to enable the research community to leverage applications using exascale technologies, as well as to increase the energy efficiency of applications and reduce the global energy and carbon footprint of HPC centers. He is a specialist in exascale architectures, new development paradigms and energy efficiency in HPC and AI systems.

Siddhartha Jana (Sid) is a research scientist at Intel Corporation and Conferences co-lead within the EE HPC WG. He holds a doctorate from the University of Houston in energy efficiency and distributed memory programming models. At Intel, his research projects focus on leveraging hardware features to explore energy efficiency within the HPC software stack. His other research interests include programming models, High Performance Computing, compiler design and analyses, runtime systems, communication libraries, and distributed computing. As part of his research, he has collaborated with a number of organizations across academia, government, and industry. Sid is actively driving the HPC PowerStack initiative, a community-wide effort with a charter to design and standardize solutions for system-wide power efficiency targeting large-scale machines.

Sean Jones is Staff Business Development Manager of Megapack, Tesla. He leads Tesla’s business development efforts for Megapack at data centers, where he focuses on building partnerships with data center operators, developers, regulators, and utilities. His work spans both Tesla’s current product offerings and the development of new solutions tailored to the evolving needs of the industry.

Matthew Anderson is the Manager of the High Performance Computing Group at the Idaho National Laboratory, where his research includes Numerical Relativity, Relativistic Magneto Hydro Dynamics, and High Performance Computing. He has worked in computational science and high performance computing for over 10 years and has written over 30 publications. He received his Ph.D. in Physics from the University of Texas at Austin.

Shaohui Liu is a Postdoc Associate at the MIT Energy Initiative and ChemE. His research at MIT includes modeling and optimization of demand response in decarbonized utility-scale energy systems. He received his Ph.D. in ECE from the University of Texas at Austin. Shaohui was a summer intern with Los Alamos National Lab, Argonne National Lab, and Amazon AWS. He is an IEEE member, serves as the secretary of IEEE Working Group on Cloud for Grid Modernization and Digital Transformation, and is a member of the Energy Efficient High Performance Computing (EEHPC) working group.
