Links to presentations and recordings of the sessions are listed in the AGENDA below.
AGENDA (All times are Pacific Time)
First day 25.05. (7:00AM – 9:35AM Pacific Time).
Introduction (7:00-7:05) Siddhartha Jana, Intel
Session 1 – Motivating end users for energy efficiency (7:05-8:00) Matthias Maiterth, ORNL
Session 2 – Green500 measurements (8:00-8:45) Thomas Ilsche, TU Dresden
Presentations for Session 2 - Thomas Ilsche and Toshio Endo
Session 3 – Sustainability metrics for impact (8:45-9:30) Rob Bunger, Schneider Electric
Closure (9:30-9:35) Natalie Bates, EE HPC WG
Second day 26.06. (7:00AM - 9:45AM Pacific Time)
Introduction (7:00-7:05) Siddhartha Jana, Intel
Invited Talk 1 (7:05–7:50) Andrew Chien, University of Chicago
Presentation for Andrew Chien Invited Talk
Recording for Andrew Chien Invited Talk
Break: 7:50-8:00
Session 4 – Advanced facility cooling and controls (8:00-8:45) Vali Sorell, Oracle
Presentation for Session 4
Invited Talk 2 (8:45-9:30) - Johannes Kirnberger, OECD
Presentation for Johannes Kirnberger Invited Talk
Closure and actions (9:30-9:45) Natalie Bates, EE HPC WG
====================================================
Motivating end users for energy efficiency
Participants: Matthias Maiterth from ORNL, Fumiyoshi Shoji from RIKEN, Luca Bortot from Eni, Osman Seckin Simsek from University of Basel, Daniel Arndt from ORNL
Format: Panel with discussion on what is needed to engage end-users and optimize for energy efficiency.
Questions:
What do you need from the Systems, Scheduler/System Software, Programming Environments / Runtime Systems and what do you need from the applications developers?
How were you successful in engaging users for more sustainable operation, and how has this carried over into today's day-to-day operation?
What engagement with your users was needed from the scheduler side to improve cluster utilization? What lessons were learned, and what was the impact?
What strategies are viable to shift from application performance optimization to energy optimization?
How can one move from a case study to enabling all users with the Kokkos Tools interface?
Green500 measurements
Participants: Thomas Ilsche from TU Dresden, Toshio Endo from Tokyo Institute of Technology
Format: Presentations with Q&A
Presentations:
Introduction and overview of the methodology, current challenges and future directions.
Experiences with making a power measurement and submission for TSUBAME4.0, Level 3
Sustainability metrics for impact
Participants: Rob Bunger from Schneider Electric, Jason Hick from LANL, Torsten Wilde from HPE, and Eric Yang from Vantage
Format: Panel discussion on design, procurement and operational data center and HPC system metrics for sustainability.
Questions:
During design, are there specific efficiency or sustainability metrics you target for both the compute system and the supporting facility? Looking to the future, are there metrics you hope to use?
During procurement, what requirements do you put on vendors in regards to sustainability?
Is embodied carbon something you measure or hope to measure?
Are there any IT performance / efficiency metrics you use or want to use?
What do you currently measure during operations? How important is this to your organization?
Is there anything the HPC community can learn from larger data center operators in regards to measuring and reporting on sustainability?
For HPC, how concerned about sustainability are the internal customers or users who request jobs? Do they require any information on the impact of their jobs? Should they?
How to Lead in HPC in the next decade (Cost and Sustainability) – Hint: It's not about Energy Efficiency
INVITED SPEAKER Andrew Chien from University of Chicago (Moderator Michael Ott from LRZ)
ABSTRACT: With the end of Dennard scaling and slowing Moore's law, next generation leading edge computing systems will exceed 100MW. To make them viable from a cost, power access, and sustainability point of view, new approaches are needed. We will highlight a range of environmental damage issues (climate, water use), and new strategies to make leadership both cost-effective and sustainable. We will highlight approaches in the context of a collaboration with the Fugaku-next project, though they apply to nearly all HPC settings.
Advanced facility cooling and controls
Participants: Vali Sorell from Oracle
Format: Moderator led problem statement and audience participation
Questions:
When designing and sizing your buffer:
Is it pipes alone, or do you have a tank?
Where is the tank located?
When designing and sizing the TCS loop:
What is the practical limit of the TCS loop?
How do you get an even temperature distribution across all the racks?
AI & Sustainability Policy
INVITED SPEAKER Johannes Kirnberger from OECD (Moderator Siddhartha Jana from Intel)
ABSTRACT: Governments and policy makers are increasingly aware of issues related to AI and its intersection with climate change and environmental sustainability. For instance, the EU AI Act calls for the creation of Key Performance Indicators to track the energy consumption of AI systems and promote the use of more efficient AI technologies, as well as measure the impact of AI systems on the Sustainable Development Goals (SDGs). The presentation will give an overview of existing and emerging AI policy initiatives and standardisation efforts that address the environmental impact of AI systems.
Siddhartha Jana (Sid) is a research scientist at Intel Corporation and Conferences co-lead within the EE HPC WG. He holds a doctorate from the University of Houston in energy efficiency and distributed memory programming models. At Intel, his research projects are driven towards leveraging hardware features to explore energy efficiency within the HPC software stack. His other research interests include programming models, High Performance Computing, compiler design and analyses, runtime systems, communication libraries, and distributed computing. As part of his research, he has collaborated with a number of organizations across academia, government, and industry. Sid is actively driving the HPC PowerStack initiative, a community-wide effort with a charter to design and standardize solutions for system-wide power efficiency targeting large-scale machines.
Matthias Maiterth is a postdoctoral research associate at Oak Ridge National Laboratory (ORNL). His research focuses on High Performance Computing and Artificial Intelligence to improve performance, energy and power efficiency of large scale HPC systems, as part of the Analytics & AI Methods at Scale Group of Feiyi Wang. Before joining ORNL, he was academic research staff at Technische Universität München (TUM). He acquired his doctoral degree in computer science from Ludwig-Maximilians-Universität München (LMU). His doctoral studies included exchanges as visiting researcher to Lawrence Livermore National Laboratory (LLNL), as well as a three-year funded research position at Intel.
Fumiyoshi Shoji received his Ph.D. from Kanazawa University, Japan, in 2000. In 2006 he joined the RIKEN K computer development project as R&D scientist and later was also involved in the Fugaku development project. In his role as team lead, he was awarded the ACM Gordon Bell Prize in 2011. Since 2012 he has been Division Director of the operations and computer technologies division, RIKEN Center for Computational Science (R-CCS). His division’s responsibilities are the operation and enhancement of Fugaku and its facilities, including substations, chillers, gas turbine power generators, and air handlers, among others.
Luca Bortot is a computer scientist, graduate of the University of Milan, working at ENI. His past experience as software engineer includes image processing, information management, and large scale network and host monitoring. In 2009 he was part of the team that designed and built ENI’s Green Data Center and in 2013 joined ENI as Data Center Architect. Since 2017, Luca Bortot has been IT Architecture Project Leader for ENI’s HPC department. His job role is now titled HPC Knowledge Owner.
Osman Seckin Simsek is a post-doctoral researcher in the High Performance Computing (HPC) group at the University of Basel. He is part of both PASC SPH-EXA2 and SKACH projects, responsible for load-balancing and energy efficiency measures for the SPH-EXA simulation framework. Before, he worked as a post-doctoral researcher at the University of Manchester in the EuroExa project, porting the LFRic weather forecast application onto FPGAs. Dr. Simsek received his Ph.D. in 2019 from the University of Manchester for his thesis titled “Leveraging Data-Flow Task Parallelism for Locality-Aware Dynamic Scheduling on Heterogeneous Platforms”. His research interests include high-performance computing, scheduling, energy efficiency, and heterogeneous programming.
Daniel Arndt is a computational scientist at the Oak Ridge National Laboratory in the Scalable Algorithms and Coupled Physics group. Formerly, he was a PostDoc at the University of Heidelberg working with Prof. Dr. Guido Kanschat in the Mathematical Methods of Simulation group. Before that, he was a member of the workgroup Numerical Methods for Partial Differential Equations supervised by Prof. Dr. Gert Lube in the Institute for Numerical and Applied Mathematics at the University of Göttingen. His current work focusses on backend implementations of Kokkos, in particular for SYCL, and on large scale computational science.
Thomas Ilsche received his doctorate in computer science from TU Dresden in 2020. He is working as a research scientist at the Center for Information Services and High Performance Computing at TU Dresden. His research interests include energy measurement, measurement data processing infrastructures, and performance analysis and optimization for High Performance Computing.
Toshio Endo is a professor at the Global Scientific Information and Computing Center (GSIC) of Tokyo Institute of Technology. His research interests include high performance and low power computing, including large scale machine learning and scientific computations. He acted as the technical leader of the TSUBAME4.0 supercomputer installed in 2024 at Tokyo Institute of Technology. He has a Ph.D. in science from the University of Tokyo (2001). He won the ACM Gordon Bell Prize in 2011, and the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2012.
Robert Bunger is a Program Director within the CTO Office at Schneider Electric currently working on their liquid cooling initiative, and is based in the US. In his 21 years at Schneider Electric, Robert has held management positions in data center standards, customer service, technical sales, offer management, and business development. He has also lived and worked in Denmark and China, where he supported data center solutions business growth in those regions. Prior to joining Schneider Electric, Robert was a commissioned officer in the U.S. Navy and served eight years in the submarine force. Robert holds a Bachelor of Science degree in Computer Science from the US Naval Academy and a Master of Science in Electrical Engineering from Rensselaer Polytechnic Institute (RPI).
Jason Hick is the Facilities, Operations and User Support (FOUS) Program Manager for the Advanced Simulation & Computing (ASC) Program at Los Alamos National Lab (LANL). He has responsibility for facilitating production computing and infrastructure necessary to support the HPC facility, around-the-clock supercomputing operations, and ASC Program users. Previously, he was Storage Systems Group Lead at the National Energy Research Scientific Computing (NERSC) Center at Lawrence Berkeley National Lab (LBNL), Leader of the Data Storage Team at LANL, and an Officer in the Field Artillery of the U.S. Army.
Eric Yang is currently a Principal Mechanical Engineer at Vantage Data Centers. He was previously a Senior Field Mechanical Engineer at Amazon Web Services, improving existing data center energy performance and operation. He is an active practitioner in energy efficiency both in design and operation. He is actively involved in TC 9.9 and TC 7.6 (Building Energy Performance). He has led the Energy Management sub-committee of TC 7.6 since 2021. Before joining the data center industry, he spent over 13 years developing $500 million projects via Energy Savings Performance Contracting (ESPC), focusing on existing building operations. He also worked for three years at SmithGroup, a large architectural engineering company, in Washington DC.
Torsten Wilde is a system architect for HPC system monitoring and system power and energy management at Hewlett Packard Enterprise (HPE). His research activities are related to high volume, high frequency data collection and analytics for improved IT operations as well as dynamic power management. Torsten has published more than two dozen research papers mainly related to power and energy usage and improvement in High Performance Computing. Torsten received his MSc in parallel and scientific computation from the University of Liverpool, UK, and his MSc in Computer Engineering from the University of Applied Sciences in Berlin, Germany. He received his Ph.D. in computer science from the Technical University of Munich, Germany, in 2018.
Andrew A. Chien is the William Eckhardt Distinguished Service Professor in Computer Science, Director of the CERES Center for Unstoppable Computing, as well as Senior Computer Scientist at the Argonne National Laboratory. He currently serves on the Advisory Board for the National Science Foundation’s Computing and Information Science and Engineering Directorate and as a member of the Defense Advanced Research Projects Agency Information Science and Technology Study Group. From 2017-2022, Andrew Chien served as Editor-in-Chief of the ACM’s flagship publication, the Communications of the ACM, dramatically expanding its international presence and impact. In 2015, Dr. Chien founded the CERES Center, a multi-disciplinary research center involving 15 faculty that seeks to create new foundations for computing systems. From 2011-2016, he led the initiative to build a Systems group in Computer Science that hired ten faculty and transformed the culture, perception, and research breadth of the department.
Vali Sorell is a Senior Data Center Design Engineer for Oracle. Before joining Oracle, Vali worked for Microsoft as a Principal Hardware Engineer for Datacenter Integration Services. Vali has years of design experience dedicated to mission critical projects. At Glumac, Vali was responsible for leading the mechanical design team in developing innovative, energy efficient, reliable, next generation data center concepts. Through industry groups, such as ASHRAE TC 9.9, ASHRAE SSPC 90.4, and 7x24 Exchange, and through various industry publications, Vali continues to push for the continuous development of industry best practices. Vali has a BS in Mechanical Engineering from Columbia University and another BS in Genetics from McGill University.
Johannes Leon Kirnberger is a policy advisor for AI and sustainability in the AI Unit of the OECD Division of Science, Technology and Innovation (STI). He previously led the program on climate action and biodiversity preservation at the Global Partnership on AI (GPAI) and the International Centre of Expertise in Montreal on AI (CEIMIA). Johannes is a member of the UNEP Expert Group on Digital Tech for Circular Economy, where he co-develops a digital transformation roadmap for catalysing digital technologies to accelerate a circular economy. As a guest lecturer at the Technical University of Munich (TUM), he teaches students on climate change and AI policy. He holds a Bachelor of Science in Management from ESCP Business School, a Master of International Public Management from Sciences Po, and a Master in International Affairs, Energy and Environment from Columbia University.