Invited Speakers:
Profiling and Modeling for Application and System Analysis
Heidi Poxon, AWS, US
HPC applications are diverse in the work they perform because they are used to solve a variety of problems. At a minimum, however, they do some form of computation and data movement that relies on algorithms within the code and on the underlying system architecture. As we create more complex workflows, such as integrating machine learning with modeling and simulation, understanding how an application behaves on a targeted system remains key. Knowing application behavior shows us where to apply adjustments to reduce time-to-solution, how to maximize performance while maintaining portability, and what future technology may be beneficial for a code. Profiling and modeling are two mechanisms to assess application behavior. Profiling is used to uncover bottlenecks, such as load imbalance or the most time-consuming sections of a code, that inhibit performance or scaling. Modeling is used to mimic the performance of application compute and data movement patterns or of the underlying system architecture, and to predict how a code will react to different technology. Both help answer why an application is performing in a particular way. This presentation discusses profiling and modeling approaches, what has worked well and what hasn’t, and how these approaches transfer when analyzing application performance in different environments such as local data centers versus cloud.
Report on Dagstuhl Seminar "Driving HPC Operations With Holistic Monitoring and Operational Data Analytics"
Florina Ciorba, Univ of Basel, CH
In this talk we will present a community vision for revolutionizing HPC operations and research through autonomous monitor-analyze-feedback-response loops to enable true CoDesign. This vision is based on the concept of autonomous control loops that can be implemented in HPC systems, inspired by their use in autonomous computing and self-adaptive systems. The aim of this community is to develop proof-of-concept solutions for representative use cases that can implement autonomous loops. The use cases aim to provide practical insights into the feasibility and effectiveness of the methods.
Workshop Schedule
All times in US Mountain Daylight Time.
9:15am - 9:30am: Welcome and Introduction (Ann Gentile)
9:30am - 10:20am: Keynote: Profiling and Modeling for Application and System Analysis (Heidi Poxon)
10:25am - 10:40am: Lightning Talk: VAST DB integration with Large Scale Monitoring Data - Finding a Needle in a pile of Needles (Kyle Lamb)
10:45am - 11:15am: Break
11:15am - 11:45am: Paper Talk: Incorporating Staggered Planned Maintenance Reservations to Improve Performance in Computational Clusters (William Jones, Craig Walker, Vivian Hafener, Warren Graham, Nathan DeBardeleben and Steven Senator)
11:45am - 12:00pm: Lightning Talk: A Study of HPC Job Wait Times through Reason-Based Analysis (Thomas Jakobsche)
12:00pm - 12:15pm: Lightning Talk: A Method for Interpreting Workload Power Efficiency (David DeBonis)
12:15pm - 12:30pm: Lightning Talk: A Fully Managed, Containerized Data Analytics Cluster (Cory Lueninghoener)
12:45pm - 2:00pm: Lunch
2:00pm - 2:45pm: Keynote: Report on Dagstuhl Seminar "Driving HPC Operations With Holistic Monitoring and Operational Data Analytics" (Florina Ciorba)
2:45pm - 3:00pm: Lightning Talk: A User Perspective of Always-On HPC Monitoring (Scot Swan)
3:00pm - 3:30pm: Paper Talk: Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations (Francieli Boito, Jim Brandt, Valeria Cardellini, Philip Carns, Florina M. Ciorba, Hilary Egan, Ahmed Eleliemy, Ann Gentile, Thomas Gruber, Jeff Hanson, Utz-Uwe Haus, Kevin Huck, Thomas Ilsche, Thomas Jakobsche, Terry Jones, Sven Karlsson, Abdullah Mueen, Michael Ott, Tapasya Patki, Krishnan Raghavan, Stephen Simms, Kathleen Shoga, Michael Showerman, Devesh Tiwari, Torsten Wilde, Ivy Peng and Keiji Yamamoto)
3:30pm - 4:00pm: Break
4:00pm - 5:30pm: Topical Discussion: “System and Application-driven Feedback Loops to Drive HPC Efficiency” (Jim Brandt)