09:00 - 09:05 Welcome Message
09:05 - 10:00 Keynote Talk: Improving I/O Resource Usage in HPC
— Francieli Boito (Université de Bordeaux, France)
10:00 - 10:30 Full Paper Talk: Checkpointing Optimisation to Prepare Future Exascale Plasma Turbulence Simulations
— Méline Trochon, Julien Bigot, Virginie Grandgirard, Dorian Midou (Inria, France; CEA/MDLS, France; CEA/IRFM, France)
10:30 - 11:00 Coffee Break
11:00 - 11:30 Invited Talk I: The Data Deluge: Overcoming the Barriers to Extreme Scale
— Scott Klasky (Oak Ridge National Laboratory, USA)
11:30 - 12:00 Full Paper Talk: SCORPIO: A Parallel I/O library for Exascale Earth System Models
— Jayesh Krishna, Danqing Wu, Robert Jacob, Dmitry Ganyushin (Argonne National Laboratory, USA; Oak Ridge National Laboratory, USA)
12:00 - 12:30 Full Paper Talk: Streamlining HDF5’s AI Workloads Benchmarking
— Dlyaver Djebarov, Radita Liem, Sarah Neuwirth, Jean Luca Bez, Suren Byna (RWTH Aachen University, Germany; Johannes Gutenberg University Mainz, Germany; Lawrence Berkeley National Laboratory, USA; The Ohio State University, USA)
12:30 - 14:00 Lunch Break
14:00 - 14:30 Invited Talk II: Data Readiness for AI – A Framework for Evaluation and Improvement
— Suren Byna (The Ohio State University, USA)
14:30 - 15:00 Invited Talk III: Data at Scale: The Case of Scientific AI
— Jean-Thomas Acquaviva (DDN Storage, France)
15:00 - 15:25 Short Paper Talk: IOPS: I/O Performance Evaluation Suite
— Mahamat Abdraman, Francieli Boito, Luan Teylo (Inria, France)
15:25 - 15:50 Short Paper Talk: NAPEH: An Asynchronous and NUMA-Aware KV Store Based on Non-Volatile Memory Architectures
— Yili Ma, Shengquan Yin, Jing Xing, Haoquan Long, Zheng Wei, Guangming Tan, Dingwen Tao (ICT, Chinese Academy of Sciences, China)
15:50 - 16:00 Closing Remarks
16:00 - 16:30 Coffee Break / End of Workshop Day II
In HPC systems, I/O resources are not traditionally arbitrated in the same way as compute resources (nodes, CPUs, etc.). However, I/O resources have an important impact on performance, and this impact depends on application characteristics. Moreover, these resources are potentially shared by jobs, which may then suffer from interference with one another. In this talk, I will discuss recent work from the TADaaM team at the Inria Center at the University of Bordeaux on improving HPC I/O resource usage through better allocation and scheduling algorithms, and on strategies for obtaining the information these techniques require.
Speaker Bio:
Francieli Zanon Boito is an Associate Professor at the Université de Bordeaux, France, where she has been a faculty member since 2019. She teaches in the Informatics department at the Collège Sciences et Technologies and conducts research at LaBRI and Inria Bordeaux within the SATANAS department, TADaaM team. She earned her Ph.D. in Computer Science in 2015 from both Université Grenoble Alpes (France) and Universidade Federal do Rio Grande do Sul (Brazil). Her research focuses on high-performance computing, with an emphasis on parallel I/O, file systems, and data access and storage.
The rapid growth in technology and AI/ML is providing unprecedented opportunities for scientific inquiry. However, dealing with the data produced has resulted in a crisis: computer speeds are increasing much faster than storage capacities and I/O rates. This gap is even wider for experimental and observational facilities, where, for example, the Square Kilometre Array will generate over 2 PB per night in 2028. This reality makes it critical for our community to: 1) create efficient mechanisms to move, store, and process data in a Findable, Accessible, Interoperable, and Reusable (FAIR) fashion; 2) create efficient abstractions so that scientists can perform both online and offline analysis efficiently; 3) create new reduction algorithms that can provide quantifiable error bounds on both the primary data and Quantities of Interest.
To tackle these goals, I have worked closely with many large-scale applications and researchers to co-design critical software infrastructure for these communities. These research artifacts have been fully integrated into many of the largest simulations and experiments, and have increased the performance of these codes by over 10X. This impact was recognized with an R&D 100 award in 2013 and was highlighted in the 2020 US Department of Energy (DOE) Advanced Scientific Computing Research (ASCR) @40 report. In this presentation I will discuss the research details of four major contributions I have led: large-scale self-describing parallel I/O (ADIOS), in situ/streaming data (SST), data refactoring (MGARD), and, most recently, campaign scientific data management (CSDM). I will introduce the overall concepts and present several results from our research, which has been applied and fully integrated into many of the world's largest scientific applications.
Speaker Bio:
Dr. Scott A. Klasky is a distinguished scientist and the group leader for Workflow Systems in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. He holds joint appointments at the University of Tennessee and the Georgia Institute of Technology. He obtained his Ph.D. in Physics from the University of Texas at Austin (1994) and became a senior research scientist in computer science at Syracuse University in 1998. In 1999, Dr. Klasky moved to the Princeton Plasma Physics Laboratory to lead scientific data management and visualization. Dr. Klasky is a world expert in scientific computing, scientific data reduction, and scientific data management; he has co-authored over 350 papers and has an h-index of 53.
Datasets are the critical fuel for AI. Poorly collected or constructed data produces inaccurate and ineffective AI models, which may lead to incorrect or unsafe use. Checking data readiness and preparing the data are crucial steps in developing trustworthy AI models. In this talk, I will present various dimensions of defining the readiness of data for AI usage and describe our recent efforts to improve that readiness.
Speaker Bio:
Suren Byna is a Professor in the Department of Computer Science and Engineering (CSE) at The Ohio State University (OSU). He was a Senior Computer Scientist at Lawrence Berkeley National Laboratory (LBNL), where he is now a Visiting Faculty Scientist. He leads the Innovative Data Technologies Lab at OSU. His research interests span many topics in scientific data management and analysis, including parallel I/O, file and data management systems, I/O libraries, file formats, and metadata management. He also leads projects in the areas of data quality, cybersecurity and trustworthiness of data, and AI readiness of data.
Traditional HPC centers are formidable infrastructures; they used to be in a league of their own due to their massive scale and computing power. However, the landscape is shifting with the emergence of AI-specialized data centers, which are increasingly challenging the dominance of traditional HPC centers. These AI-specialized data centers introduce a new paradigm in computing, focusing on the unique demands of artificial intelligence and machine learning workloads.
In this talk, we will discuss the differences and areas of convergence between these two realms from the perspective of the storage community, including data workloads, data management, data mediums, and platforms.
Speaker Bio:
Jean-Thomas successively worked for Intel, the University of Versailles, and the French Atomic Energy Commission (CEA). He participated in the creation of their joint laboratory on Exascale research. At DDN, Jean-Thomas' role includes overseeing research collaborations in Europe and product management for some of DDN's advanced solutions.