The workshop will be held in Kobe, Japan on Tuesday, September 24, 2024 in Room 403.
Note: All presenters and attendees must register at the IEEE Cluster website.
Agenda: (all times are in Japan Standard Time / JST)
10:45 - 11:00 – Welcome Message & Speed Introduction
11:00 - 12:00 – Keynote: Recent Trends in Ad-hoc HPC File Systems and Caching File Systems [Slides]
Speaker: Osamu Tatebe, University of Tsukuba (Japan)
12:00 - 13:15 – Lunch Break
13:15 - 13:45 – Expert Talk I: Data in HPC: Data Optimization, Compression and Analysis, Kento Sato, RIKEN (Japan). [Slides]
13:45 - 14:45 – Full Research Papers (30 minutes per talk including Q&A)
Paper Talk I: Enabling High-Throughput Parallel I/O in Particle-in-Cell Monte Carlo Simulations with openPMD and Darshan I/O Monitoring, Jeremy Williams, Daniel Medeiros, Stefan Costea, David Tskhakaya, Franz Poeschel, René Widera, Axel Huebl, Scott Klasky, Norbert Podhorszki, Leon Kos, Ales Podolnik, Jakub Hromadka, Tapish Narwal, Klaus Steiniger, Michael Bussmann, Erwin Laure and Stefano Markidis. KTH Royal Institute of Technology (Sweden), LeCAD, University of Ljubljana (Slovenia), Institute of Plasma Physics of the CAS (Czechia), Helmholtz-Zentrum Dresden-Rossendorf (Germany), Lawrence Berkeley National Laboratory (USA), Oak Ridge National Laboratory (USA), and Max Planck Computing and Data Facility (Germany). [Slides]
Paper Talk II: Understanding Adaptable Storage for Diverse Workloads, Olga Kogiou, Hariharan Devarajan, Chen Wang, Weikuan Yu and Kathryn Mohror. Florida State University (USA) and Lawrence Livermore National Laboratory (USA). [Slides]
14:45 - 15:00 – Coffee Break
15:00 - 15:30 – Expert Talk II: Unveiling I/O Insights of HPC Applications Using the Metric Proxy and FTIO, Ahmad Tarraf, TU Darmstadt (Germany). [Slides]
15:30 - 16:30 – Short Research Papers (20 minutes per talk including Q&A)
Paper Talk III: Object-Centric Data Management in HPC Workflows - A Case Study, Chen Wang, Houjun Tang, Jean Luca Bez and Suren Byna. Lawrence Livermore National Laboratory (USA) and The Ohio State University (USA). [Slides]
Paper Talk IV: Studying the Effects of Asynchronous I/O on HPC I/O Patterns, Arnav Gupta, Druva Dhakshinamoorthy and Arnab K. Paul. BITS Pilani, KK Birla Goa Campus (India). [Slides]
Paper Talk V: Challenges in Understanding Metadata Performance: A Case of Metadata Analysis Using Score-P, Boris Kosmynin and Radita Liem. RWTH Aachen University (Germany). [Slides]
16:30 - 16:45 – Coffee Break
16:45 - 17:15 – Expert Talk III: Learning on the Edge: Unlocking the Storage Bottleneck with a Divide and Conquer Approach, Jalil Boukhobza, ENSTA Bretagne (France). [Slides]
17:15 - 17:45 – Expert Talk IV: Measuring Mayhem: Why Current IO Monitoring is Not Enough and What to Do About It, Jay Lofstead, Sandia National Laboratories (USA). [Slides]
17:45 - 18:14 – Expert Panel: Emerging HPC Workloads and the Future Direction of Parallel I/O Research
Panelists: Osamu Tatebe, Jay Lofstead, Ahmad Tarraf, Jalil Boukhobza
Moderator: Arnab K. Paul
18:14 - 18:15 – Closing Remarks
18:15 – End of REX-IO Workshop Day
Keynote: Recent Trends in Ad-hoc HPC File Systems and Caching File Systems
Osamu Tatebe (University of Tsukuba)
Abstract: An ad-hoc HPC file system is a parallel and distributed file system built on the node-local storage of compute nodes, so its performance scales with the number of compute nodes. Because ad-hoc file systems are set up only after job allocation, data must be staged in from, and back out to, the backend persistent parallel file system, and doing this manually is error-prone. Caching file systems offer a potential solution, but several challenges remain unresolved. The most significant is poor metadata performance: since existing caching file systems rely on the metadata server of the backend parallel file system, their metadata performance is inferior to that of the backend. CHFS/Cache addresses this issue by establishing a reasonable relaxation of the semantics between the backend parallel file system and the caching file system. It also addresses the poor access performance observed when flushing data to the backend file system. By using node-local storage, caching file systems have the potential to narrow the gap between compute performance and storage performance.
About the speaker: Osamu Tatebe received his Ph.D. in Computer Science from the University of Tokyo in 1997. He worked at the Electrotechnical Laboratory (ETL) and the National Institute of Advanced Industrial Science and Technology (AIST) until 2006, and is currently a professor at the Center for Computational Sciences, University of Tsukuba. Since 2000, he has led the research and development of the Gfarm file system, which now underpins Japan's nationwide 100 PB HPCI shared storage infrastructure. He is presently exploring the next generation of high-performance computing (HPC) storage architecture. He has received awards in the SC2003 High Performance Bandwidth Challenge, the SC2005 StorCloud Challenge, and the SC2006 Storage Challenge. His research interests include HPC storage architecture and HPC system software.