ESSA 2024: 5th Workshop on Extreme-Scale Storage and Analysis
To be held on May 27, 2024 in conjunction with IEEE IPDPS 2024, San Francisco, CA, USA
Agenda
Workshop Day: Monday, May 27, 2024
Workshop Location: Hyatt Regency San Francisco, Embarcadero Center, San Francisco, California USA
Workshop Room: Regency B (Street Level)
13:00 - 13:10 Welcome Message
13:10 - 14:00 Keynote: HPC and Databases Revisited
— Jay Lofstead (Sandia National Laboratories)
14:00 - 14:30 Paper Talk: The impact of asynchronous I/O in checkpoint-restart workloads
— Hariharan Devarajan, A. Moody, D. Dai, C. Stanavige, E. Gonsiorowski, M. McFadden, O. Faaland, G. Kosinovsky, K. Mohror
14:30 - 15:00 Paper Talk: Benchmarking variables for checkpointing in HPC Applications
— Xiang Fu, Xin Huang, Wubiao Xu, Shiman Meng, Weiping Zhang, Luanzheng Guo, Kento Sato
15:00 - 15:30 Coffee Break
15:30 - 16:00 Paper Talk: Extending the Mochi Methodology to Enable Dynamic HPC Data Services
— Matthieu Dorier, Philip Carns, Robert Ross, Shane Snyder, Rob Latham, Amal Gueroudji, George Amvrosiadis, Chuck Cranor, Jerome Soumagne
16:00 - 16:30 Paper Talk: Adaptive Per-File Lossless Compression of Floating-Point Data
— Andrew Rodriguez, Noushin Azami, Martin Burtscher
16:30 - 17:00 Paper Talk: Optimizing Forward Wavefield Storage Leveraging High-Speed Storage Media
— João Speglich, Navjot Kukreja, George Bisbas, Átila Saraiva, Jan Hückelheim, Fabio Luporini, John Washbourne
17:00 - 17:30 Paper Talk: The Art of Sparsity: Mastering High-Dimensional Tensor Storage
— Bin Dong, Kesheng Wu, Suren Byna
17:30 - 17:45 Discussion and Closing Remarks
Keynote
Jay Lofstead (Sandia National Laboratories)
Jay Lofstead is a Principal Member of Technical Staff at Sandia National Laboratories. His research interests focus on large-scale data management and trustworthy scientific computing. In particular, he works on storage, I/O, metadata, workflows, reproducibility, software engineering, machine learning, and operating-system-level support for these topics. More broadly, he is deeply interested in the ethics of computing and in driving inclusivity across the computation-related science domains. Dr. Lofstead received his Ph.D. in Computer Science from the Georgia Institute of Technology in 2010.
Keynote Abstract
Around twenty-five years ago, the HPC storage and I/O community investigated the potential of relational databases for HPC data management and found numerous issues that made an RDBMS a poor choice. SciDB's design decisions further cemented this difficulty, particularly regarding ingestion velocity and overhead, for an RDBMS model of HPC data management. More recent work in the metadata arena, such as EMPRESS, and a new I/O library, Stitch-IO, show a potential path toward bringing these communities together. The remaining challenges, however, offer potentially new research directions and require overcoming well-entrenched bias. This talk will explore the roots of this bias, argue why we should rethink it, and propose a path toward data management bliss.
Workshop Overview
Advances in storage are becoming increasingly critical because workloads on high-performance computing (HPC) and cloud systems are producing and consuming more data than ever before, and this trend promises only to intensify in the coming years. Additionally, recent decades have seen relatively few changes in the structure of parallel file systems, and limited interaction between the evolution of parallel file systems (e.g., Lustre, GPFS) and I/O support systems that take advantage of hierarchical storage layers (e.g., node-local burst buffers). Recently, however, the community has seen a large uptick in innovation in storage systems and I/O support software, for several reasons:
Technology: The availability of an increasing number of persistent solid-state storage technologies that can replace either memory or disk is creating new opportunities for the structure of storage systems.
Performance requirements: Disk-based parallel file systems cannot satisfy the performance needs of high-end systems. However, it is not clear how solid-state storage can best be used to achieve the needed performance, so new approaches for using solid-state storage in HPC systems are being designed and evaluated.
Application evolution: Data analysis applications, including graph analytics and machine learning, are becoming increasingly important both for scientific computing and for commercial computing. I/O is often a major bottleneck for such applications, both in cloud and HPC environments – especially when fast turnaround or integration of heavy computation and analysis are required.
Infrastructure evolution: HPC technology will not only be deployed in dedicated supercomputing centers in the future. “Embedded HPC”, “HPC in the box”, “HPC in the loop”, “HPC in the cloud”, “HPC as a service”, and “near-real-time simulation” are concepts requiring new small-scale deployment environments for HPC. A federation of systems and functions with consistent mechanisms for managing I/O, storage, and data processing across all participating systems will be required to create a “continuum” of computing.
Virtualization and disaggregation: As virtualization and disaggregation become broadly used in cloud and HPC computing, virtualized storage is growing in importance, and efforts will be needed to understand its implications for performance.
Our goal in the ESSA Workshop is to bring together expert researchers and developers in data-related areas, including storage, I/O, processing, and analysis on extreme-scale infrastructures (HPC systems, clouds, edge systems, or hybrid combinations of these), to discuss advances and possible solutions to the new challenges we face.
Topics and Scope
Extreme-scale storage systems (on high-end HPC infrastructures, clouds, or hybrid combinations of them)
Extreme-scale parallel and distributed storage architectures
The synergy between different storage models (POSIX file system, object storage, key-value, row-oriented, and column-oriented databases)
Structures and interfaces for leveraging persistent solid-state storage and storage-class memory
High-performance I/O libraries and services
I/O performance in extreme-scale systems and applications (HPC/cloud/edge)
Storage and data processing architectures and systems for hybrid HPC/cloud/edge infrastructures, in support of complex workflows potentially combining simulation and analytics
Integrating computation into the memory and storage hierarchy to facilitate in-situ and in-transit data processing
I/O characterization and data processing techniques for application workloads relying on extreme-scale parallel/distributed machine-learning/deep learning
Tools and techniques for managing data movement among compute-intensive and data-intensive components
Data reduction and compression
Failure and recovery of extreme-scale storage systems
Benchmarks and performance tools for extreme-scale I/O
Language and library support for data-centric computing
Storage virtualization and disaggregation
Ephemeral storage media and consistency optimizations
Storage architectures and systems for scalable stream-based processing
Case studies of I/O services and data processing architectures supporting various application domains (bioinformatics, scientific simulations, large observatories, experimental facilities, etc.)
ESSA 2024 Workshop Organization
Workshop Chairs
Chair: François Tessier, Inria, France
Co-Chair: Weikuan Yu, Florida State University, USA
Program Chairs
Chair: Sarah Neuwirth, Johannes Gutenberg University Mainz, Germany
Co-Chair: Arnab K. Paul, BITS Pilani, K K Birla Goa Campus, India
Web Chair
Chair: Lenny Guo, Pacific Northwest National Laboratory, Richland, USA
Publicity Chair
Chair: Chen Wang, Lawrence Livermore National Laboratory, Livermore, USA
Important Dates
Please note: All deadlines are Anywhere on Earth
Paper submission deadline: February 15, 2024 (final extension; originally January 25, 2024)
Acceptance notification: March 1, 2024 (originally February 15, 2024)
Camera-ready deadline: March 15, 2024 (final extension; originally February 29, 2024)
Workshop date: May 27, 2024
Submission Link: https://ssl.linklings.net/conferences/ipdps/