HPS 2021 : 2nd Workshop on High-Performance Storage

Held in conjunction with IEEE IPDPS 2021 - May 21st, 2021

SCHEDULE (Pacific Time)

08:00 - 08:10: Welcome message from the chairs

Gabriel Antoniu, Inria (Workshop Chair), Marc Snir, UIUC (Workshop Co-Chair)

Bogdan Nicolae, ANL (Program Chair), Osamu Tatebe, U Tsukuba (Program Co-Chair)

Keynote

08:10 - 09:00: Designing High-Performance Storage for a World after Hard Drives

Glenn K. Lockwood, NERSC

Paper and Invited Talk Session

09:00 - 09:30: Invited talk: The Storage System of the Fugaku Supercomputer

Takuya Okamoto, Fujitsu

09:30 - 10:00: Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows

Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar

10:00 - 10:30: Exploring MPI Collective I/O and File-per-process I/O for Checkpointing a Logical Inference task

Ke Fan, Kristopher Micinski, Thomas Gilray, Sidharth Kumar

10:30 - 11:00: Invited talk: Meaningful Measurements? IO500’s 5th Year’s Search for Meaning

Jay Lofstead, Sandia National Laboratories

Reminder about IPDPS 2021 registration: Each paper in the workshop must have at least one author registered at the full (non-student) rate. All workshop attendees are expected to register for IPDPS 2021 to obtain access to the proceedings and to all live events and recorded sessions of the conference.

Keynote: Designing High-Performance Storage for a World after Hard Drives

Glenn K. Lockwood

National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory


Abstract: We will discuss the architecture of the Perlmutter file system and the quantitative approach NERSC used to ensure that this all-flash file system would provide the best balance of capacity, performance, endurance, and stability for NERSC's 8,000 users. We will also discuss unresolved challenges in designing extreme-scale all-flash storage systems, then conclude with several promising future directions in storage systems design that NERSC will be pursuing over the next five years.


Bio: Glenn K. Lockwood is the principal storage architect at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory where he leads future storage systems design, I/O performance engineering, and many storage R&D activities across the center. He was a lead designer of the 35 PB all-NVMe Perlmutter file system, and he also played a key role in defining NERSC's Storage 2020 vision which culminated in the deployment of its 128 PB Community File System. In addition to storage systems design, Glenn is also actively engaged in the parallel I/O community; he represents NERSC on the HPSS Executive Committee, is a maintainer of the IOR and mdtest community benchmarks, and is a contributor to the Darshan I/O profiling library. Glenn holds a Ph.D. in materials science and a B.S. in ceramic engineering from Rutgers University.

Invited Talk: The Storage System of the Fugaku Supercomputer

Takuya Okamoto, Fujitsu Ltd.

Abstract: The Fugaku supercomputer, developed by RIKEN and Fujitsu, is currently the world's fastest supercomputer. To provide large-capacity, high-performance storage, Fugaku adopts a three-level hierarchical storage system: the first layer serves as a dedicated high-performance file system for each job execution, the second layer provides large-capacity shared file systems used by users and jobs, and the third layer provides commercial cloud storage. The second-layer storage uses the Fujitsu Exabyte File System (FEFS), a Lustre-based file system originally developed for the K computer. For the first-layer storage, we have developed a new file system called Lightweight Layered IO-Accelerator (LLIO). LLIO provides three types of areas to jobs: a transparent cache of the second-layer file system and two temporary file systems. LLIO also provides an efficient file-copying command to relieve the hotspots that often become performance bottlenecks when large-scale jobs read shared input files. This talk presents an overview of the Fugaku storage system, LLIO functionalities, and their performance.

Bio: Takuya Okamoto received his B.E. and M.E. degrees from The University of Tokyo in 2014 and 2016, respectively, and then joined Fujitsu Ltd. He has spent five years as a developer of the Lightweight Layered IO-Accelerator (LLIO).

Invited Talk: Meaningful Measurements? IO500’s 5th Year’s Search for Meaning

Jay Lofstead, Sandia National Laboratories

Abstract: The IO500 was created to encourage users to submit information about their data centers, particularly their storage systems, by providing a competition for bragging rights. The initial workloads were well justified at the time, and the 10 Node Challenge was later added to increase participation. With the list now seemingly well established, reflection on the existing benchmarks, on how to represent new workloads, and on whether they remain relevant has prompted considerable discussion. This talk will examine the roots of and motivations for the IO500 benchmark suite, and the challenges it has revealed in obtaining meaningful measurements that are useful beyond a competition.


Bio: Dr. Jay Lofstead is a Principal Member of Technical Staff in the Scalable System Software department of the Center for Computing Research at Sandia National Laboratories in Albuquerque, NM. His work focuses on infrastructure to support all varieties of simulation, scientific, and engineering workflows, with a strong emphasis on I/O, middleware, storage, transactions, operating-system features to support workflows, containers, software engineering, and reproducibility. He is a co-founder of the IO500 storage list. He also works extensively to support various student mentoring and diversity programs at several venues each year, including outreach to both high school and college students. Jay graduated with a BS, MS, and PhD in Computer Science from the Georgia Institute of Technology and received a 2013 R&D 100 Award for his work on the ADIOS I/O library.

Workshop Overview

Advances in storage are becoming increasingly critical because workloads on high-performance computing (HPC) and cloud systems are producing and consuming more data than ever before, and this trend will only intensify in the coming years. Additionally, the last decades have seen relatively few changes in the structure of parallel file systems, and limited interaction between the evolution of parallel file systems (e.g., Lustre, GPFS) and that of I/O support systems that take advantage of hierarchical storage layers (e.g., node-local burst buffers). Recently, however, the community has seen a large uptick in innovations in storage systems and I/O support software, for several reasons:

  • Technology: The availability of an increasing number of persistent solid-state storage technologies that can replace either memory or disk is creating new opportunities for the structure of storage systems.

  • Performance requirements: Disk-based parallel file systems cannot satisfy the performance needs of high-end systems. However, it is not clear how solid-state storage can best be used to achieve the needed performance, so new approaches for using solid-state storage in HPC systems are being designed and evaluated.

  • Application evolution: Data analysis applications, including graph analytics and machine learning, are becoming increasingly important both for scientific computing and for commercial computing. I/O is often a major bottleneck for such applications, in both cloud and HPC environments – especially when fast turnaround or integration of heavy computation and analysis is required.

  • Infrastructure evolution: HPC technology will not only be deployed in dedicated supercomputing centers in the future. “Embedded HPC”, “HPC in the box”, “HPC in the loop”, “HPC in the cloud”, “HPC as a service”, and “near-to-real-time simulation” are concepts that require new small-scale deployment environments for HPC. A federation of systems and functions, with consistent mechanisms for managing I/O, storage, and data processing across all participating systems, will be required to create a “continuum” of computing.

  • Virtualization and disaggregation: As virtualization and disaggregation become broadly used in cloud and HPC computing, virtualized storage is gaining importance, and efforts will be needed to understand its performance implications.

Our goal in the HPS Workshop is to bring together expert researchers and developers in storage and I/O from across HPC and cloud computing to discuss advances and possible solutions to the new challenges we face.

Topics

• High-end storage systems

• Parallel and distributed high-end storage architectures and techniques

• The synergy between different storage models (POSIX file system, object storage, key-value, row-oriented, and column-oriented databases)

• Storage and data processing architectures and systems for hybrid HPC/cloud/edge infrastructures, in support of complex workflows potentially combining simulation and analytics

• Structures and interfaces for leveraging persistent solid-state storage

• High-performance I/O libraries and services

• I/O performance in high-end systems and applications

• Data reduction and compression

• Failure models and recovery of high-end storage systems

• Benchmarks and performance tools for high-end I/O

• Language and library support for data-centric computing

• Storage virtualization and disaggregation

• Active processing in storage technologies

• Ephemeral storage media and consistency optimizations

• Storage architectures and systems for scalable stream-based processing

• Case studies of I/O services in support of various application domains (bioinformatics, scientific simulations, large observatories, experimental facilities, etc.)

Submission

Important Dates:

  • Abstract submission (optional) deadline: January 25th, 2021

  • Paper submission deadline: February 21st, 2021 (extended from January 31st, 2021; hard deadline)

  • Acceptance notification: March 7th, 2021

  • Camera-ready deadline: March 20th, 2021

  • Workshop date: May 21st, 2021

Submission link:

https://ssl.linklings.net/conferences/ipdps/?page=Submit&id=HPSWorkshopFullSubmission&site=ipdps2021

Paper Categories:

Authors are invited to submit:

    • Full 8-page papers

    • Short/work-in-progress 4-page papers

Formatting:

Submissions must use single-spaced, double-column pages with a 10-point font on 8.5x11-inch pages (IEEE conference style), including figures, tables, and references. The submitted manuscripts should include author names and affiliations. The IEEE conference style templates for MS Word and LaTeX provided by IEEE eXpress Conference Publishing are available at https://www.ieee.org/conferences/publishing/templates.html. All papers must be in English. We use a single-blind review process, so please keep the authors' names, publications, etc., in the text.
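
For authors preparing their manuscript in LaTeX, the sketch below shows a minimal skeleton assuming the standard IEEEtran conference class from the official template package linked above; the title, author, affiliation, and bibliography file names are placeholders, and the official templates remain authoritative:

    \documentclass[conference]{IEEEtran}  % 10-point, double-column IEEE conference style
    \usepackage{graphicx}                 % for figures

    \begin{document}

    \title{Placeholder Title of an HPS 2021 Submission}
    \author{\IEEEauthorblockN{First Author}
    \IEEEauthorblockA{Example Affiliation \\ first.author@example.org}}
    \maketitle

    \begin{abstract}
    Abstract text goes here; the submission is not anonymized (single-blind review).
    \end{abstract}

    \section{Introduction}
    Body text, figures, tables, and references must fit within the 8-page
    (full paper) or 4-page (short/work-in-progress paper) limit.

    \bibliographystyle{IEEEtran}
    \bibliography{references}  % references.bib is a placeholder

    \end{document}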

Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the IEEE Digital Library.

Chairs

Workshop Chairs

Chair: Gabriel Antoniu, Inria, France - gabriel.antoniu@inria.fr

Co-Chair: Marc Snir, University of Illinois at Urbana-Champaign, USA - snir@illinois.edu


Program Chairs

Chair: Bogdan Nicolae, Argonne National Lab, USA - bogdan.nicolae@acm.org

Co-Chair: Osamu Tatebe, University of Tsukuba, Japan - tatebe@cs.tsukuba.ac.jp

Program Committee

Angelos Bilas, FORTH, Greece

Suren Byna, LBNL, USA

Franck Cappello, ANL, USA

Jesus Carretero, Universidad Carlos III de Madrid, Spain

Toni Cortes, Barcelona Supercomputing Center, Spain

Alexandru Costan, Inria and INSA Rennes, France

Matthieu Dorier, Argonne National Lab, USA

Kathryn Mohror, Lawrence Livermore National Laboratory, USA

Dana Petcu, West University of Timisoara, Romania

Michael Schoettner, University of Düsseldorf, Germany

Domenico Talia, University of Calabria, Italy

Kento Sato, RIKEN, Japan

François Tessier, Inria, France

Weikuan Yu, Florida State University, USA

Steering Committee

Gabriel Antoniu, Inria, Rennes, France


Franck Cappello, Argonne National Laboratory, USA


Toni Cortés, Barcelona Supercomputing Center, Spain


Kathryn Mohror, Lawrence Livermore National Laboratory, USA


Kento Sato, RIKEN, Japan


Marc Snir, University of Illinois at Urbana-Champaign, USA


Weikuan Yu, Florida State University, USA