Workshop on Data Storage Research 2025

An NSF-Sponsored Community Visioning Workshop

May 31 - June 1, 2018

The final workshop report is available in the ACM Digital Library.

Selected results are summarized in an article in USENIX ;login:.

The big data revolution, together with the dawn of the Internet of Things (IoT), is driving the need for novel storage systems that can store, manage, retrieve, and efficiently utilize unprecedented volumes of data at ever-increasing speeds. A number of open challenges and research issues must be addressed in both the short and the long term to keep storage systems effective and performant. In particular, because modern and emerging storage systems serve such a wide variety of applications, their fundamental design should be revisited to support application-specific and application-defined semantics. Existing standards and abstractions need to be reevaluated, and new sustainable data representations need to be designed to support emerging applications. New storage software designs are also needed to take advantage of hardware advances, such as persistent memory, in order to maximize efficiency and performance.

The goal of this focused, invitation-only workshop is to bring together leading researchers in storage systems and distributed systems to articulate a working vision for storage research and to prioritize near- and long-term research directions and scientific investigations.

When: May 31 - June 1, 2018

Where: IBM Research - Almaden, San Jose, CA

  • More information is available on the Venue tab.

Organizers: Ali R. Butt, Virginia Tech

Vasily Tarasov, IBM Research

Ming Zhao, Arizona State University

The workshop is funded by the NSF CISE Division of Computer and Network Systems (CNS-1829096). This funding provides a platform for a consolidated effort to establish a vision for storage systems research and to identify comprehensive techniques that offer practical solutions to the storage issues facing the information technology community. Key issues to be covered include:

  • How to evolve storage systems to meet the challenges of scale, throughput, and sustainability arising from emerging applications such as deep learning and Internet of Things (IoT)?
  • How to address allocation, management, privacy, performance, and multi-tenancy to meet the demands of the rapid migration of data from on-premises to cloud deployments?
  • How to design file system APIs and higher-level yet simple-to-use interfaces for new storage systems?
  • How to design programming models to efficiently support innovative storage and deep storage hierarchies?
  • How to address workforce pipeline issues and train the next generation of storage systems researchers?
  • How to make it easy for systems researchers to design and realize new applications and storage hardware?

Workshop participants will be charged with assessing the current state and direction of storage systems research, and with making recommendations for research that addresses the current, short-term, and long-term needs of designing sustainable and scalable storage architectures.

Agenda

Thursday, May 31, 2018:

  • 08:00 Registration and breakfast (Building Lobby and Auditorium Foyer)
  • 08:30 Introduction by workshop organizers (Auditorium)
  • 08:45 Welcome message by NSF CSR Lead Program Director Dr. Samee Khan (Auditorium)
  • 08:55 Overview of the NSF CSR Program by CSR Program Director Dr. Sandip Kundu (Auditorium)
  • 09:30 Keynote talk: "Lessons learned storing 4 PB of telemetry data and transforming it into insights" (Auditorium)
    • Shankar Pasupathy, Technical Director (Analytics), NetApp Inc.
    • NetApp’s Active IQ ecosystem gathers and analyses 70 billion data points a day from storage controllers across the world. These data points include logs, performance counters, and configuration data. This telemetry data, combined with historical information from a 4 PB Hadoop Data Lake, is used to drive a variety of tools, from customer support to data center optimization. In this talk, I will describe the storage challenges in acquiring, transforming, and applying machine learning to this telemetry data to produce insights for customers. I will also describe trends that we’ve observed from conversations with over fifty Fortune 500 companies.
  • 10:30 Break (Auditorium Foyer)
  • 10:45 Keynote talk: "Conjectures towards fruitful directions in data storage 2025" (Auditorium)
    • Irfan Ahmad, Co-founder and CEO, CachePhysics
  • 11:45 Keynote talk: "Building a future proof enterprise storage in 2025" (Auditorium)
    • Lawrence Chiu, Head of Storage Research and Steven Hetzler, IBM Fellow, Cloud Data Architecture, IBM
    • Storage technology has never been more exciting. Many emerging storage technologies bring new possibilities and use cases to enterprises and data centers. We will discuss key storage technology trends, including storage networking, for the coming decades. We will also discuss the criteria and the challenges of building future-proof enterprise storage in a hyperscale data center in 2025. Future storage infrastructure will push the limits of performance, scalability, resiliency, and manageability. We will discuss and demonstrate research prototypes that address a wide range of requirements.
    • Slides
  • 12:45 Lunch (Served out of J2-609 with seating in Cafeteria)
  • 01:40 Introduction of group discussion topics (Auditorium):
    1. AI and Storage: Made for each other (Room B2-425)
    2. Cloud, edge, and everything in between (Room H2-214)
    3. The hardware, they are a-changin (Room G2-210)
    4. Teaching old storage new tricks (Room J2-601)
  • 02:00 Breakout group discussions on the topics led by the group moderators
    • Four parallel groups (participants are assigned to groups based on their pre-workshop position papers)
    • Open discussion and identification of research challenges
    • Group lead and co-lead will shepherd the discussion
    • Participants will identify the top three research challenges that may emerge
    • Prepare a set of slides for presentation to the other groups
    • Coffee will be served during this 3-hour session (Auditorium Foyer)
  • 5:00 - 6:30 Reception (Auditorium Foyer or Cafeteria Patio, depending on weather)

Friday, June 1, 2018:

  • 8:00 Breakfast (Auditorium Foyer)
  • 8:30 Summary presentations and report by each group (Auditorium)
    • One member of each discussion group (the lead(s) or a person nominated by the group) will present to all attendees and solicit their feedback
  • 10:00 Break (Auditorium Foyer)
  • 10:30 Breakout into groups to discuss further and start writing the group reports
    • All groups work in parallel to begin writing reports based on each group’s findings
  • 11:30 Feedback and Open mic (Auditorium)
    • Workshop organizers solicit feedback about the workshop from all attendees and explain the next steps for completing the final workshop report
  • 12:00 Adjourn (Boxed lunches provided) (Auditorium Foyer)

Attendees & Discussion Groups

  1. AI and Storage: Made for each other (Room B2-425) [Discussion Slides]
    • Nisha Talagala, Parallel Machines (Discussion Lead) [Writeup]
    • Avani Wildani, Emory (co-Lead) [Writeup]
    • Yiran Chen, Duke [Writeup]
    • Yong Chen, TTU [Writeup]
    • Xubin He, Temple University [Writeup]
    • Min Li, IBM [Writeup]
    • Raju Rangaswami, Florida International University [Writeup]
    • Erez Zadok, Stony Brook University
  2. Cloud, edge, and everything in between (Room H2-214) [Discussion Slides]
    • Remzi H. Arpaci-Dusseau, University of Wisconsin-Madison (Discussion Lead) [Writeup]
    • Mai Zheng, New Mexico State University (co-Lead) [Writeup]
    • Irfan Ahmad, CachePhysics
    • Vijay Chidambaram, University of Texas at Austin [Writeup]
    • Angela Demke-Brown, University of Toronto [Writeup]
    • Sandip Kundu, National Science Foundation
    • Dilma Da Silva, Texas A&M University
    • Ali Saman Tosun, Univ. Texas at San Antonio [Writeup]
  3. The hardware, they are a-changin (Room G2-201) [Discussion Slides]
    • Peter Varman, Rice (Discussion Lead) [Writeup]
    • Yue Cheng, GMU (co-Lead) [Writeup]
    • Peter Desnoyers, Northeastern University [Writeup]
    • Song Jiang, UT Arlington [Writeup]
    • Ethan L. Miller, University of California Santa Cruz [Writeup]
    • Narasimha Reddy, Texas A&M University [Writeup]
    • David Rosenthal, Stanford Library
    • Michael L. Scott, U. Rochester [Writeup]
    • Yiying Zhang, Purdue University [Writeup]
  4. Teaching old storage new tricks (Room J2-601) [Discussion Slides]
    • Jason Flinn, University of Michigan (Discussion Lead) [Writeup]
    • George Amvrosiadis, CMU (co-Lead) [Writeup]
    • Feng Chen, Louisiana State University [Writeup]
    • Geoff Kuenning, Harvey Mudd College [Writeup]
    • Carlos Maltzahn, University of California Santa Cruz [Writeup]
    • Kathryn Mohror, LLNL [Writeup]
    • Sudharshan Vazhkudai, ORNL [Writeup]
    • Xiaodong Zhang, Ohio State University [Writeup]

Information for Participants

  • April 7, 2018: Deadline for position papers from submission-based invitees.
  • May 1, 2018: Deadline for position papers for all other invitees.

Lodging, travel, and other details are available on the Venue tab.

  • May 6, 2018: Please complete a short pre-workshop survey; it is needed to enable entry to the venue and to help us plan meals.
  • May 22, 2018: Information about discussion groups & attendees is now available.
  • June 1, 2018: Email with links to workshop feedback form and reimbursement request form sent to participants.
  • June 1, 2018: Group leads share the (updated) discussion group slides with the organizers.
  • June 8, 2018: Complete workshop feedback form.
  • June 15, 2018: Complete reimbursement request process if seeking travel expense reimbursement.
  • June 21, 2018: Report writing Checkpoint 1 -- Discussion groups share an initial version of the group report with the organizers. Please limit it to 6-8 pages.
  • July 6, 2018: Report writing Checkpoint 2 -- A distilled and streamlined Version 2 of the group reports is due. Group leads and co-leads begin working with the organizers to finalize the report. Other group members can also participate if available.
  • August 1, 2018: Report writing Checkpoint 3 -- Complete a final draft of the workshop report.

BOF Session@USENIX FAST 2019

  • Organizers: George Amvrosiadis, Ali R. Butt, Vasily Tarasov, Erez Zadok, Ming Zhao
  • When: Tuesday, February 26, 8:30 pm–9:30 pm
  • Where: Gardner Room B
  • Goal: To share the findings of the NSF Workshop on Data Storage Research 2025, and to solicit community feedback.
  • Presentation slides: Link
  • More than 40 people attended. Thank you all for coming and for your valuable feedback.

Please direct all questions to Ali R. Butt at butta@cs.vt.edu