In the era of data-intensive computing, large-scale applications in both the scientific and BigData communities exhibit unique I/O requirements, leading to a variety of storage solutions that are often incompatible with one another. How can we support a wide variety of conflicting I/O workloads under a single storage system?

We introduce the DataTask, a new data representation, and present TABIOS, a new distributed, DataTask-based I/O system. TABIOS boosts I/O performance by up to 17x via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in-situ analytics via data provisioning. TABIOS demonstrates the effectiveness of storage consolidation in supporting the convergence of HPC and BigData workloads on a single platform.

  • Applications create I/O tasks, called DataTasks
  • A DataTask is essentially a placeholder for an I/O job: {operation + pointer to data}
  • DataTasks are pushed to a distributed queue
  • Workers execute DataTasks independently
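
The DataTask flow listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TABIOS implementation: the `DataTask` fields, the `worker` and `run_tasks` functions, and the in-memory `buffers` and `storage` dictionaries are all hypothetical names, and a local thread-safe queue stands in for the distributed queue.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataTask:
    """Hypothetical DataTask: an operation plus a pointer to the data."""
    operation: str                   # "write" or "read"
    key: str                         # name of the target object in storage
    buffer_id: Optional[str] = None  # reference to the payload, not the payload itself


def worker(tasks, buffers, storage, results, lock):
    """Pull DataTasks from the queue and execute them independently."""
    while True:
        task = tasks.get()
        if task is None:             # sentinel: no more work for this worker
            tasks.task_done()
            break
        with lock:                   # protect the shared dictionaries
            if task.operation == "write":
                # Resolve the pointer only when the task is executed.
                storage[task.key] = buffers[task.buffer_id]
            elif task.operation == "read":
                results[task.key] = storage.get(task.key)
        tasks.task_done()


def run_tasks(task_list, buffers, storage, num_workers=2):
    """Push tasks to a queue (a local stand-in for the distributed queue)."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker,
                         args=(tasks, buffers, storage, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for task in task_list:
        tasks.put(task)              # the producer only enqueues tasks
    tasks.join()                     # wait until every task has been executed
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    buffers = {"b0": b"hello", "b1": b"world"}
    storage = {}
    run_tasks([DataTask("write", "x", "b0"), DataTask("write", "y", "b1")],
              buffers, storage)
    print(run_tasks([DataTask("read", "x"), DataTask("read", "y")],
                    buffers, storage))
```

In this sketch the application side only creates and enqueues small task descriptors, while the workers resolve the data pointer and perform the operation. That separation is what lets producers and workers run independently of one another.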


    • Adaptive to the environment
    • Fully decoupled architecture

Software Defined Storage (SDS):

    • Offloading computation to servers
    • Data-centric architecture


    • Power-cap I/O
    • Elastic I/O resources


    • Tunable I/O performance - Concurrency control
    • Guaranteed Storage QoS based on job size


    • POSIX, MPI-IO, HDF5, REST/Swift, Hadoop
    • Lustre, GPFS, HDFS, Hive, Object Stores


TABIOS Architecture


Anatomy of TABIOS Operations

More Coming soon...