Skyhook stores its data partitions in Ceph objects, and we leverage "programmable storage"1, 2, by developing extensions to Ceph object interfaces ("cls") with data processing, indexing, and repartitioning functions that can be performed directly within the storage system. SkyhookDM provides elasticity through both the storage system's inherent ability to dynamically add and remove data storage nodes in the cluster as well as the ability to dynamically repartition data directly within storage. Because our extensions enable storage to perform some processing, adding more nodes to the storage cluster provides increased processing power, enhancing both IO and Compute scalability.
Below illustrates the high-level architecture of SkyhookDM showing the Apache Arrow Dataset API interacting with CephFS to store our new RadosParquetFragment. The Dataset API applied to our RadosParquetFragment allows data processing operations to be "pushed down" directly into the storage system for execution. CephFS is used as a reliable, scalable way to define collections of "fragments", or row-partitions that comprise a table, and provides manage metadata as well as access control. It also enables hierarchies of fragments to be defined, for example to support different partitioning strategies. Moreover, partitions combined with metadata (partition statistics) enable the query planner to consider partition pruning.
Arrow Dataset API and Table Fragments
The Arrow Dataset API enables defining a dataset as a collection of fragments with identical schema, this is equivalent to row partitions of a table. SkyhookDM uses this API to store dataset fragments in objects. The Dataset API provides for a dataset discovery phase, for which our implementation walks the directory structure that defines all of the table partitions, i.e., fragments. The API also provides a serializable "expression" that can be applied to dataset fragments, where expression may define query operations such as SELECT or PROJECT.
While we use CephFS for scalable management of dataset fragments, the LIBRADOS direct object access library is used for performing reads in SkyhookDM. We develop custom 'CLS' methods in Ceph, where our methods including reading local object data then applying expressions. These expressions are applied to dataset fragments via the Arrow Dataset API, which filters data, then the result is returned to the dataset client.
An individual object may contain either dataset fragment(s) or fragment metadata as depicted in the above figure. Within an object, data may be stored within the chunkstore (blobs) or KV-store (RocksDB) provided by Ceph.