Skyhook

Data Management

SkyhookDM - Programmable storage for databases.

Please see our Announcements page for latest news (last updated July 1, 2020).

Skyhook is an open-source project within the Center for Research on Open Source Software at the University of California, Santa Cruz. Skyhook uses "programmable storage" capabilities to enhance data management directly within the storage layer of a distributed object storage system such as Ceph. Please see our new GitHub repository to get started.

Goals

The goal of Skyhook is to let users transparently grow and shrink their data storage and processing capacity as demands change. Skyhook uses and extends Ceph distributed object storage with customized C++ "object classes" that allow database operations such as SELECT, PROJECT, and AGGREGATE to be offloaded (i.e., pushed down) directly into the object storage layer. We are also developing custom user-defined functions (UDFs) to enable domain-specific processing. In addition, SkyhookDM allows data management tasks, such as local indexing and data redistribution or reformatting (row/col), to be executed directly within storage in support of dynamic data management in the cloud. These tasks operate directly on objects, at either the single-object or cross-object level.
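To make the idea of pushdown concrete, here is a minimal sketch in plain Python. It is not SkyhookDM code and does not use Ceph: the Query class and the run_in_object and run_query functions are hypothetical names invented for this illustration, and each "object" is just an in-memory list of rows. The point it shows is that SELECT, PROJECT, and a partial AGGREGATE run next to each object's data, so the client only merges small results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


@dataclass
class Query:
    """Hypothetical query descriptor shipped to every storage object."""
    predicate: Callable[[Row], bool]   # SELECT: which rows to keep
    columns: List[str]                 # PROJECT: which columns to return
    sum_column: Optional[str] = None   # AGGREGATE: column to sum, if any


def run_in_object(rows: List[Row], query: Query):
    """Stand-in for logic that would run inside the storage layer."""
    selected = [r for r in rows if query.predicate(r)]
    if query.sum_column is not None:
        # Return only a partial sum instead of the matching rows.
        return sum(r[query.sum_column] for r in selected)
    return [{c: r[c] for c in query.columns} for r in selected]


def run_query(objects: List[List[Row]], query: Query):
    """Client side: send the query to each object, then merge the results."""
    partials = [run_in_object(rows, query) for rows in objects]
    if query.sum_column is not None:
        return sum(partials)  # combine the per-object partial sums
    return [row for part in partials for row in part]


# Two objects, each holding one partition of a small table (made-up data).
objects = [
    [{"id": 1, "price": 10.0, "qty": 2}, {"id": 2, "price": 55.0, "qty": 1}],
    [{"id": 3, "price": 70.0, "qty": 4}, {"id": 4, "price": 5.0, "qty": 9}],
]


def is_expensive(row: Row) -> bool:
    return row["price"] > 50


projected = run_query(objects, Query(is_expensive, columns=["id", "price"]))
total_qty = run_query(objects, Query(is_expensive, columns=[], sum_column="qty"))
```

In this sketch only the filtered, projected rows (or a single number per object for the aggregate) cross the boundary between run_in_object and run_query, which is the data-movement saving that pushdown is meant to provide.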

Row and Columnar processing

SkyhookDM now supports row-based processing using the Google FlatBuffers format and column-based processing using the Apache Arrow fast in-memory serialization format. For more details, please see our architecture overview.
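The difference between the two layouts can be sketched with plain Python structures. This example does not use FlatBuffers or Arrow; the helper names rows_to_columns and columns_to_rows and the sample data are assumptions made for illustration only. It shows why a columnar layout suits a query that reads a single field, and that reformatting between the two layouts loses no information.

```python
from typing import Dict, List

# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "energy": 12.5, "charge": -1},
    {"id": 2, "energy": 40.0, "charge": 1},
    {"id": 3, "energy": 7.25, "charge": -1},
]


def rows_to_columns(records: List[Dict]) -> Dict[str, List]:
    """Reformat row-oriented records into one list per column."""
    columns: Dict[str, List] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns: Dict[str, List]) -> List[Dict]:
    """Reformat column lists back into row-oriented records."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = rows_to_columns(rows)

# A single-column computation touches one contiguous list, not every record.
mean_energy = sum(columns["energy"]) / len(columns["energy"])

# Converting back recovers the original records.
round_trip = columns_to_rows(columns)
```

A row layout is the natural fit when whole records are read or written together, while the column layout favors scans and aggregates over a few fields.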

Client layer applications

Skyhook's in-storage functionality can be accessed by higher-level client software, such as a PostgreSQL database through the Postgres external table access interface (foreign data wrapper), our custom Python interface (pyarrow and dataframes) for high-energy physics data, and our Python SQL client interface (in progress).

Contributing

Skyhook is in the development phase, with a rich set of features to work on. To help with the project, please Get Involved, take a look at our current GSoC projects, or jump directly to installing the development version.

Deploying and Using SkyhookDM

There are several ways to deploy SkyhookDM with Ceph. Please see our wiki for current deployment options.