Leveraging programmable storage toward database elasticity

There is a long line of research on database scalability, and state-of-the-art massively parallel processing (MPP) database systems have over 30 years of optimizations baked in. However, these systems and their optimizations do not necessarily transfer seamlessly to the cloud, where elasticity is readily available. The optimizations typically depend on tight integration of data processing with data storage, specific file formats, or even specific file systems. Decoupling data processing from storage lets the system grow and shrink more easily, but giving up that tight integration often means the database has less knowledge of the storage layer and is therefore less able to perform the standard optimizations. Rather than creating a highly specialized proprietary file system, storage server, or database, we use open source software and take a more generalizable approach that can later be customized for applications.

SkyhookDB uses programmable storage to achieve elasticity while retaining the traditional optimization of pushing computation as close to the data as possible. By building on an extensible object storage system, we aim to push down not only simple computation such as filters, but also more complex computation such as user-defined functions that represent application-specific processing. We additionally leverage programmability (composing and exposing storage services for use by applications) for collection-level management of object data, such as indexing, statistics, and batching, among others.
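
As a rough illustration of what pushing filters and user-defined functions to the storage layer can look like, the following Python sketch models a toy in-memory object store. It is not SkyhookDB code and does not reflect any real storage system's API: the names `StorageObject`, `build_stats`, `exec_pushdown`, and `scan`, as well as the sample rows, are all hypothetical and chosen only for illustration. Each object evaluates the predicate and the optional user-defined function next to its own rows, and per-object min/max statistics let the client skip objects that cannot contain matching rows.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]


@dataclass
class StorageObject:
    """Hypothetical object in a toy object store; holds one partition of rows."""
    rows: List[Row]
    # Per-column (min, max) statistics maintained by the storage layer.
    stats: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def build_stats(self, column: str) -> None:
        """Record min/max for a column (assumes the object is non-empty)."""
        values = [row[column] for row in self.rows]
        self.stats[column] = (min(values), max(values))

    def exec_pushdown(
        self,
        predicate: Callable[[Row], bool],
        udf: Optional[Callable[[Row], float]] = None,
    ) -> list:
        """Apply the filter, then the optional UDF, next to the data.

        Only the (usually much smaller) result leaves the storage object.
        """
        selected = [row for row in self.rows if predicate(row)]
        return [udf(row) for row in selected] if udf else selected


def scan(
    objects: List[StorageObject],
    column: str,
    lower_bound: float,
    predicate: Callable[[Row], bool],
    udf: Optional[Callable[[Row], float]] = None,
) -> list:
    """Client-side scan that skips objects using their statistics.

    The caller is responsible for passing a lower_bound that is consistent
    with the predicate (here: predicate implies row[column] >= lower_bound).
    """
    results = []
    for obj in objects:
        bounds = obj.stats.get(column)
        if bounds is not None and bounds[1] < lower_bound:
            continue  # max value is below the bound: no row can match
        results.extend(obj.exec_pushdown(predicate, udf))
    return results


# Example: two objects, filter price >= 10, UDF computes price * qty.
objects = [
    StorageObject([{"price": 5.0, "qty": 2.0}, {"price": 8.0, "qty": 1.0}]),
    StorageObject([{"price": 20.0, "qty": 3.0}, {"price": 12.0, "qty": 4.0}]),
]
for obj in objects:
    obj.build_stats("price")

result = scan(
    objects,
    column="price",
    lower_bound=10.0,
    predicate=lambda row: row["price"] >= 10.0,
    udf=lambda row: row["price"] * row["qty"],
)
print(result)  # [60.0, 48.0]
```

In this sketch the first object is never scanned, because its maximum price (8.0) is below the bound, and only two computed values are returned from the second object rather than its full rows. In a real deployment the filter and function would run inside the storage servers themselves, which is the part this toy model only simulates.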