UC Skyhook repos (original) Github | Docker
The Skyhook Data Management project was created and led by Jeff LeFevre (2016-2021) as an Open Source Fellow at the UC Center for Research in Open Source Software (CROSS). It was successfully merged into Apache Arrow in October 2021. Thanks to all who helped make this possible along the way including:
Key funding support and CROSS leadership: Carlos Maltzahn (CROSS Director), Stephanie Lieggi (CROSS Executive Director).
Grants: NSF TI-2229773, NSF OAC-1836650, DOE ASCR DE-NA0003525 (FWP 20-023266, subcontractor of Sandia National Labs), NSF CNS-1764102, Seagate Technologies, and the Center for Research in Open Source Software
Other funding support: Google Summer of Code, IRIS-HEP foundation, CERN-HSF.
Key contributors: Jeff LeFevre (Project Leader | UCSC), Noah Watkins (UCSC), Jayjeet Chakraborty (Lead Contributor | NIT India), Ivo Jimenez (UCSC), Ashay Shirwadkar (GSoC | UC Irvine), Aditi Gupta (GSoC | NIT India), Aldrin Montana (UCSC), Kathryn Dahlgren (UCSC), Xiongfeng Song (Rice), Yash Jipkate (GSoC | IIT India).
Key advisors: Sage Weil (Ceph/Red Hat), Doug Cutting (Open Source Leader), Philip Kufeldt (Seagate/NVIDIA).
Skyhook was merged into mainline Arrow 7.0 in 2021 and in 2025 was moved into its own Apache repository.
Contact: Jeff LeFevre (jlefevre@ucsc.edu)
Oct 2022. Publication: "Skyhook: Towards an Arrow-Native Storage System," in CCGrid 2022.
Sep 2022. Talk: (Carlos) "Birds of a Feather: Pathways to Enable an Open Source Ecosystem for the Skyhook Project", 2022 UC Santa Cruz Open Source Symposium.
Jul 2022. Release: v0.4.1.
Jan 2022. Announcement: On October 22, 2021, three reviewers formally approved the merge of the CROSS project SkyhookDM into the Apache Arrow mainline, to be included in the Apache Arrow 7.0.0 release. CROSS Announcement | Apache Announcement
Dec 2021. Publication: "Zero-Cost, Arrow-Enabled Data Interface for Apache Spark" in IEEE Big Data 2021.
Dec 2021. Release: v0.4.0.
Oct 2021. Announcement: CROSS Announcement for Skyhook Merge into Arrow.
Oct 2021. Announcement: SkyhookDM project has been merged into Apache Arrow 7.0.0 (to be released 3 February 2022)! Congratulations and a big thank you to Jayjeet Chakraborty and all who worked toward this goal. Please read Jayjeet's blog post for more information. CROSS Announcement | Github | Arrow Blog Post
Sep 2021. Publication: "Towards an Arrow native storage system" preprint available on arXiv.
Sep 2021. Talk: (Jayjeet) "SkyhookDM: an Arrow-native Storage System" at Storage Developers Conference (SDC) 2021.
Aug 2021. Jeff on Bereavement Leave for 6 months.
Jul 2021. Release: v0.3.0!
Jun 2021. Announcement: Congratulations and welcome to our latest OSRE and IRIS-HEP Fellow Eshan Bhargava! This project will work toward adding `pushback` functionality for rejecting offloaded tasks under certain conditions. Eshan will be co-mentored by Jianshen Liu.
May 2021. Release: v0.2.0.
May 2021. Announcement: Congratulations and welcome to our latest Google Summer of Code student Yash Jipkate! This project will work toward improving and updating our documentation to be on par with the latest Apache Arrow release.
Apr 2021. Announcement: Rados Remote Reads merged into Ceph mainline! Thanks to Ken Iizawa and Fujitsu. This PR enables direct object-to-object reads, and enables a rich set of future functionality for SkyhookDM.
Mar 2021. Announcement: First integration of SkyhookDM merged into Coffea analysis framework. Please see getting started and try the example notebook here.
Mar 2021. Release: 0.1.1.
Feb 2021. Announcement: Please see our latest Guide for getting started with SkyhookDM.
Feb 2021. Release: 0.1.0.
Feb 2021. Announcement: We posted a code walkthrough video of the new Arrow Datasets integrated code by Jayjeet. Initial Coffea package PR for Skyhook and getting started guide with test notebook here.
Feb 2021. Announcement: New Skyhook "Rados Dataset" for Arrow repository/branch - please see readme for getting started using Docker to try it out!
Jan 2021. Announcement: Jayjeet was awarded an IRIS-HEP fellowship through June 2022. He will continue his work on the new integration of Skyhook with the Arrow Dataset API and a Coffea analysis package for Skyhook. Congratulations, Jayjeet!
Oct 2020. Talk: (Aditi Gupta | NIT Karnataka), "Extend SkyhookDM programmable object storage with statistics, sort/aggregate and data compaction functions" at the CROSS Research Symposium.
Oct 2020. Talk: (Xiongfeng Song | Rice University), "SkyhookDM projection-only pushdown and Arrow dataset integration into Skyhook objects" at the CROSS Research Symposium.
Oct 2020. Talk: (Matthew Rhea | UCSC), "Python SQL client interface for SkyhookDM" at the CROSS Research Symposium.
Oct 2020. Talk: (Jeff), "Storage and management of tabular data in object storage with SkyhookDM" at the CROSS Research Symposium.
Oct 2020. Talk: (Jayjeet | NIT Durgapur), "Reproducible large-scale SkyhookDM experiments using Popper" at the CROSS Research Symposium.
Oct 2020. Talk: (Saloni Rane, Amazon), "Extending Ceph objects to support webassembly executables" at the CROSS Research Symposium.
Sep 2020. Talk: (Jeff) SkyhookDM was presented at SDC 2020 in the Computational Storage session. Slides.
Sep 2020. Announcement: IRIS-HEP fellow Jayjeet Chakraborty created a post about his work on deploying and benchmarking SkyhookDM, using Popper for reproducibility.
Sep 2020. Talk: (Jeff) SkyhookDM invited talk to Kioxia.
Aug 2020. Announcement: GSoC Fellow Aditi Gupta successfully completed her summer internship with CROSS through CERN-HSF and IRIS-HEP orgs. Check out her final report here.
Jul 2020. Talk: (Carlos) Linux Professional Institute interviews CROSS Director Carlos Maltzahn about the value of open source at universities.
Jul 2020. Announcement: Congratulations and welcome to our new IRIS-HEP fellow Xiongfeng Song! A bit about Xiongfeng and his project.
Jun 2020. Announcement: Skyhook upgraded to support the Ceph Nautilus version.
Jun 2020. Announcement: Congratulations and welcome to our new IRIS-HEP fellow Jayjeet Chakraborty! A bit about Jayjeet and his project.
Jun 2020. Announcement: New - Ask questions on StackOverflow with the [skyhook-ceph] tag! We also have a new GitHub Issues link, please post any potential bugs there.
Jun 2020. Announcement: We are in the process of creating new documentation and tutorials from our previous Wiki, creating a lighter-weight repo, and updating to support the Ceph Nautilus version. Thanks very much to Ivo Jimenez for helping with this. Ivo leads the Popper project for reproducibility at CROSS; please check it out.
May 2020. Publication: "SkyhookDM: Data Processing in Ceph with Programmable Storage" in USENIX ;login: Magazine.
May 2020. Announcement: Congratulations and welcome to our GSoC 2020 student Aditi Gupta! A bit about Aditi and her project.
May 2020. Announcement: SkyhookDM was selected for Google Summer of Code (GSoC) 2020, as part of the CERN-HSF and IRIS-HEP organizations. Project description.
Mar 2020. Announcement: SkyhookDM Google Summer of Code projects now posted! We are happy to participate in GSoC for a second year in a row, this year in collaboration with the IRIS-HEP software institute. Please take a look and chat with us on our Gitter channel (@jlefevre).
Feb 2020. Talk: (Jeff) "Scaling Databases and File APIs with Programmable Ceph Object Storage" at Vault'20. Presentation and pdf here.
Dec 2019. Announcement: SkyhookDM to appear at Vault'20! "Scaling Databases and File APIs with Programmable Ceph Object Storage".
Nov 2019. Publication: "Towards Physical Design Management in Storage Systems" in Supercomputing 2019 (SC'19), Denver, CO.
Nov 2019. Talk: (Jeff) SkyhookDM presented at 24th International Conference on Computing in High Energy and Nuclear Physics in Adelaide, Australia. "Mapping Scientific Datasets to Programmable Object Storage".
Oct 2019. Talk: (Jeff) SkyhookDM presented at the National Diversity in STEM Conference (SACNAS) in Honolulu, Hawai'i. "Helping Scientists Fly Over Data without Getting Swamped".
Oct 2019. Talk: (Ashay) Presents his GSoC project at the 4th Annual CROSS Symposium in Santa Cruz, CA.
Oct 2019. Talk: (Jeff) Presents SkyhookDM at the 4th Annual CROSS Symposium in Santa Cruz, CA.
Aug 2019. Announcement: GSoC project successfully completed; please check out the report here. Ashay embedded column-oriented data formats and processing with Apache Arrow within our SkyhookDM Ceph extensions. Thank you for a great project, Ashay.
May 2019. Announcement: Skyhook welcomes our first GSoC student, Ashay Shirwadkar!
Mar 2019. Announcement: Skyhook has potential projects posted for Google Summer of Code 2019! CROSS was accepted again this year as a GSoC mentor organization and has other projects posted as well, please take a look. And see our Gitter channel to interact directly with mentors. Applications are open until April 9, 2019. More info here.
Feb 2019. Talk: (Jeff) Skyhook presented at Vault'19! The Linux Storage and File Systems Conference (co-located with FAST'19) in Boston, MA. (pdf slides)
Oct 2018. Talk: (Jeff) Skyhook presented at two sessions of the CROSS Symposium.
Sep 2018. Talk: (Jeff) Skyhook invited talk to Huawei analytic database group.
Aug 2018. Talk: (Jeff) Skyhook invited talk to Huawei storage group.
Oct 2017. Announcement: Skyhook incubator project funding is renewed by the CROSS Industry Advisory Board.
"Dear Jeff, Congratulations! The CROSS UCSC Committee has accepted the recommendation from the IAB to continue to fund your incubator “Skyhook: Elastic Databases for the cloud.”"
Sep 2017. Talk: (Jeff) SkyhookDB presented at lightning talk session of PostgresOpen 2017.
Mar 22, 2017. Announcement: CROSS awards continued funding for "Skyhook: Elastic Databases for the cloud."
"Dear Jeff, Congratulations! The CROSS UCSC Committee has accepted the recommendation from the IAB to continue to fund your incubator “Skyhook: Elastic Databases for the cloud.”"
Oct 4, 2016. Announcement: Skyhook is awarded initial funding. | Github
"Dear Jeff, Congratulations! Your incubator project proposal “Elastic Databases for the Cloud” is selected for funding."
Apr 9, 2016. Announcement: Conditional decision regarding Skyhook initial proposal from UC Santa Cruz Open Source center CROSS and its Industrial Advisory Board.
"Dear Jeff, We are pleased to let you know that your incubator project proposal “Elastic Databases for the Cloud” is *conditionally* selected for funding."