NSF CICI: Confidential Computing in Reproducible Collaborative Workflows

Summary

Data-intensive scientific research projects often involve multiple collaborating parties. Some parties may require confidential processing of their sensitive assets to protect intellectual property, embargo data or algorithms before a paper is published, comply with legal requirements, or avoid responsibility for releasing sensitive data. However, integrating confidential computing into scientific workflows raises significant challenges. (1) Most domain-science developers find it difficult to learn specific confidential computing frameworks and to harden their code against side-channel attacks. (2) The interplay between the private components and other components in a collaborative workflow may open new attacks and side channels for adversaries to exploit. The proposed project aims to address these challenges with a scientist-friendly development framework for confidential computing and a holistic attack study and mitigation framework for collaborative workflows. The success of this project will enable domain scientists to adopt the best confidential computing practices easily and to use publicly available resources without concern about confidentiality and privacy breaches, advancing open, collaborative science.

Specifically, the proposed research focuses on scientist-oriented development based on trusted execution environments (TEEs) and studies its integration with collaborative scientific workflows. (1) The project explores different protection and usability solutions for domain scientists and allows them to trade off their research goals against security and privacy concerns. (2) It develops an efficient and transparent TEE access-pattern protection framework that uniquely combines best practices in data-intensive computing with framework-based mitigation methods. (3) It takes a holistic approach to studying new security and privacy threats around confidential components in a collaborative workflow, covering task execution, logging, provenance analysis, and reproducibility verification. The solutions will integrate techniques such as TEEs, blockchain, and differential privacy. (4) It is science-driven: it is motivated and validated by collaborative research projects in biomedical sequence processing, image-based remote diagnosis, and healthcare data analytics. This project will generate open-source toolkits and demonstration systems. It also includes several educational and outreach initiatives to enhance cybersecurity and data science programs, attract underrepresented students, support local high school CS education, and strengthen industrial collaborations.

https://www.nsf.gov/awardsearch/showAward?AWD_ID=2232824

Personnel:

Keke Chen (Marquette University, PI)

Zeno Franco (MCW, co-PI)

Zeyun Yu (UWM, co-PI)

Ning Jiang (UPenn, Consultant)

Involved graduate students: Mubashiwar Alam, Mrinal Kanti Dhar (MCW), Yuechun (Ethan) Gu

Prototypes: SGX-MR, Attacks on Private Models