Summary
Data-intensive scientific research projects often involve multiple collaborating parties. Some parties may demand confidential processing of their sensitive assets to protect intellectual property, embargo data (or algorithm) sharing before publishing a paper, conform to legal requirements, or avoid the responsibility for releasing sensitive data. However, integrating confidential computing into scientific workflows raises significant challenges. (1) Most science domain developers find it challenging to learn specific confidential computing frameworks and to secure their code against side-channel attacks. (2) The interplay between the private components and other components in a collaborative workflow may enable new attacks and side channels for adversaries to exploit. The proposed project aims to address these challenges with a scientist-friendly development framework for confidential computing and a holistic attack study and mitigation framework for collaborative workflows. The success of this project will enable domain scientist developers to adopt the best confidential computing practices easily and use publicly available resources without concern about confidentiality and privacy breaches, boosting the idea of open, collaborative science.
Specifically, the proposed research focuses on scientist-oriented development based on trusted execution environments (TEEs) and studies its integration with collaborative scientific workflows. (1) The project explores different protection and usability solutions for domain scientists and allows them to trade off between their research goals and security and privacy concerns. (2) It develops an efficient and transparent TEE access-pattern protection framework that uniquely combines the best practices in data-intensive computing and framework-based mitigation methods. (3) It takes a holistic approach to studying new security and privacy threats around confidential components in a collaborative workflow, covering stages including task execution, logging, provenance analysis, and reproducibility verification. The solutions will integrate techniques like TEE, blockchain, and differential privacy. (4) It is science-driven: motivated and validated by collaborative research projects in biomedical sequence processing, image-based remote diagnosis, and healthcare data analytics. This project will generate open-source toolkits and demonstration systems. It also includes several educational and outreach initiatives to enhance cybersecurity and data science programs, attract underrepresented students, support local high school CS education, and strengthen industrial collaborations.
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2232824
Personnel:
Keke Chen (Marquette University, PI)
Zeno Franco (MCW, co-PI)
Zeyun Yu (UWM, co-PI)
Ning Jiang (UPenn, Consultant)
Involved graduate students: Mubashwir Alam, Mrinal Kanti Dhar (MCW), Yuechun (Ethan) Gu
Prototypes: SGX-MR, Attacks on Private Models
Related papers:
Yuechun Gu, Jiajie He, and Keke Chen, "Adaptive Domain Inference Attack with Concept Hierarchy", in Proceedings of ACM SIGKDD conference, 2025
Mubashwir Alam and Keke Chen, "TEE-Graph: efficient privacy and ownership protection for cloud-based graph spectral analysis", in Frontiers in Big Data, 2023
Mubashwir Alam and Keke Chen, "Making Your Program Oblivious: a Comparative Study for Side-channel-safe Confidential Computing", IEEE Conference on Cloud Computing (CLOUD), Chicago, 2023
Mubashwir Alam, Justin Boyce, and Keke Chen, "Demo: SGX-MR-Prot: Efficient and Developer-Friendly Access-Pattern Protection in Trusted Execution Environments", IEEE Conference on Distributed Computing Systems (ICDCS), Hong Kong, China, 2023, arxiv
Yuechun Gu and Keke Chen, "GAN-Based Domain Inference Attack," in AAAI 2023. arxiv
Keke Chen, "Confidential High-Performance Computing in the Public Cloud," IEEE Internet Computing, accepted in 2022, arxiv