
Research Interests

My research focuses on the intersection of distributed systems and high-performance computing (HPC). I explore how multiple HPC facilities can collaborate through event-driven architectures to solve complex problems with greater resilience and efficiency. To this end, we are developing a hierarchical event fabric that serves two goals: advancing scientific applications, such as AI-guided simulation campaigns, time-sensitive data analysis pipelines, and distributed data integration; and providing resilience-enabling solutions, including a policy engine, resilient compute pools, and resilient data views. For more details on this DOE-funded Diaspora project, please refer to the project description and to publications c17 and c18 in my CV.


My full publication list is available on Google Scholar and in my CV.

Repositories, Artifacts, and Technical Reports


[FTXS'24] Octopus: Experiences with a Hybrid Event-Driven Architecture for Distributed Scientific Computing. pre-print | paper | project descriptions [1, 2] | diaspora SDK | diaspora service repo | docs and demos | SDK walkthrough | evaluation methodology

[NRDPISI-1] Diaspora: Resilience-Enabling Services for Real-Time Distributed Workflows. paper | project descriptions [1, 2] | diaspora SDK | diaspora service repo | docs and demos

[eScience'24] An Empirical Investigation of Container Building Strategies and Warm Times to Reduce Cold Starts in Scientific Computing Serverless Functions. paper | Globus Compute dataset | Binder dataset

[eScience'24] TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks. paper | repo | docs

[FGCS Vol. 153] The Globus Compute Dataset: An Open Function-as-a-Service Dataset From the Edge to the Cloud. paper | dataset


[OSDI'22] Cancellation in Systems: An Empirical Study of Task Cancellation Patterns and Failures. paper | poster | codebase | video

[ICC'22] Reliable Broadcast in Critical Applications: Asset Transfer and Smart Home. paper


[SOSP'21] Rabia: Simplifying State-Machine Replication Through Randomization. paper | poster | video | codebase | tech. report

[ICDCN'21] Practical Experience Report: Cassandra+: Trading-Off Consistency, Latency, and Fault-tolerance in Cassandra. paper | tech. report | codebases


[Computer Networks Vol. 182] Reliable broadcast with trusted nodes: Energy reduction, resilience, and speed. paper | codebase

[GLOBECOM'20] BBB: A Lightweight Approach to Evaluate Private Blockchains in Clouds. paper | video | codebase

[NCA'20] CassandrEAS: Highly Available and Storage-Efficient Distributed Key-Value Store with Erasure Coding. paper | codebases

[Manuscript] Reliable Broadcast in Practical Networks: Algorithm and Evaluation. paper

[PerVehicle'20] Make Multi-hop Broadcast in VANET Fast by Selecting a Better Route for Source Vehicle. paper | slides | codebase

[DUCSAN'20] Tutorial: Google Cloud for Beginners: Architecture, Storage, and Computation. paper | video | slides | instruction

[DUCSAN'20] Tutorial: Deep Dive into Apache Cassandra: Theory, Design, and Application. paper | slides

[DUCSAN'20] LiteDoc: Make Collaborative Editing Fast, Scalable, and Robust. paper | codebases


[GLOBECOM'19] Reliable Broadcast in Networks with Trusted Nodes. paper | codebase

[PRDC'19] BBB: Make Benchmarking Blockchains Configurable and Extensible. paper | codebase

[NCA'19] Distributed Causal Memory in the Presence of Byzantine Servers. paper | audio | slides

[Sarnoff'19] A First Step Towards Production-Ready Network Function Storage: Benchmarking with NFSB. paper | codebase