I am an Assistant Professor in the School of Computer Science at McGill University, where I lead the Data-Intensive Storage and Computer Systems Laboratory (DISCS Lab). My current focus is on storage and persistent memory technologies, with an emphasis on the way we manage large-scale data for machine learning, data science and edge computing workloads. I also hold a status-only appointment at the University of Toronto.
I completed my PhD in Computer Science at the University of Sydney in 2020, advised by Prof. Willy Zwaenepoel. My dissertation research was on the design and implementation of efficient key-value stores for future hardware and performance requirements. My PhD was generously supported by The University of Sydney Faculty of Engineering and IT Dean's Postgraduate Research Scholarship and by the EPFL Fellowship for Doctoral Studies. I earned my Bachelor's and Master's degrees in Computer Science from EPFL.
Contact: oana.balmau@mcgill.ca
Office: McConnell 113N
Interested in working together? I have one open position for a PhD student!
Please read this before sending me email.
I am lucky to work with many amazing students and collaborators. Meet the team and have a look at my group's ongoing research here.
Data powers everything we do and we are collecting it at unprecedented rates. The driver for research at DISCS is to create a storage infrastructure that enables us to gain insights from this data in a fast and energy-conscious manner.
Our two main research directions are:
Systems for ML. Our main contribution in this space is the MLPerf Storage benchmark.
Edge computing. In this space, we are designing new frameworks for fast and secure edge computing in the hierarchical edge. This work is a part of a DND IDEaS micro-net, in collaboration with colleagues at the University of Toronto, UBC, and ETS Montréal.
See more details on the DISCS page.
Previous Research: Key-value Stores
Key-value stores (KVs) are a crucial component in cloud computing because they can efficiently handle large-scale, diverse data; they are deployed in the infrastructure of companies such as Google, Apple, Facebook, and Amazon. In my dissertation, Redesigning Persistent Key-Value Stores for Future Workloads, Hardware, and Performance Requirements, I proposed new techniques to improve persistent KVs. I designed and built four novel open-source systems: TRIAD, FloDB, SILK, and KVell. For an overview, have a look at my job talk.
KVell+ [OSDI '20] addresses space amplification for queries executed under Snapshot Isolation in KVs. Frequent updates during long-running analytics queries create significant space amplification, and the resulting garbage collection gives rise to latency spikes for shorter transactions. We introduce a new model for processing analytics queries, Online Commutative Processing (OLCP), based on the observation that such queries consist in large part of commutative processing of data items resulting from range scans, in which each item in the range is read exactly once. OLCP incurs little or no space amplification or garbage collection overhead. [pdf] [code][slides][talk-OSDI by Baptiste Lepers]
KVell [SOSP '19] provides surprising insights into new storage technologies and their impact on current persistent KV designs. The emergence of fast drives shifts the bottleneck from I/O bandwidth to the CPU, making it necessary to revisit previous fundamental design assumptions, such as maintaining the sorted order of data and making use of complex synchronization primitives. [pdf] [code][slides][talk-SOSP by Baptiste Lepers]
SILK+ [TOCS '20] builds upon the SILK I/O scheduler, adding support for workloads with heterogeneous item sizes and analytics queries (i.e., range scans). SILK+ is an important addition for production workloads such as the ones at Nutanix, Pinterest, and Wikipedia, where the item sizes can differ by up to three orders of magnitude. Academic Impact: This work was an invited paper for the Special Issue of TOCS '20. [pdf]
SILK [ATC '19] addresses the issue of tail latency in log-structured merge KVs, stemming from significant interference between client work and KV maintenance operations. The interference creates a bottleneck at the I/O bandwidth level. SILK prevents tail latency spikes through a novel opportunistic I/O bandwidth scheduling mechanism. Academic Impact: This work received one of three Best Paper Awards at USENIX ATC '19 (top 3 out of 356 submissions). [pdf] [code][slides][talk-MSR]
TRIAD [ATC '17] focuses on the disk utilization of KVs. Through its three complementary techniques acting at the memory, disk and commit log levels, TRIAD drastically reduces write amplification in persistent storage and the effect of KV maintenance operations. The reduced write amplification leads to a commensurate throughput improvement for the client-facing workload. Industry Impact: This work is currently used in production at Nutanix and was featured on Mark Callaghan's Small Datum blog. [pdf] [code][slides]
FloDB [EuroSys '17] addresses the issue of scalability with the memory size and with the number of threads in persistent KVs, again resulting in important gains in throughput for client workloads. The main contribution is a new two-layer data-structure design which is highly concurrent and improves the data flow from clients, to memory, to disk. [pdf] [code][slides]
Publications
[Middleware '24] vPIM: Processing-in-Memory Virtualization. ACM/IFIP International Middleware Conference. D. Teguia, J. Chen, S. Bitchebe, O. Balmau, A. Tchana. [pdf - coming soon] [slides - coming soon]
[SEC '24] Falcon: Live Reconfiguration for Stateful Stream Processing on the Edge. ACM/IEEE Symposium on Edge Computing. P. Mishra, N. Bore, B. Ramprasad, M. Thiessen, M. Gabel, A. da Silva Veith, O. Balmau, E. de Lara. Best Paper Award! [pdf] [talk] [code]
[OSR '24] Is Bare-metal I/O Performance with User-defined Storage Drives Inside VMs Possible? ACM SIGOPS Operating Systems Review (OSR). S. Rolon, O. Balmau. Invited paper. [pdf][slides]
[EuroMLSys '24] SpeedyLoader: Efficient Pipelining of Data Preprocessing and Machine Learning Training. ACM Workshop on Machine Learning and Systems (collocated with EuroSys). R. Nouaji, S. Bitchebe, O. Balmau. [pdf][slides]
[EdgeSys '24] Stream Processing with Adaptive Edge-Enhanced Confidential Computing. ACM International Workshop on Edge Systems, Analytics, and Networking (collocated with EuroSys). Y. Yan, P. Mishra, W. Huang, A. Mehta, O. Balmau, D. Lie. [pdf][slides]
[EdgeSys '24] PathFS: A File System for the Hierarchical Edge. ACM International Workshop on Edge Systems, Analytics, and Networking (collocated with EuroSys). V. Dantas de Lima Melo, M. Thiessen, A. Panas, A. da Silva Veith, K. Yano, O. Balmau, E. de Lara. [pdf][slides]
[EuroSys '24 - Poster] PAM: Fast Reactive Reconfiguration for Stateful Stream Processing. ACM International Workshop on Edge Systems, Analytics, and Networking. P. Mishra, O. Balmau, E. de Lara. [pdf]
[CHEOPS '23] Is Bare-metal I/O Performance with User-defined Storage Drives Inside VMs Possible? ACM Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems. S. Rolon, O. Balmau. Top paper, invited to publish in ACM SIGOPS OSR. [pdf][slides]
[SIGMOD Record '22] Characterizing I/O in Machine Learning with MLPerf Storage. SIGMOD Record DBrainstorming. O. Balmau. [pdf] [code]
[SEC '22] Shepherd: Seamless Stream Processing on the Edge. ACM/IEEE Symposium on Edge Computing. B. Ramprasad, P. Mishra, M. Thiessen, H. Chen, A. da Silva Veith, M. Gabel, O. Balmau, A. Chow, E. de Lara. [pdf] [talk]
[CHEOPS '22] TONE: Cutting Tail-Latency in Learned Indexes. ACM Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems. Y. Zhang, X. Xiong, O. Balmau. [pdf] [code] [talk]
[OSDI '20] Snapshot Isolation Without Snapshots. USENIX Symposium on Operating Systems Design and Implementation 2020 (17% acceptance ratio). B. Lepers, O. Balmau, K. Gupta, W. Zwaenepoel. [pdf] [code] [slides]
[THESIS] Redesigning Persistent Key-Value Stores for Future Workloads, Hardware, and Performance Requirements. Oana Balmau. Doctoral Dissertation, The University of Sydney, 2020. Advised by Prof. Willy Zwaenepoel. PhD Committee: Dr. Ricardo Bianchini, Prof. Vijay Chidambaram, Prof. Frans Kaashoek. Winner of CORE John Makepeace Bennett Award 2021 for the best Computer Science doctoral dissertation in Australia and New Zealand. Honorable Mention for the ACM SIGOPS Dennis M. Ritchie Doctoral Dissertation Award [pdf].
[TOCS '20] SILK+: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads. ACM Transactions on Computer Systems Special Issue. O. Balmau, F. Dinu, W. Zwaenepoel, K. Gupta, R. Chandhiramoorthi, D. Didona. Invited paper. [pdf]
[SOSP '19] KVell: the Design and Implementation of a Fast Persistent Key-Value Store. Symposium on Operating Systems Principles 2019 (14% acceptance ratio). B. Lepers, O. Balmau, K. Gupta, W. Zwaenepoel. [pdf] [code][slides][talk]
[USENIX ATC '19] SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. USENIX Annual Technical Conference 2019 (19% acceptance ratio). O. Balmau, F. Dinu, W. Zwaenepoel, K. Gupta, R. Chandhiramoorthi, D. Didona. Best Paper Award! Invited to publish in ACM Transactions on Computer Systems (TOCS) special issue. [pdf] [code][slides][talk-MSR]
[NETYS '19] The Fake News Vaccine. The International Conference on Networked Systems 2019. O. Balmau, R. Guerraoui, A-M. Kermarrec, A. Maurer, M. Pavlovic, W. Zwaenepoel. [pdf-arXiv]
[USENIX ATC '17] TRIAD: Creating Synergies Between Memory, Disk and Log in LSM Key-Value Stores. USENIX Annual Technical Conference 2017 (21% acceptance ratio). O. Balmau, D. Didona, R. Guerraoui, W. Zwaenepoel, H. Yuan, A. Arora, K. Gupta, P. Konka. [pdf] [code][slides]
[EuroSys '17] FloDB: Unlocking Memory in Persistent Key-Value Stores. The European Conference on Computer Systems 2017 (20% acceptance ratio). O. Balmau, R. Guerraoui, V. Trigonakis, I. Zablotchi. [pdf] [code][slides]
[SPAA '16] Fast and robust memory reclamation for concurrent data structures. ACM Symposium on Parallelism in Algorithms and Architectures (24% acceptance ratio). O. Balmau, R. Guerraoui, M. Herlihy, I. Zablotchi. [pdf][code]
[SmartGridComm '14] Evaluation of RPL for medium voltage power line communication. IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids. O. Balmau, D. Dzung, A. Karaağaç, V. Nesovic, A. Paunovic, Y-A. Pignolet, N. Tehrani.
[SmartGridComm '14] Recipes for faster failure recovery in Smart Grid communication networks. IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids. O. Balmau, D. Dzung, Y-A. Pignolet.
Awards and Honors
SEC 2024 Best Paper Award.
MLCommons Hero Award 2023. MLCommons community award for "Superb leadership, dedication, and enthusiasm" as an MLPerf Storage working group chair.
ACM SIGOPS Dennis M. Ritchie Doctoral Dissertation Award 2021 – Honorable mention. International award created by ACM SIGOPS to recognize research and creativity in computer systems.
CORE John Makepeace Bennett Award 2021 for the best Computer Science doctoral dissertation in Australia and New Zealand.
USENIX ATC 2019 Best Paper Award.
University of Sydney Faculty of Engineering and IT Dean’s Postgraduate Research Scholarship.
EPFL Fellowship for Doctoral Studies.
EPFL Teaching Assistant Award for Teaching Excellence.
Brown University Presidential Fellowship for Incoming Graduate Students.
EPFL Excellence Fellowship for the Master Studies.
Teaching
Fall 23 COMP-513 – Advanced Computer Systems. Graduate and undergraduate-level course on advanced system design and implementation.
Winter 23, Fall 23 COMP-604 – Graduate School Fundamentals. Graduate-level course on non-technical topics to support CS PhD students throughout their degree.
Winter 23, 22, 21 COMP-310/ECSE-427 – Operating Systems. Undergraduate-level course on the fundamentals of OS design and implementation.
Fall 21 COMP-596 – Principles of Computer Systems. A new course on the principles of computer systems design. The class is open to graduate and undergraduate students.
Academic Service
Program committees:
SOSP '25, '21.
SIGMOD '24, '22, '21 research track.
FAST '24.
MLSys '23.
ASPLOS '23 External Review Committee.
EuroSys '23, '21. EuroSys '21 Doctoral Workshop.
OSDI '22 PCs. OSDI '21 External Review Committee.
Reviewer for VLDB '24, ACM Transactions on Storage (TOS) '22, '21.
Organization:
Co-creator and co-organizer of the Dagstuhl Seminar '24 on Resource-efficient Machine Learning, with Matthias Böhm, Ana Klimovic, Peter R. Pietzuch, and Pinar Tözün.
Co-creator and co-organizer of the Data Systems meet Data Science (DSDS '22, '23) workshop. Co-chairs: Bettina Kemme, Essam Mansour.
Co-creator and co-organizer of the DND IDEaS '23 workshop on efficient, reliable, and secure edge computing. Co-chairs: Eyal de Lara, David Lie, Julien Gascon-Samson, and Aastha Mehta.
EuroSys '22 Doctoral Workshop co-chair with Valerio Schiavoni and Pierre Sutra.
About Me
Outside of research, I enjoy:
Yoga. I am a certified Yoga teacher for Hatha, Yin, prenatal and postpartum yoga (>500h).
Dancing. I practice a variety of styles including salsa, modern, and bellydancing.