I am an Assistant Professor in the School of Computer Science at McGill University, where I lead the Data-Intensive Storage and Computer Systems Laboratory (DISCS Lab). My current focus is on storage and persistent memory technologies, with an emphasis on the way we manage large-scale data for emerging workloads in Data Science, Machine Learning, and Edge Computing.
I completed my PhD in Computer Science at the University of Sydney in 2020, advised by Prof. Willy Zwaenepoel. My dissertation research was on the design and implementation of efficient key-value stores for future hardware and performance requirements. My PhD was generously supported by The University of Sydney Faculty of Engineering and IT Dean's Postgraduate Research Scholarship and by the EPFL Fellowship for Doctoral Studies. I earned my Bachelor's and Master's degrees in Computer Science from EPFL.
Office: McConnell 113N
I am teaching COMP 596 - Principles of Computer Systems in Fall 22. Undergraduate (COMP-310 prerequisite) and graduate students (no prerequisites) are welcome! Have a look at the tentative syllabus and sign up if you want to learn more about how software and hardware come together in large computer systems. It will be an intimate, discussion-oriented class, with spots limited to 30 students.
I have open positions for PhD students! These positions are fully funded. Interested in groundbreaking computer systems research? Have a look at the DISCS Lab research focus areas and get in touch if you are motivated, have a strong academic record, and you'd like to join the team! Please read this before sending me email.
News
[May. 2022] I am giving a talk in the Boston University MIDAS seminar. Link here!
[Apr. 2022] I gave a talk in the MLCommons Community Meeting to present the MLPerf Storage benchmark working group I am co-chairing. Link here!
[Apr. 2022] Our paper on stream processing on the edge was accepted into SEC'22! Congratulations to the team!
[Mar. 2022] Our paper on cutting latency in learned indexes was accepted into CHEOPS'22! Congratulations to the team, in particular to the undergraduate student authors, Yong Zhang and Xinran Xiong!! [pdf] [code] [talk]
[Mar. 2022] I will serve on the EuroSys '23 PC and the ASPLOS '23 ERC.
[Feb. 2022] I will give a talk in the University of Columbia Systems Seminar.
[Jan. 2022] I will give a talk on managing mixed transactional and analytics workloads at the University of Toronto.
[Dec. 2021] I will be co-chairing the EuroSys '22 Doctoral Workshop, together with Profs. Valerio Schiavoni and Pierre Sutra.
[Nov. 2021] On Nov 19, I will give an in-person talk in the University of Rochester CS Seminar Series on key-value stores for mixed transactional and analytics workloads!
[Oct. 2021] My PhD thesis was awarded an Honorable Mention for the Dennis M. Ritchie Doctoral Dissertation Award! Many thanks to the ACM SIGOPS awards committee, and to my advisor Prof. Willy Zwaenepoel.
[Oct. 2021] I will serve on the OSDI '22 and SIGMOD '22 PCs.
[Aug. 2021] DISCS has a new website!
[Jun. 2021] I was awarded a John R. Evans Leaders Fund (JELF) Grant by the Canada Foundation for Innovation. Thank you, CFI!
[Jun. 2021] I was awarded a Discovery Grant by the Natural Sciences and Engineering Research Council of Canada. Thank you, NSERC!
Current Research Focus
Data powers everything we do and we are collecting it at unprecedented rates. The driver for my current research is to create a storage infrastructure that enables us to gain insights from this data in a fast and energy-conscious manner. See our latest work and research directions on the DISCS lab website.
Key-value stores (KVs) are a crucial component in cloud computing because they can efficiently handle large-scale, diverse data; they are deployed, for example, in the infrastructure of Google, Apple, Facebook, and Amazon. In my dissertation, Redesigning Persistent Key-Value Stores for Future Workloads, Hardware, and Performance Requirements, I proposed new techniques to improve persistent KVs. I designed and built four novel open-source systems: TRIAD, FloDB, SILK, and KVell. For an overview, have a look at my job talk.
KVell+ [OSDI '20] addresses space amplification for queries executed under Snapshot Isolation in KVs. Frequent updates during long-running analytics queries create significant space amplification, and the resulting garbage collection gives rise to latency spikes for shorter transactions. We introduce a new model for processing analytics queries, Online Commutative Processing (OLCP), based on the observation that such queries consist in large part of commutative processing of data items resulting from range scans, in which each item in the range is read exactly once. OLCP incurs little or no space amplification or garbage collection overhead. [pdf] [code] [slides] [talk-OSDI by Baptiste Lepers]
KVell [SOSP '19] provides surprising insights into new storage technologies and their impact on current persistent KV designs. The emergence of fast drives shifts the bottleneck from I/O bandwidth to the CPU, making it necessary to revisit previous fundamental design assumptions, such as maintaining the sorted order of data and making use of complex synchronization primitives. [pdf] [code] [slides] [talk-SOSP by Baptiste Lepers]
SILK+ [TOCS '20] builds upon the SILK I/O scheduler, adding support for workloads with heterogeneous item sizes and analytics queries (i.e., range scans). SILK+ is an important addition for production workloads such as the ones at Nutanix, Pinterest, and Wikipedia, where the item sizes can differ by up to three orders of magnitude. Academic Impact: This work was an invited paper for the Special Issue of TOCS '20. [pdf]
SILK [ATC '19] addresses the issue of tail latency in log-structured merge KVs, stemming from significant interference between client work and KV maintenance operations. The interference creates a bottleneck at the I/O bandwidth level. SILK prevents tail latency spikes through a novel opportunistic I/O bandwidth scheduling mechanism. Academic Impact: This work received one of three Best Paper Awards at USENIX ATC '19 (top 3 out of 356 submissions). [pdf] [code] [slides] [talk-MSR]
TRIAD [ATC '17] focuses on the disk utilization of KVs. Through its three complementary techniques acting at the memory, disk, and commit log levels, TRIAD drastically reduces write amplification in persistent storage and the effect of KV maintenance operations. The reduced write amplification leads to a commensurate throughput improvement for the client-facing workload. Industry Impact: This work is currently used in production at Nutanix and was featured on Mark Callaghan's Small Datum blog. [pdf] [code] [slides]
FloDB [EuroSys '17] addresses the issue of scalability with the memory size and with the number of threads in persistent KVs, again resulting in important gains in throughput for client workloads. The main contribution is a new two-layer data-structure design which is highly concurrent and improves the data flow from clients, to memory, to disk. [pdf] [code] [slides]
Teaching
Winter 22 COMP-310/ECSE-427 – Operating Systems. Undergraduate-level course on the fundamentals of OS design and implementation.
Fall 21 COMP-596 – Principles of Computer Systems. A new course on the principles of computer systems design. The class is open to graduate and undergraduate students.
Publications
[OSDI '20] Snapshot Isolation Without Snapshots. USENIX Symposium on Operating Systems Design and Implementation 2020 (17% acceptance ratio). B. Lepers, O. Balmau, K. Gupta, W. Zwaenepoel. [pdf] [code] [slides]
[THESIS] Redesigning Persistent Key-Value Stores for Future Workloads, Hardware, and Performance Requirements. Oana Balmau. Doctoral Dissertation, The University of Sydney, 2020. Advised by Prof. Willy Zwaenepoel. PhD Committee: Dr. Ricardo Bianchini, Prof. Vijay Chidambaram, Prof. Frans Kaashoek. Winner of CORE John Makepeace Bennett Award 2021 for the best Computer Science doctoral dissertation in Australia and New Zealand. [pdf]
[TOCS '20] SILK+: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads. ACM Transactions on Computer Systems Special Issue. O. Balmau, F. Dinu, W. Zwaenepoel, K. Gupta, R. Chandhiramoorthi, D. Didona. Invited paper. [pdf]
[SOSP '19] KVell: the Design and Implementation of a Fast Persistent Key-Value Store. Symposium on Operating Systems Principles 2019 (14% acceptance ratio). B. Lepers, O. Balmau, K. Gupta, W. Zwaenepoel. [pdf] [code] [slides] [talk]
[USENIX ATC '19] SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. USENIX Annual Technical Conference 2019 (19% acceptance ratio). O. Balmau, F. Dinu, W. Zwaenepoel, K. Gupta, R. Chandhiramoorthi, D. Didona. Best Paper Award! Invited to publish in ACM Transactions on Computer Systems (TOCS) special issue. [pdf] [code] [slides] [talk-MSR]
[NETYS '19] The Fake News Vaccine. The International Conference on Networked Systems 2019. O. Balmau, R. Guerraoui, A-M. Kermarrec, A. Maurer, M. Pavlovic, W. Zwaenepoel. [pdf-arXiv]
[USENIX ATC '17] TRIAD: Creating Synergies Between Memory, Disk and Log in LSM Key-Value Stores. USENIX Annual Technical Conference 2017 (21% acceptance ratio). O. Balmau, D. Didona, R. Guerraoui, W. Zwaenepoel, H. Yuan, A. Arora, K. Gupta, P. Konka. [pdf] [code] [slides]
[EuroSys '17] FloDB: Unlocking Memory in Persistent Key-Value Stores. The European Conference on Computer Systems 2017 (20% acceptance ratio). O. Balmau, R. Guerraoui, V. Trigonakis, I. Zablotchi. [pdf] [code] [slides]
[SPAA '16] Fast and robust memory reclamation for concurrent data structures. ACM Symposium on Parallelism in Algorithms and Architectures (24% acceptance ratio). O. Balmau, R. Guerraoui, M. Herlihy, I. Zablotchi. [pdf] [code]
[SmartGridComm '14] Evaluation of RPL for medium voltage power line communication. IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids. O. Balmau, D. Dzung, A. Karaağaç, V. Nesovic, A. Paunovic, Y-A. Pignolet, N. Tehrani.
[SmartGridComm '14] Recipes for faster failure recovery in Smart Grid communication networks. IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids. O. Balmau, D. Dzung, Y-A. Pignolet.
Awards and Honors
CORE John Makepeace Bennett Award 2021 for the best Computer Science doctoral dissertation in Australia and New Zealand.
USENIX ATC 2019 Best Paper Award.
University of Sydney Faculty of Engineering and IT Dean’s Postgraduate Research Scholarship.
EPFL Fellowship for Doctoral Studies.
EPFL Teaching Assistant Award for Teaching Excellence.
Brown University Presidential Fellowship for Incoming Graduate Students.
EPFL Excellence Fellowship for the Master Studies.
Service
ASPLOS '23 External Review Committee.
EuroSys '23, '21 Program Committee, EuroSys '21 Doctoral Workshop Program Committee.
EuroSys '22 Doctoral Workshop co-chair.
SIGMOD '22, '21 research track Program Committee.
OSDI '22 Program Committee, OSDI '21 External Review Committee.
Reviewer for ACM Transactions on Storage (TOS) '22, '21.
SOSP '21 Program Committee.
Outside of research, I enjoy:
Yoga. I am a certified Hatha and Yin Yoga teacher (400h), maintaining a daily asana, pranayama and meditation practice.
Dancing. I practice a variety of styles including salsa, modern, and bellydancing.
Scuba Diving and Hiking. I love exploring the underwater world and the mountains during my travels.