Efficient key-value store for future hardware
The objective of the project is to design an efficient key-value store for future hardware, using fast NVMe SSDs and persistent byte-addressable memory.
In our modern digital life, activities from buying clothes to accessing government services depend on applications that store information in a long-lasting form. In other words, they need a data store in which information (the "values") is found using a label (a "key"). Key-value stores are therefore vital for the functionality and performance of many applications. Even relational databases typically contain a key-value store as a storage engine, underneath layers that support a richer query model.
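To make the interface concrete, here is a minimal sketch of the operations every key-value store exposes, backed by an in-memory hash map for illustration only (the type and method names are our own, not part of the project):

```rust
use std::collections::HashMap;

/// A minimal in-memory key-value store: values are found by key.
/// Real stores persist data on disk or persistent memory; this
/// hash-map-backed sketch only illustrates the interface.
pub struct KvStore {
    map: HashMap<String, Vec<u8>>,
}

impl KvStore {
    pub fn new() -> Self {
        KvStore { map: HashMap::new() }
    }

    /// Associate `value` with `key`, replacing any previous value.
    pub fn put(&mut self, key: &str, value: &[u8]) {
        self.map.insert(key.to_string(), value.to_vec());
    }

    /// Look up the value stored under `key`, if any.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.map.get(key).map(|v| v.as_slice())
    }

    /// Remove `key`; returns whether it was present.
    pub fn delete(&mut self, key: &str) -> bool {
        self.map.remove(key).is_some()
    }
}

fn main() {
    let mut store = KvStore::new();
    store.put("user:42", b"alice");
    assert_eq!(store.get("user:42"), Some(&b"alice"[..]));
    assert!(store.delete("user:42"));
    assert_eq!(store.get("user:42"), None);
}
```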
Current key-value stores are designed for today's typical hardware: a memory hierarchy of slow but capacious hard disks, flash-based SSDs, and fast but small and volatile RAM. No existing key-value store can leverage the speed of fast NVMe SSDs and byte-addressable persistent memory -- they become CPU-bound before reaching the limits of these devices.
The goal of this project is to rethink the way data is stored in memory and on disk in order to design a fast key-value store for modern drives and persistent memory. The project will start with a study of the characteristics of persistent memory. Then, existing data structures designed for volatile DRAM will be ported to and optimized for persistent memory. Finally, new designs will be proposed.
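As a point of comparison for the redesign, here is a hypothetical sketch of the classic write-ahead-logging technique that disk-based key-value stores use for durability: each update is appended to a log and forced to stable storage with fsync before being applied in memory. On persistent memory, the fsync would typically be replaced by cache-line flushes and fences; the file path and function name below are illustrative, not from the project:

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::Write;

/// Append a put(key, value) record to the log and force it to stable
/// storage before updating the in-memory table, so that acknowledged
/// writes survive a crash.
pub fn logged_put(
    log: &mut File,
    table: &mut HashMap<String, String>,
    key: &str,
    value: &str,
) -> std::io::Result<()> {
    writeln!(log, "PUT {} {}", key, value)?; // 1. append the record
    log.sync_all()?;                         // 2. fsync: record is durable
    table.insert(key.to_string(), value.to_string()); // 3. apply in memory
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("kv_demo.log"); // illustrative path
    let mut log = OpenOptions::new().create(true).append(true).open(&path)?;
    let mut table = HashMap::new();
    logged_put(&mut log, &mut table, "user:42", "alice")?;
    assert_eq!(table.get("user:42").map(String::as_str), Some("alice"));
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The fsync on every update is exactly the cost that fast NVMe devices and persistent memory invite us to rethink.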
Requirements: fluent C, C++, or Rust programming; notions of good practices in optimization and efficient programming.
Detecting data races in the Linux kernel
The objective of the project is to design a new mechanism to specify and detect data-race bugs in the Linux kernel.
The Linux kernel is a highly optimized multi-core system. As a consequence, kernel developers frequently use relaxed synchronization models: instead of protecting shared variables with locks, the code is carefully engineered so that variables can be read outside critical sections while still providing a consistent view of the data. For instance, the kernel often allocates an object, initializes it, and then inserts it into a list. Insertions into the list are protected by a lock, but the list can be read without taking any lock. The code is thus only correct if the inserted object has been fully and correctly initialized before being inserted into the list. The kernel relies on memory barriers to order the initialization and the insertion.
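The publish pattern described above can be sketched with Rust atomics (the kernel achieves the same effect with smp_wmb()/smp_rmb() or smp_store_release()/smp_load_acquire(); the function names here are our own):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::{ptr, thread};

// Shared pointer through which the writer publishes an object.
static SHARED: AtomicPtr<u64> = AtomicPtr::new(ptr::null_mut());

/// Writer: fully initialize the object, then publish it. The Release
/// store acts as the barrier: the initialization is guaranteed to be
/// visible before the pointer itself.
fn publish(value: u64) {
    let obj = Box::into_raw(Box::new(value)); // 1. allocate + initialize
    SHARED.store(obj, Ordering::Release);     // 2. publish with a barrier
    // (the Box is intentionally leaked in this sketch)
}

/// Reader: no lock taken. The Acquire load pairs with the writer's
/// Release store, so a non-null pointer always refers to fully
/// initialized data.
fn read() -> Option<u64> {
    let p = SHARED.load(Ordering::Acquire);
    if p.is_null() { None } else { Some(unsafe { *p }) }
}

fn main() {
    let writer = thread::spawn(|| publish(42));
    writer.join().unwrap();
    assert_eq!(read(), Some(42));
}
```

Replacing Release/Acquire with relaxed accesses would reintroduce exactly the class of bug this project targets: a reader could observe the pointer before the initialization.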
However, memory barriers are notoriously tricky to use correctly. A quick search through kernel commits turns up hundreds of bugs caused by missing barriers. These bugs are hard to understand and to fix.
In this project, we want to create a tool that gives developers formal guarantees about code that relies on ordering and barriers for correctness. The key idea is to use existing comments in the kernel code to extract ordering constraints between variables or functions. Kernel developers often comment on the intent of their code and informally document its ordering constraints. We want to use these comments to check whether the documented constraints are respected throughout the entire kernel.
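A hypothetical first step for such a tool is to locate the comments that document barrier pairings. The sketch below scans source lines for a few illustrative patterns (the patterns and function name are our assumptions, not the tool's actual design):

```rust
/// Return the (1-based) line numbers and text of lines that mention
/// barrier pairings. The matched keywords are illustrative; a real
/// tool would parse comments and barrier calls far more carefully.
fn find_ordering_comments(source: &str) -> Vec<(usize, String)> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let l = line.to_lowercase();
            l.contains("pairs with") || l.contains("smp_wmb") || l.contains("smp_rmb")
        })
        .map(|(i, line)| (i + 1, line.trim().to_string()))
        .collect()
}

fn main() {
    // A kernel-style snippet: publish an object with an explicit
    // barrier and a comment documenting which reader it pairs with.
    let snippet = "\
obj->ready = 1;
smp_wmb(); /* pairs with smp_rmb() in reader() */
list_add(&obj->node, &head);";
    let hits = find_ordering_comments(snippet);
    assert_eq!(hits.len(), 1);
    assert_eq!(hits[0].0, 2); // the barrier line was found
}
```

The harder part of the project is the next step: turning such informal comments into constraints that a static analysis can check across the whole kernel.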
Requirements: notions of static analysis, ability to navigate in a large codebase (Linux kernel), fluent in Rust, good notions of C.