I explore the path toward an ML-specialized OS, ROSA-OS. ROSA-OS rethinks the OS architecture to tailor it specifically to ML workloads, especially in virtualized clouds, which are now widely used to run ML applications. ROSA-OS's envisioned architecture includes (1) a microkernel, Micro-LAKE, that allows kernel-space applications to use the GPU, and (2) an MLaaS (ML as a Service) subsystem that gathers ML models to help Micro-LAKE with memory management and CPU scheduling.
MinatoLoader is a general-purpose data loader for PyTorch that accelerates training and improves GPU utilization in single-server, multi-GPU settings. It continuously prepares data in the background and constructs batches by prioritizing fast-to-process samples, while slower samples are processed in parallel.
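The sketch below (plain C with pthreads, not MinatoLoader's actual PyTorch implementation) illustrates this batching policy under simplifying assumptions: a single background preparer thread, a fixed FAST_THRESHOLD_US cutoff, and an illustrative prepare_sample() routine stand in for the real components, which overlap slow-sample preparation with training across several workers.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_SAMPLES        64
#define BATCH_SIZE          8
#define FAST_THRESHOLD_US 500      /* samples prepared faster than this are "fast" */

typedef struct { int id; long prep_us; } sample_t;

typedef struct {
    sample_t items[NUM_SAMPLES];
    int head, tail;
    pthread_mutex_t lock;
} queue_t;

static queue_t fast_q = { .lock = PTHREAD_MUTEX_INITIALIZER };
static queue_t slow_q = { .lock = PTHREAD_MUTEX_INITIALIZER };

static void q_push(queue_t *q, sample_t s) {
    pthread_mutex_lock(&q->lock);
    q->items[q->tail++] = s;
    pthread_mutex_unlock(&q->lock);
}

static int q_pop(queue_t *q, sample_t *out) {
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->head < q->tail) { *out = q->items[q->head++]; ok = 1; }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Illustrative preprocessing: returns how long the sample took to prepare. */
static long prepare_sample(int id) {
    long cost = (id % 5 == 0) ? 5000 : 100;   /* every 5th sample is slow */
    usleep(cost);
    return cost;
}

/* Background preparer: continuously prepares data while training runs. */
static void *preparer(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_SAMPLES; i++) {
        sample_t s = { .id = i, .prep_us = prepare_sample(i) };
        q_push(s.prep_us <= FAST_THRESHOLD_US ? &fast_q : &slow_q, s);
    }
    return NULL;
}

int main(void) {
    pthread_t bg;
    pthread_create(&bg, NULL, preparer, NULL);

    int consumed = 0;
    while (consumed < NUM_SAMPLES) {
        int n = 0;
        while (n < BATCH_SIZE && consumed + n < NUM_SAMPLES) {
            sample_t s;
            /* Prioritize fast-to-process samples; fall back to slow ones. */
            if (q_pop(&fast_q, &s) || q_pop(&slow_q, &s))
                n++;
            else
                usleep(50);   /* nothing ready yet; wait for the preparer */
        }
        consumed += n;
        printf("batch of %d samples ready for the GPU\n", n);
    }
    pthread_join(bg, NULL);
    return 0;
}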
Data movement is the leading cause of performance degradation and energy consumption in modern data centers. Processing-in-memory (PIM) is an architecture that addresses data movement by bringing computation inside the memory chips. This paper is the first to study the virtualization of PIM devices: we design and implement vPIM, an open-source UPMEM-based virtualization system for the cloud. Our vPIM design satisfies four requirements: (1) compatibility, so that no hardware or hypervisor changes are needed; (2) multiplexing and isolation, for a higher utilization ratio; (3) utilizability and transparency, so that applications written for PIM run efficiently out of the box, enabling rapid adoption; and (4) minimization of the virtualization performance overhead.
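As an illustration of the utilizability and transparency requirement, the sketch below is an ordinary UPMEM host program written against the standard dpu.h API; under vPIM, such a program is meant to run unmodified inside a guest VM. The DPU binary name (./checksum_dpu) and the DPU-side symbols (buffer, result) are illustrative and assume a matching DPU program.

#include <dpu.h>
#include <stdint.h>
#include <stdio.h>

#define NR_DPUS 4

int main(void) {
    struct dpu_set_t set, dpu;
    uint32_t input[NR_DPUS][256] = {{0}};

    DPU_ASSERT(dpu_alloc(NR_DPUS, NULL, &set));          /* reserve DPUs      */
    DPU_ASSERT(dpu_load(set, "./checksum_dpu", NULL));   /* load DPU program  */

    uint32_t i;
    DPU_FOREACH(set, dpu, i) {
        /* Copy one chunk of input into each DPU's on-chip buffer. */
        DPU_ASSERT(dpu_copy_to(dpu, "buffer", 0, input[i], sizeof(input[i])));
    }

    DPU_ASSERT(dpu_launch(set, DPU_SYNCHRONOUS));        /* compute in memory */

    DPU_FOREACH(set, dpu, i) {
        uint32_t result;
        DPU_ASSERT(dpu_copy_from(dpu, "result", 0, &result, sizeof(result)));
        printf("dpu %u -> %u\n", i, result);
    }

    DPU_ASSERT(dpu_free(set));
    return 0;
}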
For virtualized cloud applications, GuaNary is a novel defense against buffer overflows that allows synchronous detection at a low memory-footprint cost. To this end, GuaNary leverages Intel Sub-Page write Permission (SPP), a recent hardware virtualization feature that makes it possible to write-protect guest memory at a granularity of 128B (a sub-page) instead of 4KB. We implement a software stack, LeanGuard, which promotes the use of SPP from inside virtual machines through new secure allocators built on GuaNary. Our evaluation shows that, for the same number of protected buffers, LeanGuard consumes 8.3× less memory than SlimGuard, a state-of-the-art secure allocator. Further, for a given amount of memory, LeanGuard protects 25× more buffers than SlimGuard.
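For intuition, the sketch below shows the classic guard approach at 4KB page granularity: the page following a buffer is write-protected, so the first out-of-bounds write faults synchronously. GuaNary keeps this synchronous-detection principle but relies on the hypervisor programming Intel SPP to write-protect 128B sub-page guards (something mprotect cannot express), which is where the memory savings over page-granularity guards come from. This is an illustrative analogy, not LeanGuard's implementation.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Synchronous detection: the faulting out-of-bounds write lands here. */
static void on_segv(int sig) {
    (void)sig;
    write(STDOUT_FILENO, "overflow detected\n", 18);
    _exit(1);
}

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    signal(SIGSEGV, on_segv);

    /* One writable data page followed by one write-protected guard page. */
    unsigned char *region = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }
    if (mprotect(region + page, page, PROT_READ) != 0) { perror("mprotect"); return 1; }

    unsigned char *buf = region;   /* the user buffer lives in the data page  */
    memset(buf, 0, page);          /* in-bounds writes succeed                */
    buf[page] = 0x41;              /* first byte past the buffer: faults here */

    printf("not reached\n");
    return 0;
}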
Out of Hypervisor (OoH) is a new virtualization research axis. Instead of emulating full virtual hardware inside a VM to support a nested hypervisor, the OoH principle is to individually expose current hypervisor-oriented hardware virtualization features to the guest OS, so that guest processes can also take advantage of them. I illustrate OoH with Intel PML (Page Modification Logging), a feature that enables efficient dirty-page tracking to improve VM live migration. Because dirty-page tracking is at the heart of many essential tasks, including process checkpointing (e.g., CRIU) and concurrent garbage collection (e.g., Boehm GC), OoH exposes PML to accelerate these tasks in the guest. We present two OoH solutions, Shadow PML (SPML) and Extended PML (EPML), which we integrated into CRIU and Boehm GC. Evaluation results show that EPML speeds up CRIU checkpointing by about 13× and Boehm garbage collection by up to 6× compared to SPML, /proc, and userfaultfd, while reducing the monitoring impact on applications by about 16×.
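For context, the sketch below shows the /proc soft-dirty interface used as one of the baselines above: clearing the soft-dirty bits through /proc/self/clear_refs and then reading bit 55 of each /proc/self/pagemap entry reveals which pages were written in the interval. SPML/EPML instead let the guest obtain this information from Intel PML hardware. (Reading pagemap may require additional privileges on some kernels.)

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SOFT_DIRTY_BIT (1ULL << 55)

/* Clear all soft-dirty bits for the current process. */
static void clear_soft_dirty(void) {
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    if (fd < 0 || write(fd, "4", 1) != 1) { perror("clear_refs"); exit(1); }
    close(fd);
}

/* Return 1 if the page containing addr was written since the last clear. */
static int page_is_dirty(void *addr) {
    long page = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("pagemap"); exit(1); }
    off_t off = ((uintptr_t)addr / page) * sizeof(entry);
    if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry)) { perror("pread"); exit(1); }
    close(fd);
    return (entry & SOFT_DIRTY_BIT) != 0;
}

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    char *buf = aligned_alloc(page, 2 * page);
    memset(buf, 0, 2 * page);              /* touch both pages              */

    clear_soft_dirty();                    /* start a new tracking interval */
    buf[0] = 'x';                          /* dirty only the first page     */

    printf("page 0 dirty: %d, page 1 dirty: %d\n",
           page_is_dirty(buf), page_is_dirty(buf + page));
    return 0;
}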