Research

Efficient Memory Management for High-Performance Computing

Due to the challenge of scaling DRAM density, a new class of memory (e.g., the CXL Memory Expander) has received considerable attention as a way to bridge the performance gap between DRAM and SSDs. Although this new class of memory provides abundant capacity, its performance is not comparable to that of conventional DRAM. As a result, we expect future large-memory systems to take the form of a tiered memory architecture. In this study, we revisit the design and implementation of memory management in state-of-the-art Linux to achieve high performance. [Slides]
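
As a concrete illustration of page placement on a tiered system, the sketch below (a simplified example, not our kernel mechanism) uses the Linux move_pages(2) interface to demote a set of application pages from a fast DRAM node to a slower CXL-backed node exposed as a CPU-less NUMA node. The node IDs 0 and 1 are assumptions for illustration; link with -lnuma.

/* Demote pages from an assumed fast DRAM node to an assumed
 * CXL memory node using move_pages(2). */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>

#define FAST_NODE 0   /* assumption: local DRAM NUMA node  */
#define SLOW_NODE 1   /* assumption: CXL-backed NUMA node  */

int demote_pages(void **pages, unsigned long count)
{
    int *nodes  = malloc(count * sizeof(int));
    int *status = malloc(count * sizeof(int));
    if (!nodes || !status) { free(nodes); free(status); return -1; }

    for (unsigned long i = 0; i < count; i++)
        nodes[i] = SLOW_NODE;            /* target tier for each page */

    /* pid 0 means the calling process; MPOL_MF_MOVE migrates only
     * pages mapped exclusively by this process. */
    long rc = move_pages(0, count, pages, nodes, status, MPOL_MF_MOVE);

    if (rc == 0)
        for (unsigned long i = 0; i < count; i++)
            if (status[i] != SLOW_NODE)
                fprintf(stderr, "page %lu not moved (status %d)\n",
                        i, status[i]);

    free(nodes);
    free(status);
    return (int)rc;
}

A promotion path would be symmetric, moving hot pages back to FAST_NODE; the interesting design questions are when to trigger such migrations and how to track page hotness cheaply.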

Keywords: Memory Management, Operating Systems, Linux Kernel Programming 

[ACM APSys '21] Rethinking Remote Memory Placement on Large-Memory Systems with Path Diversity
[USENIX ATC '21] Exploring the Design Space of Page Management for Multi-Tiered Memory Systems [GitHub]
[IEEE CAL '20] A Study of Memory Placement on Hardware-assisted Tiered Memory Systems

Systems for Artificial Intelligence (Large Language Models)

A variety of deep learning (DL) based services, including image classification, natural language processing, and recommendation, are widely deployed in the data centers of companies such as Facebook, Google, Microsoft, Alibaba, and Netflix. There have been significant efforts to optimize model serving systems. In this study, we focus on the impact of scheduling queries across heterogeneous systems, which are equipped with CPUs, GPUs, and customized accelerators, to maximize latency-bounded throughput.
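
To make the scheduling problem concrete, here is a minimal sketch of SLO-aware query dispatch across heterogeneous backends. The greedy earliest-finish policy, the backend names, and the service-time estimates are illustrative assumptions, not the algorithm from our papers.

/* Hypothetical sketch: route each query to the backend with the
 * earliest estimated finish time, warning when the SLO is at risk. */
#include <stdio.h>

struct backend {
    const char *name;
    double service_ms;   /* assumed per-query service time      */
    double queue_ms;     /* work already queued on the device   */
};

int dispatch(struct backend *b, int n, double slo_ms)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        double fi = b[i].queue_ms + b[i].service_ms;
        double fb = b[best].queue_ms + b[best].service_ms;
        if (fi < fb)
            best = i;
    }
    double finish = b[best].queue_ms + b[best].service_ms;
    if (finish > slo_ms)
        fprintf(stderr, "SLO %.1f ms at risk (finish %.1f ms)\n",
                slo_ms, finish);
    b[best].queue_ms += b[best].service_ms;  /* account for the new query */
    return best;
}

int main(void)
{
    struct backend pool[] = {
        { "CPU", 12.0, 0.0 },   /* illustrative numbers only */
        { "GPU",  2.5, 0.0 },
        { "NPU",  1.8, 0.0 },
    };
    for (int q = 0; q < 5; q++)
        printf("query %d -> %s\n", q, pool[dispatch(pool, 3, 10.0)].name);
    return 0;
}

In practice, service-time estimates vary with batch size and model, and keeping the slower CPU backend useful for small or latency-tolerant queries is part of what makes heterogeneous scheduling interesting.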

Also, as modern machine learning models become much larger and more complex, ML training systems require large amounts of memory as well as heavy compute capability. However, GPU memory capacity has scaled slowly, making it challenging to train large models on a single GPU. In this project, we study how operating systems and machine learning frameworks (e.g., PyTorch or TensorFlow) should manage unified memory across GPUs for emerging applications. [Slides]
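
As a minimal sketch of the underlying mechanism, the CUDA host code below allocates a managed buffer that may exceed device memory and uses placement and prefetch hints to reduce on-demand page migration. The 32 GiB size and device 0 are assumptions for illustration; compile with nvcc.

/* Sketch: GPU memory oversubscription with CUDA Unified Memory. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    size_t bytes = 32UL << 30;   /* assumed 32 GiB: may exceed GPU DRAM */
    float *buf;

    /* Managed memory is addressable from both CPU and GPU; the driver
     * migrates pages on demand, spilling to host DRAM when the working
     * set exceeds device memory. */
    cudaMallocManaged((void **)&buf, bytes, cudaMemAttachGlobal);

    /* Hint: prefer keeping this data resident on GPU 0, ... */
    cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, 0);

    /* ...and prefetch the hot 1 GiB region before kernels launch to
     * avoid first-touch page-fault migration. */
    cudaMemPrefetchAsync(buf, 1UL << 30, 0, NULL);

    cudaDeviceSynchronize();
    cudaFree(buf);
    return 0;
}

A framework-level memory manager would issue such hints automatically, based on which tensors the upcoming operators will touch.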

Keywords: Model Serving, Resource Scheduling, Unified Memory, ML Frameworks

[ACM EuroSys '23] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access [GitHub]
[USENIX ATC '22] Memory Harvesting in Multi-GPU Systems with Hierarchical Unified Virtual Memory [GitHub]
[USENIX ATC '21] Zico: Efficient GPU Memory Sharing for Concurrent DNN Training

Datacenter & Cloud Computing

Today, virtualization is a key enabling technology for cloud computing because it allows flexible resource management by allocating virtual machines, rather than physical systems, to cloud users. In addition, by consolidating underutilized systems onto fewer servers, system virtualization can improve resource efficiency and reduce energy consumption. In this project, we introduce hardware and software techniques for efficient CPU and memory virtualization.
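
As a hedged sketch of the software side of memory virtualization, the fragment below uses the Linux KVM API to create a VM and back a region of guest-physical memory with host memory, which is the point where stage-2 (EPT/NPT) translation comes into play. Error handling and vCPU setup are omitted for brevity.

/* Sketch: create a VM via /dev/kvm and map 2 MiB of guest memory. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

    /* Back 2 MiB of guest-physical memory with anonymous host memory;
     * the hypervisor's stage-2 page tables translate guest-physical
     * to host-physical addresses for this region. */
    size_t size = 2 << 20;
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct kvm_userspace_memory_region region = {
        .slot            = 0,
        .guest_phys_addr = 0,
        .memory_size     = size,
        .userspace_addr  = (unsigned long)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

    /* A vCPU created with KVM_CREATE_VCPU would then execute guest
     * code against this address space. */
    close(vm);
    close(kvm);
    return 0;
}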

Keywords: Hypervisor, Scheduling, Address Translation, Migration

[IEEE TPDS '20] Reconciling Time Slice Conflicts of Virtual Machines with Dual Time Slice for Clouds
[ACM EuroSys '19] GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks
[ACM EuroSys '18] Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores [GitHub]

Sponsors