Workshop Schedule

9:00 AM - 9:30 AM Opening Remarks

Deniz Altınbüken, Google DeepMind

9:30 AM - 10:30 AM Invited Speaker

Perspectives on Learning-Directed Operating Systems

Christopher J. Rossbach, UT Austin

Abstract: Operating systems play a fundamental role in enabling emerging applications such as AR/VR and assistive robotics to run on modern computer systems. However, today’s OSes were developed for outdated architectures and machine organizations to support applications that no longer reflect modern workloads. Moreover, today’s OSes manage a computer’s resources (e.g. CPU, memory, I/O) using human-coded heuristic policies designed to provide reasonable worst-case performance across a wide array of environments ranging from embedded/IoT devices to server-class machines, resulting in sub-optimal resource management and under-utilization. Machine learning (ML) offers a promising path forward for OSes, by replacing fixed hand-coded heuristics with learned policies that can adapt to dynamic environments, unlocking the full potential of modern computer systems.  

This talk will describe the Learning-Directed Operating Systems (LDOS) project, which takes a clean-slate approach to OSes. LDOSes are characterized by the systematic use of machine learning throughout the OS. LDOS policies make decisions that compose to optimize the end-to-end needs of applications and system-wide goals. The talk will share our experience integrating ML in the LAKE kernel, an extended version of Linux that uses ML for resource management in a number of subsystems. The talk will focus on these efforts through the PACMI lens of identifying and elaborating on practical challenges that arise when applying ML to such systems, which, in our experience, are numerous.


Bio: Christopher J. Rossbach is an associate professor of Computer Science at UT Austin, an alumnus of VMware Research Group and of Microsoft Research's Silicon Valley Lab, and co-founder of graph computing startup Katana Graph. He leads the Systems, Concurrent, and Emerging Architectures Research Group (SCEA) at UT Austin and co-directs the Learning-Directed Operating Systems NSF Expeditions Project. His technical interests are in operating systems, distributed systems, and OS, architectural, and PL support for parallel hardware.

10:30 AM - 11:00 AM Coffee Break

11:00 AM - 12:00 PM Submission Talks


Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents

Qizheng Zhang, Stanford University; Ali Imran, Purdue University; Enkeleda Bardhi, Sapienza University; Tushar Swamy, Stanford University; Nathan Zhang, Stanford University; Muhammad Shahbaz, Purdue University & University of Michigan; Kunle Olukotun, Stanford University

Paper Slides

Programming Systems for Improving the Performance of Deep Learning-Intensive Applications

Yifan Zhao, Vikram Adve, Sasa Misailovic; University of Illinois Urbana-Champaign

Paper Slides


12:00 PM - 1:00 PM Lunch Break

1:00 PM - 2:00 PM Invited Speaker

Weaving Large Language Models into the Bug Finding Pipeline: Challenges and Opportunities

Bogdan Stoica, University of Chicago

Abstract: The rapid adoption of large language models over the past few years provides opportunities and challenges for code analysis. In this talk, we will focus on leveraging the fuzzy code comprehension capabilities of LLMs to understand, interpret, and generate code to help developers isolate difficult-to-detect bugs more efficiently.

First, we will explore how LLMs can help isolate complex implementations of specific system functionalities within large codebases, using retry logic as a case study. Such resilience functionality is diverse, does not rely solely on code structure, and can span multiple methods or even source files. Traditional program analysis often struggles to isolate such complex implementations. In contrast, LLMs are better equipped to interpret both structural (loops, state machines) and non-structural (comments, variable names) code elements holistically, providing better indicators of retry logic implementation than structural patterns alone. Yet, our findings reveal a tension: while LLMs require sufficient information to isolate retry logic effectively, providing too much can overwhelm them, leading to decreased accuracy and ultimately missed retry implementations.
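To make the retry-logic case study concrete, here is a small illustrative example (not taken from the talk's codebases) of the kind of implementation being discussed. The structural signals are the loop and the attempt counter; the non-structural signals an LLM can exploit are the comment and names like `max_retries`:

```python
import time

def fetch_with_retry(fetch, max_retries=3, base_delay=0.01):
    """Run a transient-failure-prone operation, retrying with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fetch()
        except ConnectionError:
            # Transient failure: back off and retry until the budget is spent.
            attempt += 1
            if attempt > max_retries:
                raise
            time.sleep(base_delay * (2 ** (attempt - 1)))
```

A purely structural analysis sees only a `while` loop with an exception handler; the comment ("retry until the budget is spent") and the `max_retries` parameter are exactly the holistic cues the abstract argues LLMs can interpret.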

Second, we will examine how LLMs can assist with code generation. Recent research has demonstrated their efficacy in generating tests for finding correctness bugs. Building on this, we are currently focusing on using LLMs to generate tests for isolating performance issues that appear under extreme conditions, such as workload spikes, unexpected operational slowdowns, and resource contention. Our key insight is that by starting from existing tests and iteratively modifying them to stress the system—increasing resource utilization or the number of tasks to be processed—we can uncover bugs that standard testing methods might overlook. However, this approach presents challenges, including guiding LLMs to generate correct and compilable tests, determining which system metrics to measure for evaluating test effectiveness, and using these metrics as feedback to steer LLMs toward generating more effective tests. 
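The iterative test-mutation approach sketched above can be summarized as a generate-and-evaluate loop. The sketch below is a hypothetical illustration, not the speaker's implementation; `mutate` stands in for an LLM call, and `run_and_measure` for executing a test while recording a stress metric such as peak memory or queue depth:

```python
def stress_search(seed_test, mutate, run_and_measure, rounds=5):
    """Hypothetical feedback loop: start from an existing test and
    iteratively mutate it toward higher resource pressure, keeping the
    variant that maximizes the measured stress metric."""
    best_test, best_score = seed_test, run_and_measure(seed_test)
    for _ in range(rounds):
        # In practice this would prompt an LLM with the current best test
        # and its measured metrics as feedback.
        candidate = mutate(best_test, feedback=best_score)
        score = run_and_measure(candidate)
        if score > best_score:
            best_test, best_score = candidate, score
    return best_test, best_score
```

The challenges the abstract lists map directly onto this loop: `mutate` must yield correct, compilable tests, and choosing what `run_and_measure` reports is the open question of which metrics make useful feedback.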

Bio: Bogdan is a final-year Systems PhD candidate at the University of Chicago, where he is fortunate to be advised by Prof. Shan Lu and closely mentored by Prof. Haryadi Gunawi and Prof. Kexin Pei (UChicago), along with Dr. Suman Nath, Dr. Madan Musuvathi, and Dr. Jonathan Mace (Microsoft Research). His research focuses on improving the correctness and performance of large-scale software systems. To this end, he develops tools centered on program analysis, fault injection, efficient code instrumentation, and large language models for fuzzy code comprehension, helping developers to better understand their code and isolate bugs more effectively. Before joining UChicago, Bogdan earned an MSc degree from EPFL and a BSc degree from the University of Bucharest. In a previous life, he had a grown-up job working as a software engineer for Microsoft and Bitdefender Labs.

2:00 PM - 3:30 PM Submission Talks


Bootstrapping Trust in ML4Nets Solutions with Hybrid Explainability

Abduarraheem Elfandi, Hannah Sagalyn, Ramakrishnan Durairajan, University of Oregon; Walter Willinger, NIKSUN, Inc.

Paper Slides

PreSight: A Vision for an Instantaneous Web

Isaac Khor, Northeastern University; Suleman Ahmad, Cloudflare Inc.; Avani Wildani, Cloudflare Inc.

Paper Slides


Instance-Optimized Mapping with Portfolio Methods

Yibo Zhao, Panagiotis Manolios, and Cheng Tan; Northeastern University

Paper Slides


3:30 PM - 4:00 PM Coffee Break

4:00 PM - 5:00 PM Invited Speaker

Practical Challenges of Applying ML Models in Dynamic Data Environments: When to (Re)train and What Data to Train On?

Ana Klimovic, ETH Zurich

Abstract: The datasets fueling today’s production machine learning models (e.g., click streams, sensor data, logs) are continuously growing. To maintain high accuracy, stale models deployed in the wild need to be retrained to incorporate new data, particularly as training data may experience distribution shifts. The cost of model retraining is proportional to how frequently the model is retrained and how much data it is trained on. However, finding the right data selection and model training triggering policies to balance accuracy and cost is non-trivial, and prior work offers no system support for exploring this complex design space. In this talk, I will present Modyn, a data-centric machine learning pipeline orchestration system. Modyn’s ML pipeline abstraction enables users to declaratively describe policies for continuously training a model on a growing dataset. Modyn pipelines allow users to apply custom data selection policies (to reduce the number of data points) and triggering policies (to reduce the number of model training runs). Modyn executes and orchestrates these continuous ML training pipelines under the hood. The system is open-source and comes with an ecosystem of benchmark datasets, models, and tooling.
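The two policy knobs the abstract describes, data selection and training triggering, can be illustrated with a minimal sketch. This is a hypothetical illustration of the idea, not Modyn's actual API; all names below are invented:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RetrainPipeline:
    """Illustrative declarative pipeline (hypothetical, not Modyn's API):
    a data-selection policy plus a training-triggering policy."""
    select: Callable          # chooses which arriving data points to keep
    should_trigger: Callable  # decides when to launch a training run

def step(pipeline, new_batch, state):
    """Ingest one batch of arriving data and maybe trigger retraining."""
    state["pool"].extend(pipeline.select(new_batch))
    if pipeline.should_trigger(state):
        state["runs"] += 1      # stand-in for launching a training job
        state["pool"].clear()   # selected data is consumed by that run
    return state
```

Separating the two policies like this is what lets a system explore the accuracy/cost trade-off: a stricter `select` reduces data volume per run, while a laxer `should_trigger` reduces the number of runs.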

Bio: Ana Klimovic is an Assistant Professor in the Systems Group of the Computer Science Department at ETH Zurich. Her research interests span operating systems, computer architecture, and their intersection with machine learning. Ana's work focuses on computer system design for large-scale applications such as cloud computing services, data analytics, and machine learning. Before joining ETH in August 2020, Ana was a Research Scientist at Google Brain and completed her Ph.D. in Electrical Engineering at Stanford University.