Yawen Wang, SystemsResearch@Google
Francis Y. Yan, University of Illinois Urbana-Champaign
Democratizing Deep Learning: Making LLM Training and Inference Accessible with Consumer-grade GPUs
Dongsu Han, KAIST
Abstract: The prohibitive cost of training and deploying large language models (LLMs) on expensive datacenter-grade GPUs creates a significant barrier to AI innovation and research. More critically, this cost barrier represents one of the most pressing practical challenges in AI for systems: without affordable AI infrastructure, the promise of AI-enhanced systems remains inaccessible to most researchers and organizations. This talk will present how AI-enhanced system designs can fundamentally transform this landscape by enabling cost-effective LLM training and inference on commodity consumer-grade GPUs. I will demonstrate how intelligent system optimizations—leveraging AI for memory management, communication scheduling, and speculative execution—can overcome the fundamental limitations of consumer hardware: restricted GPU memory and constrained network bandwidth.
I will present three AI-driven systems that exemplify this approach: ES-MoE (ICML 2024), which uses adaptive memory management to train large models on limited GPU memory; StellaTrain (SIGCOMM 2024), which employs intelligent scheduling to enable effective distributed training across bandwidth-constrained networks; and SpecEdge (NeurIPS 2025), our latest work that leverages speculative execution and intelligent batching to reduce inference costs by 50% while improving Time Per Output Token (TPOT) by 10%. These systems demonstrate that AI-enhanced system design not only democratizes access to large-scale AI but can actually improve performance compared to traditional approaches. This work opens new research directions for AI-driven systems: exploring how intelligent system optimizations could enable practical AI deployment in resource-constrained environments and potentially reshape the landscape of AI for systems beyond traditional datacenter boundaries.
Speaker Bio: Dongsu Han is a Professor at the School of Electrical Engineering and Graduate School of AI at KAIST. He received his Ph.D. in Computer Science from Carnegie Mellon University in 2012. His research focuses on democratizing AI systems and addressing challenges in modern Internet applications at scale. His recent work on making AI accessible through commodity hardware has been published at premier venues including ICML 2024 (ES-MoE), SIGCOMM 2024 (StellaTrain), and NeurIPS 2025 (SpecEdge). Throughout his career, he has published extensively in ACM SIGCOMM, USENIX OSDI, USENIX NSDI, ACM CCS, and other top-tier venues. His contributions have been recognized with the USENIX NSDI Best Paper Award and USENIX NSDI Community Award. He serves as an Associate Editor for IEEE/ACM Transactions on Networking and served as Program co-Chair for ACM CoNEXT 2020 and General co-Chair for IEEE ICNP 2025.
Intent-Based System Design and Operation
Vaastav Anand, Max Planck Institute for Software Systems; Yichen Li, The Chinese University of Hong Kong; Alok Gautam Kumbhare, Celine Irvene, Chetan Bansal, Gagan Somashekar, Jonathan Mace, Pedro Las-Casas, Ricardo Bianchini, and Rodrigo Fonseca, Microsoft
OQueue: Observable Communication in Learning Directed Operating Systems
Aditya Tewari, University of Texas at Austin; Sujay Yadalam, University of Wisconsin-Madison; Arthur Peters, Saurabh Agarwal, and Aditya Akella, UT Austin; Michael M. Swift, University of Wisconsin-Madison; Christopher J. Rossbach, UT Austin and Microsoft
Toward Interference-Aware Scheduling for Serverless Functions via eBPF and Meta-Learning
Yifan Zhang, Jianchang Su, and Zixu Shen, University of Connecticut; Yang Zhou, UC Davis; Wei Zhang, University of Connecticut
Set It and Forget It: Zero-Mod ML Magic for Linux Tuning
Georgios Liargkovas, Prabhpreet Singh Sodhi, and Kostis Kaffes, Columbia University
Challenges in Designing Robust RL-Based Autoscalers
Navidreza Asadi, Dalal Ali, Răzvan-Mihai Ursu, and Wolfgang Kellerer, Technical University of Munich
Merlin: Improving Page Prefetching via Online Reinforcement Learning
Yingying Liu and Junzhe Li, The University of Hong Kong; Junzhou Fang, Zhejiang University; Chenxiong Qian, The University of Hong Kong
Evolving Beyond Pressure: RL-enhanced Camera Launch for Resource-Critical Scenarios
Zicheng Wang, Honor Device Co., Ltd.; Zesen Liu, Nanjing University; Lizhi Sun, Yinggang Guo, Ligeng Chen, Yixin Guo, Claire Gu, Jun Xiao, Tao Wang, and Lu Liu, Honor Device Co., Ltd.; Yanyan Jiang, Nanjing University
Data Knows What the App Needs: An Intelligent Resource Watermark for Mobile Systems
Zesen Liu, Nanjing University; Zicheng Wang, Yinggang Guo, Lizhi Sun, Ligeng Chen, Yixin Guo, Claire Gu, Jun Xiao, Tao Wang, and Lu Liu, Honor Device Co., Ltd.; Yanyan Jiang, Nanjing University
Into the Wild: Real-World Testing for ML-Based ABR
Benjamin Hoffman, Alexander Dietmüller, Ayush Mishra, and Laurent Vanbever, ETH Zurich
Bridging Natural Resilience and Cost-Effectiveness in SSDs for Containerized ML Applications
Seungkwan Kang, KAIST; Miryeong Kwon, Panmnesia; Seungjun Lee, Huiwon Choi, and Myoungsoo Jung, KAIST
Easing the path to deployment in ML4Sys through FPGAs
Maximilian Jakob Heer, Benjamin Ramhorst, and Gustavo Alonso, ETH Zurich
Modeling Economic Viability for Scalable AI Deployment in Emerging Regions
Rohail Asim and Ankit Bhardwaj, New York University; Arjuna Sathiaseelan, Flipped.ai; Yasir Zaki, New York University Abu Dhabi; Lakshmi Subramanian, New York University
FLOSS: Federated Learning with Opt-Out and Straggler Support
David J. Goetze, Dahlia J. Felten, Jeannie R. Albrecht, and Rohit Bhattacharya, Williams College
Piper: Towards Flexible Pipeline Parallelism for PyTorch
Megan Frisella, Arvin Oentoro, Xiangyu Gao, Gilbert Bernstein, and Stephanie Wang, University of Washington
AgentSight: System-Level Observability for AI Agents Using eBPF
Yusheng Zheng, UC Santa Cruz; Yanpeng Hu, ShanghaiTech University; Tong Yu, eunomia-bpf Community; Andi Quinn, UC Santa Cruz
Frontier: Simulating the Next Generation of LLM Inference Systems
Yicheng Feng, Xin Tan, and Kin Hang Sew, The Chinese University of Hong Kong; Yimin Jiang and Yibo Zhu, StepFun; Hong Xu, The Chinese University of Hong Kong
Guarding LLM-aided Software Transformation Tasks via Component Exoskeletons
Evangelos Lamprou, Brown University; Christian Gram Kalhauge, DTU; Martin Rinard, MIT; Nikos Vasilakis, Brown University
Securing MCP-based Agent Workflows
Grigoris Ntousakis, Brown University; Julian James Stephen, Michael Le, Sai Sree Chukkapalli, and Teryl Taylor, IBM Research; Ian M. Molloy, IBM; Frederico Araujo, IBM Research
Towards Safe Agentic AI Performance Engineering
Dan Williams and Milo Craun, Virginia Tech; Michael V. Le and Julian James Stephen, IBM; Salman Ahmed, IBM Research, Yorktown Heights; and Hani Jamjoom, IBM