Yawen Wang, SystemsResearch@Google
Francis Y. Yan, University of Illinois Urbana-Champaign
Democratizing Deep Learning: Making LLM Training and Inference Accessible with Consumer-grade GPUs
Dongsu Han, KAIST
Abstract: The prohibitive cost of training and deploying large language models (LLMs) on expensive datacenter-grade GPUs creates a significant barrier to AI innovation and research. More critically, this cost barrier represents one of the most pressing practical challenges in AI for systems: without affordable AI infrastructure, the promise of AI-enhanced systems remains inaccessible to most researchers and organizations. This talk will present how AI-enhanced system designs can fundamentally transform this landscape by enabling cost-effective LLM training and inference on commodity consumer-grade GPUs. I will demonstrate how intelligent system optimizations—leveraging AI for memory management, communication scheduling, and speculative execution—can overcome the fundamental limitations of consumer hardware: restricted GPU memory and constrained network bandwidth.
I will present three AI-driven systems that exemplify this approach: ES-MoE (ICML 2024), which uses adaptive memory management to train large models on limited GPU memory; StellaTrain (SIGCOMM 2024), which employs intelligent scheduling to enable effective distributed training across bandwidth-constrained networks; and SpecEdge (NeurIPS 2025), our latest work that leverages speculative execution and intelligent batching to reduce inference costs by 50% while improving Time Per Output Token (TPOT) by 10%. These systems demonstrate that AI-enhanced system design not only democratizes access to large-scale AI but can actually improve performance compared to traditional approaches. This work opens new research directions for AI-driven systems: exploring how intelligent system optimizations could enable practical AI deployment in resource-constrained environments and potentially reshape the landscape of AI for systems beyond traditional datacenter boundaries.
Speaker Bio: Dongsu Han is a Professor at the School of Electrical Engineering and Graduate School of AI at KAIST. He received his Ph.D. in Computer Science from Carnegie Mellon University in 2012. His research focuses on democratizing AI systems and addressing challenges in modern Internet applications at scale. His recent work on making AI accessible through commodity hardware has been published at premier venues including ICML 2024 (ES-MoE), SIGCOMM 2024 (StellaTrain), and NeurIPS 2025 (SpecEdge). Throughout his career, he has published extensively in ACM SIGCOMM, USENIX OSDI, USENIX NSDI, ACM CCS, and other top-tier venues. His contributions have been recognized with the USENIX NSDI Best Paper Award and USENIX NSDI Community Award. He serves as an Associate Editor for IEEE/ACM Transactions on Networking and served as Program co-Chair for ACM CoNEXT 2020 and General co-Chair for IEEE ICNP 2025.
Intent-Based System Design and Operation
Vaastav Anand, Max Planck Institute for Software Systems; Yichen Li, The Chinese University of Hong Kong; Alok Gautam Kumbhare, Celine Irvene, Chetan Bansal, Gagan Somashekar, Jonathan Mace, Pedro Las-Casas, Ricardo Bianchini, and Rodrigo Fonseca, Microsoft
OQueue: Observable Communication in Learning Directed Operating Systems
Aditya Tewari, University of Texas at Austin; Sujay Yadalam, University of Wisconsin-Madison; Arthur Peters, Saurabh Agarwal, and Aditya Akella, UT Austin; Michael M. Swift, University of Wisconsin-Madison; Christopher J. Rossbach, UT Austin and Microsoft
Toward Interference-Aware Scheduling for Serverless Functions via eBPF and Meta-Learning
Yifan Zhang, Jianchang Su, and Zixu Shen, University of Connecticut; Yang Zhou, UC Davis; Wei Zhang, University of Connecticut
Set It and Forget It: Zero-Mod ML Magic for Linux Tuning
Georgios Liargkovas, Prabhpreet Singh Sodhi, and Kostis Kaffes, Columbia University
Challenges in Designing Robust RL-Based Autoscalers
Navidreza Asadi, Dalal Ali, Răzvan-Mihai Ursu, and Wolfgang Kellerer, Technical University of Munich
Merlin: Improving Page Prefetching via Online Reinforcement Learning
Yingying Liu and Junzhe Li, The University of Hong Kong; Junzhou Fang, Zhejiang University; Chenxiong Qian, The University of Hong Kong
Evolving Beyond Pressure: RL-enhanced Camera Launch for Resource-Critical Scenarios
Zicheng Wang, Honor Device Co., Ltd.; Zesen Liu, Nanjing University; Lizhi Sun, Yinggang Guo, Ligeng Chen, Yixin Guo, Claire Gu, Jun Xiao, Tao Wang, and Lu Liu, Honor Device Co., Ltd.; Yanyan Jiang, Nanjing University
Data Knows What the App Needs: An Intelligent Resource Watermark for Mobile Systems
Zesen Liu, Nanjing University; Zicheng Wang, Yinggang Guo, Lizhi Sun, Ligeng Chen, Yixin Guo, Claire Gu, Jun Xiao, Tao Wang, and Lu Liu, Honor Device Co., Ltd.; Yanyan Jiang, Nanjing University
Into the Wild: Real-World Testing for ML-Based ABR
Benjamin Hoffman, Alexander Dietmüller, Ayush Mishra, and Laurent Vanbever, ETH Zurich
Bridging Natural Resilience and Cost-Effectiveness in SSDs for Containerized ML Applications
Seungkwan Kang, KAIST; Miryeong Kwon, Panmnesia; Seungjun Lee, Huiwon Choi, and Myoungsoo Jung, KAIST
Easing the path to deployment in ML4Sys through FPGAs
Maximilian Jakob Heer, Benjamin Ramhorst, and Gustavo Alonso, ETH Zurich
Modeling Economic Viability for Scalable AI Deployment in Emerging Regions
Rohail Asim and Ankit Bhardwaj, New York University; Arjuna Sathiaseelan, Flipped.ai; Yasir Zaki, New York University Abu Dhabi; Lakshmi Subramanian, New York University
FLOSS: Federated Learning with Opt-Out and Straggler Support
David J. Goetze, Dahlia J. Felten, Jeannie R. Albrecht, and Rohit Bhattacharya, Williams College
Piper: Towards Flexible Pipeline Parallelism for PyTorch
Megan Frisella, Arvin Oentoro, Xiangyu Gao, Gilbert Bernstein, and Stephanie Wang, University of Washington
AgentSight: System-Level Observability for AI Agents Using eBPF
Yusheng Zheng, UC Santa Cruz; Yanpeng Hu, ShanghaiTech University; Tong Yu, eunomia-bpf Community; Andi Quinn, UC Santa Cruz
Frontier: Simulating the Next Generation of LLM Inference Systems
Yicheng Feng, Xin Tan, and Kin Hang Sew, The Chinese University of Hong Kong; Yimin Jiang and Yibo Zhu, StepFun; Hong Xu, The Chinese University of Hong Kong
Guarding LLM-aided Software Transformation Tasks via Component Exoskeletons
Evangelos Lamprou, Brown University; Christian Gram Kalhauge, DTU; Martin Rinard, MIT; Nikos Vasilakis, Brown University
Securing MCP-based Agent Workflows
Grigoris Ntousakis, Brown University; Julian James Stephen, Michael Le, Sai Sree Chukkapalli, and Teryl Taylor, IBM Research; Ian M. Molloy, IBM; Frederico Araujo, IBM Research
Towards Safe Agentic AI Performance Engineering
Dan Williams and Milo Craun, Virginia Tech; Michael V. Le and Julian James Stephen, IBM; Salman Ahmed, IBM Research, Yorktown Heights; and Hani Jamjoom, IBM