Combating the Memory Bottleneck for Efficient LLM Inference