by Meng Li
Assistant Professor
Peking University
The recent progress of large language models (LLMs) has revolutionized natural language processing, computer vision, and related fields. However, following the scaling law, LLM model sizes grow exponentially, leading to significant memory capacity and bandwidth bottlenecks. The auto-regressive decoding pattern and long-context requirements further exacerbate these challenges. In this talk, I will introduce recent research on speculative decoding and mixture-of-experts LLM inference, which co-designs algorithms and hardware to alleviate the memory bottleneck for efficient LLM inference.
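The announcement does not detail the speaker's specific algorithm/hardware co-design. As background only, the sketch below shows the generic draft-then-verify loop behind speculative decoding, not Prof. Li's method; the toy target_dist and draft_dist functions are hypothetical stand-ins for a large target LLM and a small draft model.

```python
import random

# Toy stand-ins for a large "target" model and a cheap "draft" model.
# Each returns a next-token distribution over a tiny vocabulary; in
# practice these would be real LLM forward passes conditioned on context.
VOCAB = ["a", "b", "c", "d"]

def target_dist(context):
    return {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

def draft_dist(context):
    return {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

def sample(dist):
    r = random.random()
    acc = 0.0
    for tok, p in dist.items():
        acc += p
        if r < acc:
            return tok
    return tok  # guard against floating-point rounding

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then verify them against the target model."""
    # Phase 1: the small draft model proposes k tokens auto-regressively.
    drafts = []
    ctx = list(context)
    for _ in range(k):
        q = draft_dist(ctx)
        tok = sample(q)
        drafts.append((tok, q))
        ctx.append(tok)

    # Phase 2: the target model verifies the drafts; in a real system all
    # k positions are scored in one parallel forward pass.
    accepted = []
    ctx = list(context)
    for tok, q in drafts:
        p = target_dist(ctx)
        # Accept the draft token with probability min(1, p(x)/q(x)).
        if random.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # which keeps the overall output distributed exactly as p.
            residual = {t: max(0.0, p[t] - q[t]) for t in VOCAB}
            z = sum(residual.values())
            accepted.append(sample({t: v / z for t, v in residual.items()}))
            break
    else:
        # All k drafts accepted: sample one bonus token from the target.
        accepted.append(sample(target_dist(ctx)))
    return accepted

print(speculative_step(["<s>"]))
```

The connection to the memory bottleneck mentioned in the abstract: verifying k drafted tokens in one batched target-model pass means the large model's weights are read from memory roughly once per k tokens rather than once per token, trading cheap extra compute for scarce memory bandwidth.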
Speaker Bio:
Prof. Meng Li is currently a tenure-track assistant professor at Peking University, jointly affiliated with the Institute for Artificial Intelligence and the School of Integrated Circuits. Before joining Peking University, he was a staff research scientist and tech lead at Meta Reality Labs, focusing on the research and productization of efficient AI algorithms and hardware/systems for next-generation AR/VR devices. He received his Ph.D. from the University of Texas at Austin in 2018 and his bachelor's degree from Peking University in 2013. His research interests lie in efficient and secure multi-modal AI acceleration algorithms and hardware. He has published more than 90 papers and received two best paper awards. He has also received First Place in the AICAS LLM System Design Contest, the CCF Integrated Circuit Early Career Award, the EDAA Outstanding Dissertation Award, First Place in the ACM Student Research Competition Grand Final (Graduate Category), and Best Poster Awards at the ASPDAC Student Research Forum, among other honors.