Foundations of Data Science - Virtual Talk Series

... on the "Theory of Large ML Models."

Kangwook Lee (UW Madison)

Friday Feb 9, 2024. 

1pm Pacific Time

Kangwook Lee is an Assistant Professor in the Electrical and Computer Engineering Department and the Computer Sciences Department (by courtesy) at the University of Wisconsin-Madison. Previously, he was a Research Assistant Professor at the Information and Electronics Research Institute of KAIST and, before that, a postdoctoral scholar at the same institute. He received his PhD in 2016 from the Electrical Engineering and Computer Science department at UC Berkeley. He is the recipient of the IEEE Joint Communications Society/Information Theory Society Paper Award (2020) and the KSEA Young Investigator Grant Award (2022).

Title: Theoretical Exploration of Foundation Model Adaptation Methods

Register here for the Zoom link

Abstract: Due to the enormous size of foundation models, various new methods for efficient model adaptation have been developed. Parameter-efficient fine-tuning (PEFT) is an adaptation method that updates only a tiny fraction of the model parameters, leaving the remainder unchanged. In-context learning (ICL) is a test-time adaptation method that repurposes foundation models by providing them with labeled samples as part of the input context. Given the growing importance of this emerging paradigm, developing its theoretical foundations is of utmost importance.
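As a concrete illustration of the PEFT idea, here is a minimal NumPy sketch of a LoRA-style low-rank update. The dimensions and initialization are illustrative assumptions, not details from the talk: a frozen weight W0 is augmented with a trainable product B @ A of rank r, so only the two small factors are updated during adaptation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight (dimensions are hypothetical, for illustration only).
d_out, d_in, r = 8, 8, 2
W0 = rng.standard_normal((d_out, d_in))

# LoRA-style adapter: only B (d_out x r) and A (r x d_in) are trainable,
# adding r*(d_out + d_in) parameters instead of d_out*d_in per layer.
B = np.zeros((d_out, r))           # zero init, so adaptation starts exactly at W0
A = rng.standard_normal((r, d_in))

def adapted_forward(x):
    # Effective weight is W0 + B @ A; W0 itself is never updated.
    return (W0 + B @ A) @ x

x = rng.standard_normal(d_in)
y = adapted_forward(x)
# With B = 0, the adapted model reproduces the frozen model's output.
assert np.allclose(y, W0 @ x)
```

The zero initialization of B is a common design choice: it guarantees the adapted model starts as an exact copy of the pretrained one, and training only moves it within a rank-r neighborhood of W0.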

In this talk, I will introduce two preliminary results toward this goal. In the first part, I will present a theoretical analysis of Low-Rank Adaptation (LoRA), one of the most popular PEFT methods today. Our analysis of the expressive power of LoRA not only helps us better understand the high adaptivity of LoRA observed in practice but also provides insights to practitioners. In the second part, I will introduce our probabilistic framework for better understanding ICL. Within this framework, one can analyze the transition between two distinct modes of ICL: task retrieval and task learning. We also discuss how the framework can help explain and predict various phenomena that are observed with large language models in practice yet remain not fully explained.
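To make the ICL setup concrete, the sketch below shows test-time adaptation as pure prompt construction: the model's weights are untouched, and labeled demonstrations are simply placed in the input context before the query. The prompt format and example data are hypothetical, not taken from the talk.

```python
# In-context learning as prompt construction: a frozen model is "adapted"
# only through labeled demonstrations prepended to the query.

def build_icl_prompt(demos, query):
    """Assemble a few-shot prompt from (input, label) demonstration pairs."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nLabel:")   # model completes the final label
    return "\n\n".join(blocks)

# Hypothetical sentiment demonstrations.
demos = [("great movie", "positive"), ("boring plot", "negative")]
prompt = build_icl_prompt(demos, "loved it")
print(prompt)
```

The retrieval-versus-learning distinction the talk analyzes lives entirely in how the model uses such a context: whether the demonstrations select a task already acquired during pretraining, or teach a genuinely new input-label mapping.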