Title: The puzzle of dimensionality and feature learning in neural networks and kernel machines
Abstract: Remarkable progress in AI has far surpassed expectations of just a few years ago. At their core, modern models, such as transformers, implement traditional statistical models -- high-order Markov chains. Yet it is not generally possible to estimate Markov models of that order from any feasible amount of data. These methods must therefore implicitly exploit low-dimensional structures present in the data, and these structures must be reflected in the high-dimensional internal parameter spaces of the models. Thus, to build a fundamental understanding of modern AI, it is necessary to identify and analyze these latent low-dimensional structures. In this talk, I will discuss how deep neural networks of various architectures learn low-dimensional features and how the lessons of deep learning can be incorporated into non-backpropagation-based algorithms that we call Recursive Feature Machines. I will present a number of experimental results on different types of data, as well as some connections to classical sparse learning methods, such as Iteratively Reweighted Least Squares.
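As a rough illustration of the Recursive Feature Machine idea mentioned above, the following minimal NumPy sketch alternates kernel ridge regression with an update of a feature matrix via the average gradient outer product of the fitted predictor. The Gaussian Mahalanobis kernel, hyperparameters, trace normalization, and toy data are illustrative choices of mine, not the talk's exact algorithm.

```python
# Sketch of an RFM-style loop: alternate kernel ridge regression with an
# update of a feature matrix M via the average gradient outer product (AGOP)
# of the fitted predictor. A Gaussian Mahalanobis kernel is used here for
# simplicity; hyperparameters are illustrative, not tuned.
import numpy as np

def mahalanobis_gauss_kernel(X, Z, M, bandwidth=1.0):
    """K[i, j] = exp(-(x_i - z_j)^T M (x_i - z_j) / (2 * bandwidth^2))."""
    XM = X @ M
    sq = (XM * X).sum(1)[:, None] + (Z @ M * Z).sum(1)[None, :] - 2.0 * XM @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def rfm_fit(X, y, n_iters=5, ridge=1e-3, bandwidth=1.0):
    n, d = X.shape
    M = np.eye(d)                                         # start with the plain isotropic kernel
    for _ in range(n_iters):
        K = mahalanobis_gauss_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y) # kernel ridge regression
        # Gradient of f(x) = sum_i alpha_i K(x, x_i) at each training point,
        # then the AGOP update: M <- average over x of grad f(x) grad f(x)^T.
        G = np.zeros((n, d))
        for j in range(n):
            diff = X[j] - X                               # (n, d)
            w = alpha * K[j]                              # (n,)
            G[j] = -(w[:, None] * (diff @ M)).sum(0) / bandwidth ** 2
        M = (G.T @ G) / n
        M /= np.trace(M) + 1e-12                          # illustrative rescaling to keep M bounded
    return M, alpha

# Toy usage: the target depends only on the first two coordinates,
# so the learned M should concentrate its mass on that low-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] * X[:, 1]
M, _ = rfm_fit(X, y)
print(np.round(np.diag(M), 3))
```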
Title: Deep Generative Models and Inverse Problems for Signal Reconstruction
Abstract: Sparsity-based methods like compressed sensing have led to significant technological breakthroughs in signal processing, compression, and medical imaging. Deep generative models like GANs, VAEs, and diffusion models are data-driven signal models that are showing impressive performance. We will survey our framework for using pre-trained generative models as priors to solve inverse problems -- denoising, filling in missing data, and recovery from linear projections -- in an unsupervised way. We generalize compressed sensing theory beyond sparsity, extending Restricted Isometries to sets created by deep generative models. We will also discuss applications to accelerating MRI, fairness in imaging, and numerous open problems.
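A hedged sketch of the generative-prior recipe for inverse problems: search over the latent space of a fixed generator G so that G(z) matches the linear measurements y ≈ A G(z). The tiny untrained MLP, measurement matrix, and optimizer settings below are placeholders of mine; in practice G would be a pre-trained GAN, VAE, or diffusion decoder.

```python
# Recover a signal from linear measurements y = A x* + noise by minimizing
# ||A G(z) - y||^2 over the latent code z of a fixed generator G.
import torch

torch.manual_seed(0)
latent_dim, signal_dim, n_measurements = 20, 100, 30

G = torch.nn.Sequential(                      # stand-in for a pre-trained generator
    torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, signal_dim),
)
for p in G.parameters():                      # the prior is fixed; only z is optimized
    p.requires_grad_(False)

A = torch.randn(n_measurements, signal_dim) / n_measurements ** 0.5  # random measurement matrix
x_true = G(torch.randn(latent_dim))           # ground truth lies in the range of G
y = A @ x_true + 0.01 * torch.randn(n_measurements)

z = torch.zeros(latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = ((A @ G(z) - y) ** 2).sum()        # data-fidelity term in measurement space
    loss.backward()
    opt.step()

x_hat = G(z).detach()
print("relative error:", (torch.norm(x_hat - x_true) / torch.norm(x_true)).item())
```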
Title: Theoretical Exploration of Foundation Model Adaptation Methods
Abstract: Due to the enormous size of foundation models, various new methods for efficient model adaptation have been developed. Parameter-efficient fine-tuning (PEFT) is an adaptation method that updates only a tiny fraction of the model parameters, leaving the remainder unchanged. In-context learning (ICL) is a test-time adaptation method that repurposes foundation models by providing them with labeled samples as part of the input context. Given the growing importance of this emerging paradigm, developing its theoretical foundations is essential.
In this talk, I will introduce two preliminary results toward this goal. In the first part, I will present a theoretical analysis of Low-Rank Adaptation (LoRA), one of the most popular PEFT methods today. Our analysis of the expressive power of LoRA not only helps us better understand the high adaptivity of LoRA observed in practice but also provides insights for practitioners. In the second part, I will introduce our probabilistic framework for better understanding ICL. With this framework, one can analyze the transition between two distinct modes of ICL: task retrieval and learning. We also discuss how the framework can help explain and predict various phenomena that are observed with large language models in practice but are not yet fully explained.
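To make the object of study in the first part concrete, here is a minimal sketch of a LoRA-style linear layer: the pre-trained weight is frozen and only a rank-r update B A is trained. The layer size, rank, scaling, and initialization details are illustrative assumptions, not the talk's setup.

```python
# LoRA on a single linear layer: the frozen base weight W is augmented with a
# trainable low-rank update (alpha / r) * B @ A, so only r * (d_in + d_out)
# parameters are trained instead of d_in * d_out.
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, d_in, d_out, rank=4, alpha=8.0):
        super().__init__()
        self.base = torch.nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                  # frozen pre-trained weight
        self.A = torch.nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, rank))   # zero init: adapted layer starts equal to the base layer
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(768, 768, rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 2 * 4 * 768 trainable parameters instead of 768 * 768
```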
Title: Scaling Data-Constrained Language Models
Abstract: Extrapolating scaling trends suggests that the training dataset size for LLMs may soon be limited by the amount of text data available on the internet. In this talk, we investigate scaling language models in data-constrained regimes. Specifically, we run a set of empirical experiments varying the extent of data repetition and the compute budget. From these experiments, we propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we discuss and experiment with approaches for mitigating data scarcity.
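For orientation, a hedged sketch of the kind of parametric form involved: the standard Chinchilla-style loss in parameter count N and token count D, together with the abstract's idea of discounting repeated tokens and excess parameters, written here only schematically via effective counts N' and D'. The talk's actual fitted law may differ.

```latex
% Standard Chinchilla-style parametric loss, and a schematic data-constrained
% variant in which repeated tokens and excess parameters are discounted
% through effective counts N' <= N and D' <= D (illustrative, not the fitted law).
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
  \quad\longrightarrow\quad
  L(N, D) \;=\; E \;+\; \frac{A}{(N')^{\alpha}} \;+\; \frac{B}{(D')^{\beta}},
  \qquad N' \le N,\; D' \le D .
\]
```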
Title: The Uneasy Relation Between Deep Learning and Statistics
Abstract: Deep learning uses the language and tools of statistics and classical machine learning, including empirical and population losses and optimizing a hypothesis on a training set. But it uses these tools in regimes where they should not be applicable: the optimization task is non-convex, models are often large enough to overfit, and the training and deployment tasks can radically differ.
In this talk, I will survey the relation between deep learning and statistics. In particular, we will discuss recent works supporting the emerging intuition that deep learning is, in some respects, closer to human learning than to classical statistics. Rather than estimating quantities from samples, deep neural nets develop broadly applicable representations and skills through their training.
The talk will not assume background knowledge in artificial intelligence or deep learning.
The panelists are:
Marzyeh Ghassemi (MIT),
James Zou (Stanford),
Ernest Davis (NYU), and
Nisheeth Vishnoi (Yale)