Title: The puzzle of dimensionality and feature learning in neural networks and kernel machines
Abstract: Remarkable progress in AI has far surpassed expectations of just a few years ago. At their core, modern models, such as transformers, implement traditional statistical models -- high-order Markov chains. Yet it is not generally possible to estimate Markov models of that order from any feasible amount of data. These methods must therefore implicitly exploit low-dimensional structures present in the data, and these structures must be reflected in the high-dimensional internal parameter spaces of the models. Thus, to build a fundamental understanding of modern AI, it is necessary to identify and analyze these latent low-dimensional structures. In this talk, I will discuss how deep neural networks of various architectures learn low-dimensional features and how the lessons of deep learning can be incorporated into non-backpropagation-based algorithms that we call Recursive Feature Machines. I will present a number of experimental results on different types of data, as well as some connections to classical sparse learning methods, such as Iteratively Reweighted Least Squares.
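As a rough illustration of the Recursive Feature Machine idea mentioned above, the following minimal NumPy sketch alternates kernel ridge regression with an update of a feature matrix via the average gradient outer product of the fitted predictor. The Gaussian Mahalanobis kernel, hyperparameters, trace normalization, and toy data are illustrative choices of mine, not the talk's exact algorithm.

```python
# Sketch of an RFM-style loop: alternate kernel ridge regression with an
# update of a feature matrix M via the average gradient outer product (AGOP)
# of the fitted predictor. A Gaussian Mahalanobis kernel is used here for
# simplicity; hyperparameters are illustrative, not tuned.
import numpy as np

def mahalanobis_gauss_kernel(X, Z, M, bandwidth=1.0):
    """K[i, j] = exp(-(x_i - z_j)^T M (x_i - z_j) / (2 * bandwidth^2))."""
    XM = X @ M
    sq = (XM * X).sum(1)[:, None] + (Z @ M * Z).sum(1)[None, :] - 2.0 * XM @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def rfm_fit(X, y, n_iters=5, ridge=1e-3, bandwidth=1.0):
    n, d = X.shape
    M = np.eye(d)                                         # start with the plain isotropic kernel
    for _ in range(n_iters):
        K = mahalanobis_gauss_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y) # kernel ridge regression
        # Gradient of f(x) = sum_i alpha_i K(x, x_i) at each training point,
        # then the AGOP update: M <- average over x of grad f(x) grad f(x)^T.
        G = np.zeros((n, d))
        for j in range(n):
            diff = X[j] - X                               # (n, d)
            w = alpha * K[j]                              # (n,)
            G[j] = -(w[:, None] * (diff @ M)).sum(0) / bandwidth ** 2
        M = (G.T @ G) / n
        M /= np.trace(M) + 1e-12                          # illustrative rescaling to keep M bounded
    return M, alpha

# Toy usage: the target depends only on the first two coordinates,
# so the learned M should concentrate its mass on that low-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] * X[:, 1]
M, _ = rfm_fit(X, y)
print(np.round(np.diag(M), 3))
```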
Title: Deep Generative Models and Inverse Problems for Signal Reconstruction
Abstract: Sparsity-based methods like compressed sensing have led to significant technological breakthroughs in signal processing, compression, and medical imaging. Deep generative models like GANs, VAEs, and diffusion models are data-driven signal models that are showing impressive performance. We will survey our framework for using pre-trained generative models as priors to solve inverse problems -- denoising, filling in missing data, and recovery from linear projections -- in an unsupervised way. We generalize compressed sensing theory beyond sparsity, extending Restricted Isometries to sets created by deep generative models. We will also discuss applications to accelerating MRI, fairness in imaging, and numerous open problems.
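A hedged sketch of the generative-prior recipe for inverse problems: search over the latent space of a fixed generator G so that G(z) matches the linear measurements y ≈ A G(z). The tiny untrained MLP, measurement matrix, and optimizer settings below are placeholders of mine; in practice G would be a pre-trained GAN, VAE, or diffusion decoder.

```python
# Recover a signal from linear measurements y = A x* + noise by minimizing
# ||A G(z) - y||^2 over the latent code z of a fixed generator G.
import torch

torch.manual_seed(0)
latent_dim, signal_dim, n_measurements = 20, 100, 30

G = torch.nn.Sequential(                      # stand-in for a pre-trained generator
    torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, signal_dim),
)
for p in G.parameters():                      # the prior is fixed; only z is optimized
    p.requires_grad_(False)

A = torch.randn(n_measurements, signal_dim) / n_measurements ** 0.5  # random measurement matrix
x_true = G(torch.randn(latent_dim))           # ground truth lies in the range of G
y = A @ x_true + 0.01 * torch.randn(n_measurements)

z = torch.zeros(latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = ((A @ G(z) - y) ** 2).sum()        # data-fidelity term in measurement space
    loss.backward()
    opt.step()

x_hat = G(z).detach()
print("relative error:", (torch.norm(x_hat - x_true) / torch.norm(x_true)).item())
```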
Title: Theoretical Exploration of Foundation Model Adaptation Methods
Abstract: Due to the enormous size of foundation models, various new methods for efficient model adaptation have been developed. Parameter-efficient fine-tuning (PEFT) is an adaptation method that updates only a tiny fraction of the model parameters, leaving the remainder unchanged. In-context learning (ICL) is a test-time adaptation method that repurposes foundation models by providing them with labeled samples as part of the input context. Given the growing importance of this emerging paradigm, developing its theoretical foundations is essential.
In this talk, I will introduce two preliminary results toward this goal. In the first part, I will present a theoretical analysis of Low-Rank Adaptation (LoRA), one of the most popular PEFT methods today. Our analysis of the expressive power of LoRA not only helps us better understand the high adaptivity of LoRA observed in practice but also provides insights for practitioners. In the second part, I will introduce our probabilistic framework for better understanding ICL. With this framework, one can analyze the transition between two distinct modes of ICL: task retrieval and learning. We also discuss how the framework can help explain and predict various phenomena that are observed with large language models in practice but are not yet fully explained.
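To make the object of study in the first part concrete, here is a minimal sketch of a LoRA-style linear layer: the pre-trained weight is frozen and only a rank-r update B A is trained. The layer size, rank, scaling, and initialization details are illustrative assumptions, not the talk's setup.

```python
# LoRA on a single linear layer: the frozen base weight W is augmented with a
# trainable low-rank update (alpha / r) * B @ A, so only r * (d_in + d_out)
# parameters are trained instead of d_in * d_out.
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, d_in, d_out, rank=4, alpha=8.0):
        super().__init__()
        self.base = torch.nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                  # frozen pre-trained weight
        self.A = torch.nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, rank))   # zero init: adapted layer starts equal to the base layer
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(768, 768, rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 2 * 4 * 768 trainable parameters instead of 768 * 768
```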
Title: Scaling Data-Constrained Language Models
Abstract: Extrapolating scaling trends suggests that the training dataset size for LLMs may soon be limited by the amount of text data available on the internet. In this talk, we investigate scaling language models in data-constrained regimes. Specifically, we run a set of empirical experiments varying the extent of data repetition and the compute budget. From these experiments, we propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we discuss and experiment with approaches for mitigating data scarcity.
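For orientation, a hedged sketch of the kind of parametric form involved: the standard Chinchilla-style loss in parameter count N and token count D, together with the abstract's idea of discounting repeated tokens and excess parameters, written here only schematically via effective counts N' and D'. The talk's actual fitted law may differ.

```latex
% Standard Chinchilla-style parametric loss, and a schematic data-constrained
% variant in which repeated tokens and excess parameters are discounted
% through effective counts N' <= N and D' <= D (illustrative, not the fitted law).
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
  \quad\longrightarrow\quad
  L(N, D) \;=\; E \;+\; \frac{A}{(N')^{\alpha}} \;+\; \frac{B}{(D')^{\beta}},
  \qquad N' \le N,\; D' \le D .
\]
```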
Title: The Uneasy Relation Between Deep Learning and Statistics
Abstract: Deep learning uses the language and tools of statistics and classical machine learning, including empirical and population losses and optimizing a hypothesis on a training set. But it uses these tools in regimes where they should not be applicable: the optimization task is non-convex, models are often large enough to overfit, and the training and deployment tasks can radically differ.
In this talk, I will survey the relation between deep learning and statistics. In particular, we will discuss recent works supporting the emerging intuition that deep learning is, in some respects, closer to human learning than to classical statistics. Rather than estimating quantities from samples, deep neural nets develop broadly applicable representations and skills through their training.
The talk will not assume background knowledge in artificial intelligence or deep learning.
The panelists are:
Marzyeh Ghassemi (MIT),
James Zou (Stanford),
Ernest Davis (NYU), and
Nisheeth Vishnoi (Yale)