FOCS (virtual) workshop on
Recent Directions in Machine Learning

Part of the 62nd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2021)

February 7-8, 2022


This workshop aims to survey some of the recent advances in machine learning (ML). The invited talks will focus on areas of ML that have so far received less attention from the theoretical computer science community but have the potential to benefit from more theory work.


The workshop is virtual and will take place over Zoom. FOCS registration is required for all participants. The Zoom link will be made available to registered participants.


Recorded talks on YouTube

Schedule

All times are in Eastern Standard Time (EST), which is UTC-5.

Monday, February 7

  • 12:00pm - 12:45pm: Cynthia Rush (Columbia), "Exact High-Dimensional Asymptotics for Statistical Estimation via Approximate Message Passing"

  • 12:45pm - 1:30pm: Risi Kondor (U. of Chicago), "Mathematical aspects of equivariant neural networks"

  • 5:00pm - 5:45pm: Anima Anandkumar (Caltech), "Neural Operator: Learning in Infinite Dimensions"

Tuesday, February 8

  • 12:00pm - 12:45pm: Andrej Risteski (CMU), "Langevin diffusion as an algorithmic and representational lens on probabilistic generative models"

  • 5:00pm - 5:45pm: Christopher Manning (Stanford), "Distributional Learning of Distributed Representations of Human Language: From LSA to GPT-3"

  • 5:45pm - 6:30pm: Jacob Steinhardt (UC Berkeley), "Nonparametrics and Robustness: Some Empirical Motivations for Theoretical Questions"

Abstracts

Speaker: Cynthia Rush (Columbia)
Title: Exact High-Dimensional Asymptotics for Statistical Estimation via Approximate Message Passing
Abstract: Approximate Message Passing (AMP) refers to a class of iterative algorithms that have been successfully applied to a number of high-dimensional statistical estimation problems, such as linear regression, generalized linear models, and low-rank matrix estimation, as well as to a variety of engineering and computer science applications such as imaging, communications, and deep learning. AMP algorithms have two features that make them particularly attractive: they can easily be tailored to take advantage of prior information on the structure of the signal, such as sparsity, and under suitable assumptions on the design matrix, AMP theory provides precise asymptotic guarantees for statistical procedures in the high-dimensional regime. In this talk, I will present the main ideas of AMP from a statistical perspective to illustrate the power and flexibility of the AMP framework in establishing exact high-dimensional asymptotics in statistical estimation.
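
As a rough, self-contained illustration of the flavor of these algorithms (not taken from the talk), the sketch below runs a textbook AMP iteration with a soft-thresholding denoiser on a synthetic sparse linear regression instance with an i.i.d. Gaussian design; the problem sizes, noise level, and threshold schedule are arbitrary choices made for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, k = 500, 1000, 50                       # measurements, dimension, sparsity (illustrative)
    A = rng.normal(size=(n, p)) / np.sqrt(n)      # i.i.d. Gaussian design
    x_true = np.zeros(p)
    x_true[rng.choice(p, k, replace=False)] = rng.normal(size=k)
    y = A @ x_true + 0.01 * rng.normal(size=n)

    def soft(u, t):
        # Soft-thresholding denoiser, tailored to sparse signals.
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    x, z = np.zeros(p), y.copy()
    delta = n / p
    for _ in range(30):
        r = x + A.T @ z                           # pseudo-data: behaves like x_true plus Gaussian noise
        tau = np.sqrt(np.mean(z ** 2))            # simple estimate of the effective noise level
        x_new = soft(r, tau)
        # Onsager correction term: keeps the effective noise approximately Gaussian across iterations
        z = y - A @ x_new + (z / delta) * np.mean(np.abs(x_new) > 0)
        x = x_new

    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

The two structural ingredients are the pluggable denoiser (here soft thresholding, exploiting sparsity) and the Onsager term, which distinguishes AMP from plain iterative thresholding and underlies its exact asymptotic characterization via state evolution.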

Speaker: Risi Kondor (U. of Chicago)
Title: Mathematical aspects of equivariant neural networks
Abstract: TBA

Speaker: Anima Anandkumar (Caltech)
Title: Neural Operator: Learning in Infinite Dimensions
Abstract: TBA

Speaker: Andrej Risteski (CMU)
Title: Langevin diffusion as an algorithmic and representational lens on probabilistic generative models
Abstract: Despite the proliferation of different families of probabilistic generative models in recent years (e.g. variational autoencoders, generative adversarial networks, normalizing flows, energy-based models), we still lack a thorough understanding of the statistical and algorithmic complexity of learning such models, as well as performing inference (i.e. drawing samples and estimating marginals).
In this talk, we will discuss Langevin diffusion as an emerging tool for understanding questions in inference, as well as the representational power of generative models. On the former front, we will cover recent results on sampling from distributions given up to a constant of proportionality (e.g. posteriors in latent variable models) beyond the standard assumption of log-concavity. On the latter front, we will discuss approximating distributions using well-conditioned normalizing flows, as well as characterizing the representational complexity of encoders in VAEs.
Based on joint works with Rong Ge, Holden Lee, Chirag Pabbaraju, Divyansh Pareek and Anish Sevekari.
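
As background (not part of the talk), here is a minimal sketch of the unadjusted Langevin algorithm, i.e. the Euler-Maruyama discretization of the Langevin diffusion dX_t = ∇log p(X_t) dt + √2 dB_t, applied to a toy one-dimensional Gaussian mixture known only up to a constant of proportionality; the step size, run length, and target are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_log_p(x):
        # Score of an (unnormalized) equal-weight mixture of two unit-variance Gaussians at -2 and +2.
        mu = np.array([-2.0, 2.0])
        dens = np.exp(-0.5 * (x - mu) ** 2)
        return np.sum(dens * (mu - x)) / np.sum(dens)

    eps, steps = 0.05, 20000
    x, samples = 0.0, []
    for _ in range(steps):
        # One Langevin step: gradient drift plus injected Gaussian noise of matching scale.
        x = x + eps * grad_log_p(x) + np.sqrt(2 * eps) * rng.normal()
        samples.append(x)

    samples = np.array(samples[5000:])            # discard burn-in
    print("sample mean:", samples.mean(), "sample std:", samples.std())

Only the score ∇log p is needed, so the normalizing constant never appears; this is why Langevin-type methods apply to posteriors and energy-based models given up to proportionality. The sketch glosses over the discretization bias and mixing-time issues that a rigorous analysis must address.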

Speaker: Christopher Manning (Stanford)
Title: Distributional Learning of Distributed Representations of Human Language: From LSA to GPT-3
Abstract: In Natural Language Processing, the long-dominant approach to encoding the structure of human languages in systems for downstream tasks was to build context-free grammar (or richer) parsers from hand-annotated morphosyntactic resources that display linguistic structure, that is, treebanks. However, recent deep learning language models hark back to the different tradition of Latent Semantic Analysis (LSA). Language models are simply large artificial neural networks trained in a self-supervised fashion to predict a word in a given context. Nevertheless, once fine-tuned, these models now yield much better task performance, seemingly without any structural knowledge. I will begin with LSA and neural word embedding models. I will then consider recurrent neural networks and introduce the notion of bounded hierarchical languages, showing that RNNs can generate such languages with optimal memory. Next, I will examine how deep contextual language models like BERT or GPT-3 learn knowledge of linguistic structure because it helps them in word prediction. Using a method of syntactic probing, I will show how components in these models focus on human language syntax, capturing grammatical relationships and anaphoric coreference. These results both help explain why recent neural models have brought such large improvements across many language-understanding tasks and provide intriguing hints about the possibility of learning language from observed evidence alone, as human children appear to do.
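
As a small illustration of the LSA starting point (a generic example, not material from the talk), the sketch below builds low-dimensional word vectors from a toy term-document count matrix via a truncated SVD; the vocabulary, counts, and embedding dimension are made up for the example.

    import numpy as np

    # Toy term-document count matrix (rows: words, columns: documents), purely illustrative.
    words = ["cat", "dog", "pet", "stock", "market", "trade"]
    X = np.array([
        [3, 2, 0, 0],   # cat
        [2, 3, 0, 0],   # dog
        [2, 2, 1, 0],   # pet
        [0, 0, 3, 2],   # stock
        [0, 0, 2, 3],   # market
        [0, 1, 2, 2],   # trade
    ], dtype=float)

    # LSA: a truncated SVD of the count matrix yields distributed word representations
    # whose geometry reflects distributional (co-occurrence) similarity.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    word_vecs = U[:, :k] * s[:k]

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print("cat ~ dog:   ", round(cos(word_vecs[0], word_vecs[1]), 3))
    print("cat ~ market:", round(cos(word_vecs[0], word_vecs[4]), 3))

Neural word embeddings and today's language models replace this linear-algebraic factorization with learned, and eventually contextual, representations, but the underlying distributional principle is the same.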

Speaker: Jacob Steinhardt (UC Berkeley)
Title: Nonparametrics and Robustness: Some Empirical Motivations for Theoretical Questions
Abstract: Many recent theoretical analyses of neural networks proceed by approximating deep neural networks with overparameterized *linear* models. We will examine the theory of these overparameterized models and tie it to qualitative phenomena observed in deep networks. It turns out that linear models can often predict these qualitative phenomena, but only if we are careful in how we set things up.
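
To make the linearized-model viewpoint concrete (a generic illustration, not the speaker's setup), the sketch below compares a tiny one-hidden-layer network with its first-order Taylor expansion in the parameters around a random initialization, using a finite-difference Jacobian; the architecture, sizes, and perturbation scale are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    d, h = 5, 50                                   # input dimension, hidden width (illustrative)

    def net(theta, x):
        # Tiny one-hidden-layer network f(x; theta) = v . tanh(W x), with theta = [W.ravel(), v].
        W = theta[:h * d].reshape(h, d)
        v = theta[h * d:]
        return v @ np.tanh(W @ x)

    def grad_theta(theta, x, eps=1e-5):
        # Numerical gradient of f(x; theta) with respect to theta (central differences).
        g = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = eps
            g[i] = (net(theta + e, x) - net(theta - e, x)) / (2 * eps)
        return g

    theta0 = rng.normal(size=h * d + h) / np.sqrt(h)      # random initialization
    x = rng.normal(size=d)

    # Linearized ("tangent") model around theta0:
    #   f_lin(x; theta) = f(x; theta0) + grad_theta f(x; theta0) . (theta - theta0)
    theta = theta0 + 0.01 * rng.normal(size=theta0.size)  # small parameter move, as in "lazy" training
    f_lin = net(theta0, x) + grad_theta(theta0, x) @ (theta - theta0)
    print(f"true f: {net(theta, x):.5f}   linearized f: {f_lin:.5f}")

The linearized model is linear in the parameters (though not in the input), which is what makes kernel-style analysis possible; how faithfully it tracks the actual network depends on width, parameterization, and how far training moves the parameters, which is one sense in which "how we set things up" matters.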