The Schedule

The schedule is also available on the NeurIPS virtual platform.
Accepted papers are available on OpenReview.

8:50 a.m. - 9:00 a.m.

Opening Remarks

9:00 a.m. - 9:45 a.m.

Invited Talk: From algorithms to neural networks and back

Speaker: Andrej Risteski

Abstract: An increasingly common design and analysis paradigm for neural networks is thinking of them as parametrizing (implicitly or explicitly) some algorithm. In images, score-based generative models can be thought of as parametrizing a learned sampler (a stochastic differential equation or a Markov Chain). In scientific applications, PDE solvers are trained as neural analogues of numerical solvers. In language, we probe to understand whether transformers can solve simple algorithmic tasks like parsing. In this talk, I’ll share several vignettes illustrating the value of an algorithmic lens in these settings: namely, understanding the performance of “natural” algorithms will allow us to understand the performance of neural methods, as well as explore and elucidate the architectural design space.
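
To make the "learned sampler" framing above concrete, here is a minimal, hedged sketch (not taken from the talk; the architecture, step size, and dimensions are illustrative placeholders): a small score network plugged into unadjusted Langevin dynamics, so that generating samples amounts to running a parametrized Markov chain.

```python
# Minimal sketch of the "neural network as parametrized sampler" view
# (illustrative only; architecture and hyperparameters are placeholders).
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Small MLP standing in for a learned score function s_theta(x) ~ grad log p(x)."""
    def __init__(self, dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

@torch.no_grad()
def langevin_sample(score: ScoreNet, n: int = 512, dim: int = 2,
                    steps: int = 1000, step_size: float = 1e-2) -> torch.Tensor:
    """Unadjusted Langevin dynamics: x <- x + (eta/2) * s_theta(x) + sqrt(eta) * noise."""
    x = torch.randn(n, dim)  # initialize the chain from a standard Gaussian
    for _ in range(steps):
        x = x + 0.5 * step_size * score(x) + step_size ** 0.5 * torch.randn_like(x)
    return x

# samples = langevin_sample(ScoreNet())  # each step is one transition of the Markov chain
```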

9:45 a.m. - 10:30 a.m. 

Invited Talk: How do two-layer neural networks learn complex functions from data over time?

Speaker: Florent Krzakala

Abstract: How do two-layer neural networks learn complex functions from data over time? In this talk, we shall delve into the interaction between batch size, number of iterations, and task complexity, shedding light on how neural networks adapt to data features. I will highlight three key findings in particular:


Our theoretical approach combines techniques from statistical physics, concentration of measure, projection-based conditioning, and Gaussian equivalence, which we believe holds standalone significance.


Based on joint work with Yatin Dandi, Bruno Loureiro, Luca Pesce, and Ludovic Stephan (https://arxiv.org/pdf/2305.18270.pdf)

10:30 a.m. - 10:40 a.m. 

Oral: Feature Learning in Infinite-Depth Neural Networks

Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou

10:40 a.m. - 10:50 a.m. 

Oral: Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions 

Yilong Qin · Andrej Risteski

10:50 a.m. - 11:00 a.m. 

Oral: Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models 

Song Mei · Yuchen Wu

11:00 a.m. - 11:10 a.m. 

Oral: Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu

11:10 a.m. - 12:10 p.m. 

Poster Session 1

12:10 p.m. - 1:15 p.m. 

Lunch Break

1:15 p.m. - 2:00 p.m. 

Invited Talk: Benefits of learning with symmetries: eigenvectors, graph representations and sample complexity

Speaker: Stefanie Jegelka

Abstract: In many applications, especially in the sciences, data and tasks have known invariances. Encoding such invariances directly into a machine learning model can improve learning outcomes, but it also poses challenges for efficient model design.

In the first part of the talk, we will focus on the invariances relevant to eigenvectors and eigenspaces being inputs to a neural network. Such inputs are important, for instance, for graph representation learning or orthogonally equivariant learning. We will discuss targeted architectures that can universally express functions with the relevant invariances or equivariances - sign flips and changes of basis - and their theoretical and empirical benefits.
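
As a hedged illustration of the sign-flip invariance mentioned above (a generic sketch, not the speakers' exact architecture; names and sizes are placeholders), one standard construction processes each eigenvector v through phi(v) + phi(-v), which is unchanged when v is replaced by -v:

```python
# Illustrative sign-invariant encoder for eigenvector inputs (placeholders throughout).
import torch
import torch.nn as nn

class SignInvariantEncoder(nn.Module):
    def __init__(self, n_nodes: int, k_eigvecs: int, hidden: int = 64, out: int = 32):
        super().__init__()
        # phi acts on one eigenvector (a length-n vector) at a time.
        self.phi = nn.Sequential(
            nn.Linear(n_nodes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # rho mixes the k per-eigenvector embeddings into one representation.
        self.rho = nn.Sequential(
            nn.Linear(k_eigvecs * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out),
        )

    def forward(self, eigvecs: torch.Tensor) -> torch.Tensor:
        # eigvecs: (batch, n_nodes, k); columns are eigenvectors, defined only up to sign.
        v = eigvecs.transpose(1, 2)              # (batch, k, n_nodes)
        z = self.phi(v) + self.phi(-v)           # invariant to flipping the sign of any eigenvector
        return self.rho(z.flatten(start_dim=1))  # (batch, out)

# Usage: SignInvariantEncoder(n_nodes=10, k_eigvecs=4)(torch.randn(8, 10, 4))
```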

Second, we will take a broader theoretical perspective. Empirically, it is known that encoding invariances into the machine learning model can reduce sample complexity. For the simplified setting of kernel ridge regression or random features, we will discuss new bounds that illustrate two ways in which invariances can reduce sample complexity. Our results hold for learning on manifolds and for invariances to a wide range of group actions.
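
For concreteness, here is a hedged sketch of the kernel ridge regression setting only (it assumes a finite symmetry group and a Gaussian base kernel, and does not reproduce the talk's bounds): the invariance is encoded by averaging the kernel over the group.

```python
# Sketch: encoding a known invariance into kernel ridge regression by group-averaging
# the kernel (assumptions: finite orthogonal group, Gaussian base kernel).
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def invariant_kernel(X, Y, group_actions, bandwidth=1.0):
    # Average k(g.x, y) over the group; for orthogonal actions and a Gaussian kernel
    # this equals the fully symmetrized G-invariant kernel.
    return np.mean([gaussian_kernel(g(X), Y, bandwidth) for g in group_actions], axis=0)

def krr_fit_predict(X_train, y_train, X_test, kernel, reg=1e-3):
    K = kernel(X_train, X_train)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return kernel(X_test, X_train) @ alpha

# Example: a target invariant to sign flips of the input, f(x) = f(-x).
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(40, 3)); y_tr = np.sum(X_tr ** 2, axis=1)
X_te = rng.normal(size=(10, 3))
group = [lambda X: X, lambda X: -X]  # the two-element sign-flip group
preds = krr_fit_predict(X_tr, y_tr, X_te,
                        kernel=lambda A, B: invariant_kernel(A, B, group))
```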


This talk is based on joint work with Joshua Robinson, Derek Lim, Behrooz Tahmasebi, Lingxiao Zhao, Tess Smidt, Suvrit Sra and Haggai Maron.

2:00 p.m. - 2:15 p.m. 

Break

2:15 p.m. - 3:00 p.m.

Invited Talk: Adaptivity in Domain Adaptation and Friends

Speaker: Samory Kpotufe

Abstract: Domain adaptation, transfer, multitask, meta, few-shot, or lifelong learning … these are all important recent directions in ML that touch on the core of what we might mean by 'AI'. As these directions all concern learning in heterogeneous and ever-changing environments, they share a central question: what information a 'source' distribution may have about a 'target' distribution, or, put differently, which measures of discrepancy between distributions properly model such information.


Our understanding of this central question is still rather fledgling, with both positive and negative results. On the one hand, we show that traditional notions of distance and divergence between distributions (e.g., Wasserstein, TV, KL, Rényi) are in fact too conservative: a source may be 'far' from a target under such traditional notions, yet still admit much useful information about the target distribution. We then turn to the existence of 'adaptive' procedures, i.e., procedures which can optimally leverage such information in the source data without any prior distributional knowledge. Here the picture is quite nuanced: while various existing approaches turn out to be adaptive in usual settings with a single source and hypothesis class, no procedure can guarantee optimal rates adaptively in more general settings, e.g., settings with multiple source datasets (as in multitask learning), or settings with multiple hypothesis classes (as in model selection or hyper-parameter tuning).


Such negative results raise new questions, as they suggest that domain adaptation and related problems may benefit from more structure in practice than is captured by current formalisms.


The talk is based on joint work with collaborators over the last few years, namely, G. Martinet, S. Hanneke, J. Suk, Y. Mahdaviyeh, N. Galbraith. 

3:00 p.m. - 3:10 p.m. 

Oral: Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan

3:10 p.m. - 3:20 p.m. 

Oral: Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP 

Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu

3:20 p.m. - 3:30 p.m. 

Oral: In-Context Convergence of Transformers 

Yu Huang · Yuan Cheng · Yingbin Liang

3:30 p.m. - 3:40 p.m. 

Oral: Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study 

Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun

3:40 p.m. - 3:50 p.m. 

Oral: Linear attention is (maybe) all you need (to understand transformer optimization) 

Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra

3:50 p.m. - 4:00 p.m. 

Closing Remarks

4:00 p.m. - 5:00 p.m. 

Poster Session 2

List of Papers in Poster Session 1

List of Papers in Poster Session 2