Joint TILOS and OPTML++ Seminar

This seminar is dedicated to work at the intersection of optimization and machine learning, while keeping an eye out for wider connections to related areas (e.g., statistics, signal processing, robotics, information theory, functional analysis, geometry, etc.; this wider net is where the "++" in the name comes from). OPTML++ seminars cover both novel developments and fundamental concepts.

This semester the seminar takes place every other Wednesday at 4pm ET (with a few exceptions). You can also find recordings of most past talks in the "Past Talks" section.

Announcements (and Zoom link) related to this group are distributed on our mailing list. If possible, please use your institutional email when joining.

Next Talk

Wednesday, Feb 1, 2023 at 4pm ET

Title: The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

Mufan (Bill) Li and Daniel Roy, University of Toronto

Abstract. The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
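
The abstract's central object, the random covariance matrix of hidden features, is easy to simulate. Below is a minimal Python sketch (not the speakers' code) that tracks the empirical 2x2 covariance of two inputs through a deep, finite-width MLP at initialization. The "shaped" leaky-ReLU, whose slopes approach the identity at rate c/sqrt(depth), is an assumed stand-in for the activation shaping the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

def shaped_act(x, c, depth):
    # Illustrative shaping assumption: slopes 1 +/- c/sqrt(depth),
    # so the nonlinearity flattens toward the identity as depth grows.
    s = c / np.sqrt(depth)
    return np.where(x > 0, (1 + s) * x, (1 - s) * x)

def covariance_trajectory(x1, x2, width=200, depth=200, c=1.0):
    """Return the per-layer 2x2 empirical covariance of two inputs."""
    h = np.stack([x1, x2])  # shape (2, width)
    covs = []
    for _ in range(depth):
        # i.i.d. Gaussian weights with variance 1/width (standard init)
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = shaped_act(h @ W.T, c, depth)
        covs.append(h @ h.T / width)
    return covs

x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
covs = covariance_trajectory(x1, x2)
print("covariance after the final layer:\n", covs[-1])
```

Rerunning this with different seeds illustrates the point the abstract emphasizes: at fixed width, the layer-to-layer fluctuations of the covariance accumulate with depth rather than averaging out, which is what the joint infinite-depth-and-width limit (and the resulting SDE) is designed to capture.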

Mufan (Bill) Li

Mufan (Bill) Li is a PhD candidate in the Department of Statistical Sciences at the University of Toronto, supervised by Daniel Roy and Murat Erdogdu. His work has been recognized with a Mitacs Accelerate Fellowship, four Ontario Graduate Scholarships, and a Student Presentation Award at the 2021 Statistical Society of Canada (SSC) meeting. Mufan's research focuses primarily on deep learning theory, in particular the study of infinite-depth-and-width limits, as well as on sampling algorithms based on Langevin diffusion.

Website: https://mufan-li.github.io/

Daniel Roy

Daniel Roy is an Associate Professor in the Department of Statistical Sciences at the University of Toronto, with cross-appointments in Computer Science and Electrical and Computer Engineering. He also holds a Canada CIFAR AI Chair at the Vector Institute. His research spans machine learning, mathematical statistics, and theoretical computer science. He is a recipient of an NSERC Discovery Accelerator Award, an Ontario Early Researcher Award, and a Google Faculty Research Award. He serves as an action editor for the Journal of Machine Learning Research and Transactions on Machine Learning Research, a senior area chair for the International Conference on Learning Representations, and an area chair for NeurIPS, ICML, COLT, and other ML conferences. Prior to joining Toronto, Roy was a Research Fellow of Emmanuel College and a Newton International Fellow of the Royal Society and the Royal Academy of Engineering, hosted by the University of Cambridge. He completed his doctorate in Computer Science at the Massachusetts Institute of Technology, where his dissertation received the MIT EECS Sprowls Award, given to the top dissertation in computer science that year.

Website: http://danroy.org/