The One World Seminar Series on the Mathematics of Machine Learning is an online platform for research seminars, workshops and seasonal schools in theoretical machine learning. The focus of the series lies on theoretical advances in machine learning and deep learning. The series was started during the Covid-19 pandemic in 2020 to bring together researchers from all over the world for presentations and discussions in a virtual environment. It follows in the footsteps of other community projects under the One World umbrella, which originated around the same time.
We welcome suggestions for speakers on new and exciting developments, and we are committed to providing a platform for junior researchers as well. We recognize the flexibility that online seminars provide. Feedback on any of our events is welcome.
Zoom talks are held on Wednesdays at 12:00 pm New York time (9:00 am Pacific).
A list of past seminars can be found here, and recordings can be viewed on our YouTube channel. Invitations to future seminars will be shared on this site before each talk and distributed via email.
Wed 25 March
Giulio Biroli
Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
Abstract: Diffusion models now underpin many of the most powerful generative systems, yet understanding the mechanisms that prevent their memorization of training data and allow generalization remains a key challenge. In this talk, I will focus on the role of the training dynamics in the transition from generalization to memorization. I will identify two sharply separated timescales in training. The first, τ_gen, marks the onset of high-quality sample generation and is largely insensitive to dataset size. The second, τ_mem, signals the emergence of memorization and grows linearly with the number of training examples. As a result, increasing the dataset size creates an expanding window of training times during which models generalize effectively—even though the same models will exhibit strong memorization if optimized for longer. Only when the dataset exceeds a model-dependent threshold does memorization vanish altogether in the infinite-time limit. These findings reveal a form of implicit dynamical regularization: the training dynamics themselves protect against memorization over a substantial and controllable regime, despite extreme overparameterization. I will support these conclusions with numerical experiments using standard U-Net architectures on both synthetic and realistic datasets, and with a complementary theoretical analysis of a tractable random-features model in the high-dimensional limit. Together, the results offer a unifying framework for understanding how dataset size and training time jointly govern generalization in modern diffusion models. This talk is based on joint work presented at NeurIPS 2025 with T. Bonnaire, R. Urfin and M. Mézard.
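As a purely illustrative reading of the scaling described in the abstract (the constants below are hypothetical and not taken from the paper): if τ_gen is roughly independent of the dataset size n while τ_mem grows linearly with n, then the window of training times over which a model generalizes without memorizing widens as n grows.

```python
# Toy illustration of the claimed timescale separation (all constants hypothetical).
# tau_gen: onset of high-quality generation, assumed roughly independent of dataset size n.
# tau_mem: onset of memorization, assumed to grow linearly with n.

TAU_GEN = 1e3   # hypothetical training time at which generation becomes good
C_MEM = 5.0     # hypothetical slope: tau_mem ≈ C_MEM * n

for n in [10**3, 10**4, 10**5, 10**6]:
    tau_mem = C_MEM * n
    window = tau_mem - TAU_GEN  # training times that generalize without memorizing
    print(f"n = {n:>9,d}   tau_gen = {TAU_GEN:.0e}   tau_mem = {tau_mem:.0e}   "
          f"generalization window ≈ {window:.1e} steps")
```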
Wed 22 April
Tizian Wenzel
On the optimal shape parameter for kernel methods and beyond
Abstract: The search for the optimal shape parameter for Radial Basis Function (RBF) kernel methods has been an outstanding research problem for decades. In this work, we establish a theoretical framework for this problem by leveraging a recently established theory of sharp direct, inverse and saturation statements for kernel-based approximation. In particular, we link the search for the optimal shape parameter to superconvergence phenomena.
The analysis is carried out for finitely smooth Sobolev kernels, thereby covering large classes of radial kernels used in practice, including those emerging from current machine-learning methodologies. The results elucidate how approximation regimes, kernel regularity, and parameter choices interact, clarifying a question that has remained unresolved for decades.
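As a quick illustration of what the shape parameter controls (not taken from the talk), the sketch below interpolates a toy 1-D function with a finitely smooth Matérn kernel and reports the error for a few shape parameters ε; the kernel, nodes, target function, and ε values are all illustrative assumptions.

```python
import numpy as np

def matern_kernel(x, y, eps):
    # Finitely smooth Matérn kernel k(r) = (1 + eps*r) * exp(-eps*r),
    # whose native space is a Sobolev space; eps is the shape parameter.
    r = np.abs(x[:, None] - y[None, :])
    return (1.0 + eps * r) * np.exp(-eps * r)

# Toy 1-D interpolation problem (nodes, target, and eps values are illustrative).
x_train = np.linspace(0.0, 1.0, 20)     # interpolation nodes
x_test = np.linspace(0.0, 1.0, 500)     # evaluation grid
f = lambda x: np.sin(2.0 * np.pi * x)   # target function
y_train = f(x_train)

for eps in [0.5, 2.0, 8.0, 32.0]:
    K = matern_kernel(x_train, x_train, eps)   # kernel matrix on the nodes
    coeffs = np.linalg.solve(K, y_train)       # interpolant coefficients
    y_pred = matern_kernel(x_test, x_train, eps) @ coeffs
    print(f"eps = {eps:5.1f}   max interpolation error = "
          f"{np.max(np.abs(y_pred - f(x_test))):.2e}")
```

Even in this toy setting the error typically varies by orders of magnitude across ε, which is exactly why a principled theory for choosing the shape parameter is of interest.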
Sign up here to join our mailing list and receive announcements. If your browser automatically signs you into a Google account, it may be easiest to join with a university account through an incognito window. For other concerns, please reach out to one of the organizers.
Ricardo Baptista (University of Toronto)
Wuyang Chen (Simon Fraser University)
Bin Dong (Peking University)
Lyudmila Grigoryeva (University of St. Gallen)
Boumediene Hamzi (Caltech)
Yuka Hashimoto (NTT)
Qianxiao Li (National University of Singapore)
Lizao Li (Google)
George Stepaniants (Caltech)
Zhiqin Xu (Shanghai Jiao Tong University)
Simon Shaolei Du (University of Washington)
Franca Hoffmann (Caltech)
Surbhi Goel (Microsoft Research NY)
Issa Karambal (Quantum Leap Africa)
Tiffany Vlaar (University of Glasgow)
Chao Ma (Stanford University)
Song Mei (UC Berkeley)
Philipp Petersen (University of Vienna)
Matthew Thorpe (University of Warwick)
Stephan Wojtowytsch (University of Pittsburgh)