One World Mathematics of INformation, Data, and Signals (1W-MINDS) Seminar

The 1W-MINDS Seminar was founded in the early days of the COVID-19 pandemic, when travel was impossible.  We have chosen to continue the seminar since then to help build an inclusive community interested in mathematical data science, computational harmonic analysis, and related applications by providing free access to high-quality talks without the need to travel.  In the spirit of environmental and social sustainability, we welcome you to participate in both the seminar and our Slack community!  Zoom talks are held on Thursdays either at 2:30 pm New York time or at 10:00 am Paris time / 4:00 pm summer Shanghai time / 5:00 pm winter Shanghai time.  To find and join the 1W-MINDS Slack channel, please click here.

Current Organizers (September 2023 - May 2024):  Axel Flinth (Principal Organizer for Europe/Asia, Umeå University), Longxiu Huang (Principal Organizer for The Americas, Michigan State University), Alex Cloninger (UC San Diego), Mark Iwen (Michigan State University), Santhosh Karnik (Michigan State University), Weilin Li (City College of New York), Yong Sheng Soh (National University of Singapore), and Yuying Xie (Michigan State University).

To sign up to receive email announcements about upcoming talks, click here.
To join the 1W-MINDS Slack channel, click here.


The organizers would like to acknowledge support from the Michigan State University Department of Mathematics.  Thank you.

Zoom Link for all 2:30 pm New York time Talks: New York link 

Passcode: the smallest prime > 100 

Zoom Link for all 10:00 am Paris/4:00 pm Summer Shanghai/5:00 pm Winter Shanghai time Talks: Paris/Shanghai link

Passcode: The integer part and first five decimals of e (Euler's number)

FUTURE TALKS

April 11 (POSTPONED):  E Weinan (Peking University)

Prof. E's talk has been postponed. A new time will be announced shortly.

May 2:  Hung-Hsu "Edward" Chou (Technical University of Munich), 1:30 pm New York time (NOTE TIME CHANGE)

More is Less: Understanding Compressibility of Neural Networks via Implicit Bias and Neural Collapse

Despite their recent successes in various tasks, most modern machine learning algorithms lack theoretical guarantees, which are crucial to further development towards delicate tasks such as designing self-driving cars. One mysterious phenomenon is that, among infinitely many possible ways to fit data, the algorithms always find the "good" ones, even when the definition of "good" is not specified by the designers. In this talk I will cover the empirical and theoretical study of the connection between the good solutions in neural networks and the sparse solutions in compressed sensing with four questions in mind: What happens? When does it happen? Why does it happen? How can we improve it? The key concepts are implicit bias/regularization, Bregman divergence, neural collapse, and neural tangent kernel.
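A minimal sketch of implicit bias in the simplest possible setting (an underdetermined linear model rather than the neural networks of the talk): the system A x = y below has infinitely many exact solutions, yet gradient descent started from zero converges to one particular solution, the minimum-Euclidean-norm interpolator, without being asked to.  Reparameterizing the same model, for instance with Hadamard-product factorizations or deep networks, changes which interpolator is selected, which is how the connection to sparse, compressed-sensing-style solutions arises.

# Implicit bias of plain gradient descent on an underdetermined least-squares problem:
# among infinitely many interpolators of A x = y, GD from zero finds the minimum-norm one.
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 100                                      # 20 equations, 100 unknowns
A = rng.standard_normal((m, n)) / np.sqrt(n)
y = rng.standard_normal(m)

x = np.zeros(n)                                     # initialize at zero
step = 0.5                                          # below 2 / ||A^T A|| here, so GD converges
for _ in range(5000):
    x -= step * A.T @ (A @ x - y)                   # gradient step on 0.5 * ||A x - y||^2

x_min_norm = np.linalg.lstsq(A, y, rcond=None)[0]   # the minimum-norm interpolator
print("residual of GD solution      :", np.linalg.norm(A @ x - y))
print("distance to min-norm solution:", np.linalg.norm(x - x_min_norm))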

May 9:  Xiuyuan Cheng (Duke University), 2:30 pm New York time

Flow-based generative model by Wasserstein proximal gradient descent

Normalizing flows have become a popular class of deep generative models for efficient sampling and density estimation. Recently, the remarkable success of score-based diffusion models has inspired flow-based models closely related to the diffusion process. In particular, the celebrated Jordan-Kinderlehrer-Otto (JKO) scheme captures the variational nature of the diffusion process as a Wasserstein gradient flow, and in the context of normalizing flow, this naturally suggests a progressive way of training a flow model that implements a proximal gradient descent in the Wasserstein-2 space. In this talk, we introduce such a JKO flow model that achieves competitive performance compared to existing diffusion and flow models on generating high-dimensional real data. The proposed flow network stacks residual blocks one after another, where each block corresponds to a JKO step, and the overall flow model provides a deterministic invertible mapping from data to noise and in the reverse process from noise to data. On the theoretical side, the connection to Wasserstein proximal gradient descent allows us to prove the exponentially fast convergence of the discrete-time flow in both directions. Joint work with Yao Xie, Chen Xu (Georgia Tech), and Jianfeng Lu, Yixin Tan (Duke).
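For readers unfamiliar with the JKO scheme, its k-th step in a standard formulation (written here in generic notation, not the talk's specific parametrization) is

\rho_{k+1} = \arg\min_{\rho \in \mathcal{P}_2(\mathbb{R}^d)} \; F(\rho) + \frac{1}{2h} W_2^2(\rho, \rho_k),

where F is a free-energy functional such as the KL divergence to the target distribution, W_2 is the Wasserstein-2 distance, and h > 0 is the step size; letting h go to zero recovers the Wasserstein gradient flow of F. In the flow model above, each residual block parametrizes one such step.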

May 16:  Molei Tao (Georgia Tech), 2:30 pm New York time

Implicit Biases of Large Learning Rates in Machine Learning  


This talk will discuss some nontrivial but often pleasant effects of large learning rates, through the lens of nonlinear training dynamics. Large learning rates are commonly used in machine learning practice for improved empirical performance, but they defy traditional theoretical analyses. I will first quantify how large learning rates help gradient descent escape local minima in a multiscale landscape. This happens via chaotic dynamics, which provides an alternative to the commonly known escape mechanism due to noise from stochastic gradients. I will then report how large learning rates provably bias toward flatter minimizers. Several related, perplexing phenomena have been empirically observed recently, including Edge of Stability, loss catapulting, and balancing. I will unify them and explain that they are all algorithmic implicit biases of large learning rates. These results are enabled by a new global convergence result of gradient descent for certain nonconvex functions without Lipschitz gradient. The theory also provides an understanding of when Edge of Stability and other implicit biases of large learning rates will occur.
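As minimal background (a standard stability fact rather than the talk's new results), the 2/eta threshold behind Edge of Stability is already visible on a one-dimensional quadratic: gradient descent with learning rate eta on f(x) = lam * x^2 / 2 converges exactly when lam < 2/eta, so a fixed large learning rate can only settle in sufficiently flat minima.

# Stability threshold of gradient descent on f(x) = 0.5 * lam * x**2:
# the iteration x <- x - eta * lam * x contracts iff |1 - eta * lam| < 1, i.e. lam < 2/eta.
eta = 0.5                          # learning rate; stability threshold is 2/eta = 4
for lam in [1.0, 3.9, 4.1]:        # curvatures below, just below, and just above 2/eta
    x = 1.0
    for _ in range(200):
        x -= eta * lam * x         # gradient descent step
    print(f"curvature {lam}: |x| after 200 steps = {abs(x):.3e}")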

TBD

TBD