The 1W-MINDS Seminar was founded in the early days of the COVID-19 pandemic, when travel was impossible. We have continued it since then to help form the basis of an inclusive community interested in mathematical data science, computational harmonic analysis, and related applications by providing free access to high-quality talks without the need to travel. In the spirit of environmental and social sustainability, we welcome you to participate in both the seminar and our Slack channel community! Zoom talks are held on Thursdays at 2:30 pm New York time. To find and join the 1W-MINDS Slack channel, please click here.
Current Organizers (September 2025 - May 2026): Ben Adcock (Simon Fraser University), March Boedihardjo (Michigan State University), Hung-Hsu Chou (University of Pittsburgh), Diane Guignard (University of Ottawa), Longxiu Huang (Michigan State University), Mark Iwen (Principal Organizer, Michigan State University), Siting Liu (UC Riverside), Kevin Miller (Brigham Young University), and Christian Parkinson (Michigan State University).
Most previous talks are on the seminar YouTube channel. You can catch up there, or even subscribe if you like.
To sign up to receive email announcements about upcoming talks, click here.
To join the 1W-MINDS Slack channel, click here.
Passcode: the smallest prime > 100
Firstly, we study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be reinterpreted as a Riemannian gradient flow on the manifold of rank-r matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank-k matrices for some k ≤ r.
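To make this setup concrete, here is a minimal sketch in LaTeX notation; the symbols X, Y, and W_1, …, W_N are illustrative choices, and the talk's precise metric and assumptions may differ:

\[
L(W_1, \dots, W_N) \;=\; \tfrac{1}{2}\,\bigl\| W_N W_{N-1} \cdots W_1 X - Y \bigr\|_F^2,
\qquad
\dot{W}_j(t) \;=\; -\nabla_{W_j} L\bigl(W_1(t), \dots, W_N(t)\bigr), \quad j = 1, \dots, N.
\]

The induced evolution of the end-to-end matrix W(t) = W_N(t) \cdots W_1(t) is what can be reinterpreted as a Riemannian gradient flow on the manifold of rank-r matrices with a suitable metric, as described above.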
Secondly, we study the convergence properties of gradient descent for training deep linear neural networks, extending a previous analysis of the related gradient flow. We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, i.e., the squared loss in this work. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
Thirdly, we study the convergence properties of mini-batch stochastic gradient descent (SGD) for training deep linear neural networks using a regularized squared loss. Under mild conditions we prove that SGD converges almost surely to the critical set of the regularized squared loss and that the values of this loss along the iterates also converge almost surely. This builds on our derivation of an almost sure bound that controls the evolution of the SGD sequence.
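For illustration, the short Python/NumPy sketch below implements the kind of training procedure analyzed in the second and third parts of the abstract: mini-batch SGD on a deep linear network with a regularized squared loss. All dimensions, hyperparameters, and the particular regularizer are illustrative assumptions rather than the settings used in the talk; full-batch gradient descent is recovered by taking the batch to be the entire dataset.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and synthetic data (not from the talk).
n_layers, d, n_samples = 3, 10, 200
X = rng.standard_normal((d, n_samples))
Y = rng.standard_normal((d, n_samples))

# Layer weight matrices W_1, ..., W_N of a deep *linear* network.
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

def end_to_end(Ws):
    """End-to-end product W_N ... W_1 implemented by the linear network."""
    W = np.eye(d)
    for Wj in Ws:
        W = Wj @ W
    return W

def loss(Ws, Xb, Yb, lam):
    """Regularized squared loss on a (mini-)batch."""
    residual = end_to_end(Ws) @ Xb - Yb
    reg = sum(np.sum(Wj ** 2) for Wj in Ws)
    return 0.5 * np.sum(residual ** 2) / Xb.shape[1] + 0.5 * lam * reg

def grads(Ws, Xb, Yb, lam):
    """Gradients of the regularized squared loss with respect to each factor W_j."""
    W = end_to_end(Ws)
    R = (W @ Xb - Yb) @ Xb.T / Xb.shape[1]       # gradient w.r.t. the product W
    gs = []
    for j in range(len(Ws)):
        left = np.eye(d)                          # W_N ... W_{j+1}
        for Wk in Ws[j + 1:]:
            left = Wk @ left
        right = np.eye(d)                         # W_{j-1} ... W_1
        for Wk in Ws[:j]:
            right = Wk @ right
        gs.append(left.T @ R @ right.T + lam * Ws[j])
    return gs

# Mini-batch SGD; full-batch gradient descent is the special case batch_size = n_samples.
step, lam, batch_size = 1e-2, 1e-3, 20
for it in range(2000):
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    gs = grads(Ws, X[:, idx], Y[:, idx], lam)
    Ws = [Wj - step * g for Wj, g in zip(Ws, gs)]

print("final full-batch loss:", loss(Ws, X, Y, lam))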
The Beck-Fiala Conjecture asserts that any set system on n elements with degree k has combinatorial discrepancy O(√k). A substantial generalization is the Komlós Conjecture, which states that any m × n matrix with columns of unit Euclidean length has discrepancy O(1). In this talk, we describe an Õ(log^1/4 n) bound for the Komlós problem, improving upon the O(log^1/2 n) bound due to Banaszczyk from 1998. We will also see how these ideas can be used to resolve the Beck-Fiala Conjecture for k ≥ log^2 n, and to give an Õ(k^1/2 + log^1/2 n) bound for smaller k, which improves upon Banaszczyk's O(k^1/2 log^1/2 n) bound. These results are based on a new technique of "Decoupling via Affine Spectral Independence" for designing rounding algorithms, which might also be useful in other contexts.
This talk is based on joint work with Nikhil Bansal (University of Michigan).
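For readers less familiar with discrepancy theory, the standard (textbook) definitions behind the statements above are:

\[
\operatorname{disc}(S_1, \dots, S_m) \;=\; \min_{x \in \{-1,+1\}^n} \; \max_{i \le m} \Bigl| \sum_{j \in S_i} x_j \Bigr|,
\qquad
\operatorname{disc}(A) \;=\; \min_{x \in \{-1,+1\}^n} \; \| A x \|_\infty .
\]

In particular, the Komlós Conjecture implies the Beck-Fiala Conjecture: scaling the columns of a degree-k incidence matrix by 1/√k yields columns of Euclidean length at most 1, so an O(1) discrepancy bound for the scaled matrix gives an O(√k) bound for the original set system.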
Depth plays a central role in modern deep learning, yet its probabilistic effects are subtle and are not fully captured by classical theories that primarily focus on the infinite-width limit. This talk explores how jointly scaling depth and width shapes the signal-propagation statistics of wide neural networks under two contrasting regimes: fully connected feedforward networks with independent weights across layers, and recurrent networks with shared weights. In feedforward networks, standard infinite-width analyses make it possible to stabilize the forward and backward variances, ensuring well-behaved initialization. However, finite-width fluctuations accumulate with depth, breaking convergence to the Neural Tangent Kernel (NTK) regime. In contrast, in linear recurrent networks, finite-width effects already destabilize the forward-propagation variance, rendering conventional initialization schemes inadequate for long input sequences. Together, these results show that depth affects feedforward and recurrent architectures in qualitatively distinct ways that cannot be captured by infinite-width approximations.
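As a toy illustration of the finite-width effects described above, the following NumPy sketch propagates a normalized input through deep random layers and records the per-unit squared norm (a proxy for the forward-propagation variance), both with independent weights in each layer (feedforward-like) and with a single shared weight matrix (recurrent-like). For simplicity both cases use linear (identity-activation) layers, and the widths, depth, 1/width weight scaling, and trial counts are illustrative choices, not the settings used in the talk.

import numpy as np

rng = np.random.default_rng(0)

def forward_variance(width, depth, shared_weights, n_trials=200):
    """Per-unit squared norm after `depth` random linear layers with
    1/width-scaled Gaussian weights, recorded over n_trials random draws."""
    out = np.empty(n_trials)
    for t in range(n_trials):
        x = rng.standard_normal(width)
        x /= np.linalg.norm(x) / np.sqrt(width)   # normalize so the initial per-unit squared norm is 1
        if shared_weights:
            W = rng.standard_normal((width, width)) / np.sqrt(width)
        for _ in range(depth):
            if not shared_weights:
                W = rng.standard_normal((width, width)) / np.sqrt(width)
            x = W @ x
        out[t] = np.sum(x ** 2) / width
    return out

for width in (64, 512):
    for shared in (False, True):
        q = forward_variance(width, depth=50, shared_weights=shared)
        label = "shared (recurrent-like)" if shared else "independent (feedforward)"
        print(f"width={width:4d}, {label:26s} mean={q.mean():10.3f}  std={q.std():10.3f}")

Comparing the spread across trials at different widths and depths gives a hands-on view of the kind of dichotomy described above: behavior predicted by infinite-width reasoning can coexist with sizable finite-width fluctuations, and weight sharing changes the picture qualitatively.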