The 1W-MINDS Seminar was founded in the early days of the COVID-19 pandemic, when travel was impossible. We have continued it since then to help build an inclusive community interested in mathematical data science, computational harmonic analysis, and related applications by providing free access to high-quality talks without the need to travel. In the spirit of environmental and social sustainability, we welcome you to participate in both the seminar and our Slack channel community! Zoom talks are held on Thursdays at 2:30 pm New York time. To find and join the 1W-MINDS Slack channel, please click here.
Current Organizers (September 2025 - May 2026): Ben Adcock (Simon Fraser University), March Boedihardjo (Michigan State University), Hung-Hsu Chou (University of Pittsburgh), Diane Guignard (University of Ottawa), Longxiu Huang (Michigan State University), Mark Iwen (Principal Organizer, Michigan State University), Siting Liu (UC Riverside), Kevin Miller (Brigham Young University), and Christian Parkinson (Michigan State University).
Most previous talks are on the seminar YouTube channel. You can catch up there, or even subscribe if you like.
To sign up to receive email announcements about upcoming talks, click here.
To join the 1W-MINDS Slack channel, click here.
Passcode: the smallest prime > 100
Modern machine learning and scientific computing pose optimization challenges of unprecedented scale and complexity, demanding fundamental advances in both theory and algorithmic design for nonconvex optimization. This talk presents recent advances that address these challenges by exploiting matrix and tensor structures, integrating adaptivity, and leveraging sampling techniques. In the first part, I introduce AdaGO, a new optimizer that combines orthogonalized momentum updates with adaptive learning rates. Building on the recent success of the Muon optimizer in large language model training, AdaGO incorporates an AdaGrad-type stepsize that scales orthogonalized update directions by accumulated past gradient norms. This design preserves the structural advantage of orthogonalized updates while adapting stepsizes to noise and the optimization landscape. We establish optimal convergence rates for smooth nonconvex functions and demonstrate improved performance over Muon and Adam on classification and regression tasks. The second part focuses on zeroth-order global optimization. We develop a theoretical framework for inexact proximal point (IPP) methods for global optimization, establishing convergence guarantees when proximal operators are estimated either deterministically or stochastically. The quadratic regularization in the proximal operator induces a concentrated Gibbs measure landscape that facilitates effective sampling. We propose two sampling-based algorithms: TT-IPP, which constructs a low-rank tensor-train (TT) approximation using a randomized TT-cross algorithm, and MC-IPP, which employs Monte Carlo integration. Both IPP algorithms adaptively balance efficiency and accuracy in proximal operator estimation, achieving strong performance across diverse benchmark functions and applications. Together, these works advance structure-aware adaptive first-order optimization for deep learning and zeroth-order global optimization in scientific computing.
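To make the AdaGO-style update concrete, here is a minimal Python sketch written only from the description above: the momentum buffer is orthogonalized with a Newton-Schulz iteration (as in Muon), and the step is scaled by an AdaGrad-type factor built from accumulated past gradient norms. The Newton-Schulz coefficients, the exact stepsize formula, and all hyperparameters are illustrative assumptions, not the speaker's specification.

# Minimal sketch of an AdaGO-style update for a single weight matrix, based only on the
# abstract: orthogonalized momentum (as in Muon) scaled by an AdaGrad-type stepsize that
# accumulates past gradient norms. Coefficients and hyperparameters are assumptions.
import numpy as np

def newton_schulz_orthogonalize(M, steps=5):
    """Approximately orthogonalize M while preserving its row/column space."""
    X = M / (np.linalg.norm(M) + 1e-12)      # normalize so the iteration converges
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X    # classical Newton-Schulz iteration
    return X

class AdaGOSketch:
    def __init__(self, shape, lr=0.02, beta=0.9, eps=1e-8):
        self.m = np.zeros(shape)             # momentum buffer
        self.v = 0.0                         # accumulated squared gradient norms (AdaGrad-type)
        self.lr, self.beta, self.eps = lr, beta, eps

    def step(self, W, grad):
        self.m = self.beta * self.m + (1 - self.beta) * grad
        self.v += np.linalg.norm(grad) ** 2              # accumulate past gradient norms
        O = newton_schulz_orthogonalize(self.m)          # orthogonalized update direction
        return W - (self.lr / np.sqrt(self.v + self.eps)) * O

# Toy usage on a random least-squares problem.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
X, Y = rng.standard_normal((32, 3)), rng.standard_normal((32, 4))
opt = AdaGOSketch(W.shape)
for _ in range(100):
    grad = (W @ X.T - Y.T) @ X / len(X)      # gradient of 0.5*||X W^T - Y||^2 / n
    W = opt.step(W, grad)

In this toy run the accumulated gradient norm gradually shrinks the stepsize, while the orthogonalized direction keeps the update well conditioned across the singular directions of the momentum matrix.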
Depth plays a central role in modern deep learning, yet its probabilistic effects are subtle and are not fully captured by classical theories that focus primarily on the infinite-width limit. This talk explores how jointly scaling depth and width shapes the signal-propagation statistics of wide neural networks under two contrasting regimes: fully connected feedforward networks with independent weights across layers, and recurrent networks with shared weights. In feedforward networks, standard infinite-width analyses make it possible to stabilize the forward and backward variance, ensuring a well-behaved initialization. However, finite-width fluctuations accumulate with depth, breaking convergence to the Neural Tangent Kernel (NTK) regime. In contrast, in linear recurrent networks, finite-width effects already destabilize the forward-propagation variance, rendering conventional initialization schemes inadequate for long input sequences. Together, these results show that depth affects feedforward and recurrent architectures in qualitatively distinct ways that cannot be captured by infinite-width approximations.
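As a rough numerical illustration of this contrast (not taken from the talk), the Python snippet below tracks the average squared activation across depth in a finite-width linear feedforward network with fresh Gaussian weights at every layer, versus a linear recurrent network that reuses a single shared weight matrix. The width, depth, and 1/sqrt(n) scaling are arbitrary demo choices. With independent weights the mean forward variance stays near one (with fluctuations that accumulate with depth), while with shared weights it typically blows up, reflecting why conventional initializations struggle on long sequences.

# Forward-variance propagation at random initialization: independent weights per layer
# (feedforward) versus one shared weight matrix (recurrent). Illustrative demo only.
import numpy as np

rng = np.random.default_rng(0)
n, depth, trials = 128, 50, 200               # width, layers / time steps, Monte Carlo trials

ff_var, rnn_var = np.zeros(depth), np.zeros(depth)
for _ in range(trials):
    x0 = rng.standard_normal(n)
    W_shared = rng.standard_normal((n, n)) / np.sqrt(n)   # one matrix reused at every step

    x_ff, x_rnn = x0.copy(), x0.copy()
    for t in range(depth):
        W_t = rng.standard_normal((n, n)) / np.sqrt(n)    # fresh, independent weights per layer
        x_ff = W_t @ x_ff
        x_rnn = W_shared @ x_rnn
        ff_var[t] += np.mean(x_ff ** 2) / trials
        rnn_var[t] += np.mean(x_rnn ** 2) / trials

print("feedforward variance, first/last layer:", ff_var[0], ff_var[-1])
print("recurrent   variance, first/last step :", rnn_var[0], rnn_var[-1])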