The 1W-MINDS Seminar was founded in the early days of the COVID-19 pandemic, when travel was impossible. We have continued it since then to help build an inclusive community interested in mathematical data science, computational harmonic analysis, and related applications by providing free access to high-quality talks without the need to travel. In the spirit of environmental and social sustainability, we welcome you to participate in both the seminar and our Slack channel community! Zoom talks are held on Thursdays at 2:30 pm New York time. To find and join the 1W-MINDS Slack channel, please click here.
Current Organizers (September 2025 - May 2026): Ben Adcock (Simon Fraser University), March Boedihardjo (Michigan State University), Hung-Hsu Chou (University of Pittsburgh), Diane Guignard (University of Ottawa), Longxiu Huang (Michigan State University), Mark Iwen (Principal Organizer, Michigan State University), Siting Liu (UC Riverside), Kevin Miller (Brigham Young University), and Christian Parkinson (Michigan State University).
Most previous talks are on the seminar YouTube channel. You can catch up there, or even subscribe if you like.
To sign up to receive email announcements about upcoming talks, click here.
To join the 1W-MINDS Slack channel, click here.
Passcode: the smallest prime > 100
Modern machine learning and scientific computing pose optimization challenges of unprecedented scale and complexity, demanding fundamental advances in both theory and algorithmic design for nonconvex optimization. This talk presents recent advances that address these challenges by exploiting matrix and tensor structures, integrating adaptivity, and leveraging sampling techniques. In the first part, I introduce AdaGO, a new optimizer that combines orthogonalized momentum updates with adaptive learning rates. Building on the recent success of the Muon optimizer in large language model training, AdaGO incorporates an AdaGrad-type stepsize that scales orthogonalized update directions by accumulated past gradient norms. This design preserves the structural advantage of orthogonalized updates while adapting stepsizes to noise and the optimization landscape. We establish optimal convergence rates for smooth nonconvex functions and demonstrate improved performance over Muon and Adam on classification and regression tasks. The second part focuses on zeroth-order global optimization. We develop a theoretical framework for inexact proximal point (IPP) methods for global optimization, establishing convergence guarantees when proximal operators are estimated either deterministically or stochastically. The quadratic regularization in the proximal operator induces a concentrated Gibbs measure landscape that facilitates effective sampling. We propose two sampling-based algorithms: TT-IPP, which constructs a low-rank tensor-train (TT) approximation using a randomized TT-cross algorithm, and MC-IPP, which employs Monte Carlo integration. Both IPP algorithms adaptively balance efficiency and accuracy in proximal operator estimation, achieving strong performance across diverse benchmark functions and applications. Together, these works advance structure-aware adaptive first-order optimization for deep learning and zeroth-order global optimization in scientific computing.
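To make the AdaGO-style update concrete, here is a minimal Python sketch written only from the description above: the momentum buffer is orthogonalized with a Newton-Schulz iteration (as in Muon), and the step is scaled by an AdaGrad-type factor built from accumulated past gradient norms. The Newton-Schulz coefficients, the exact stepsize formula, and all hyperparameters are illustrative assumptions, not the speaker's specification.

# Minimal sketch of an AdaGO-style update for a single weight matrix, based only on the
# abstract: orthogonalized momentum (as in Muon) scaled by an AdaGrad-type stepsize that
# accumulates past gradient norms. Coefficients and hyperparameters are assumptions.
import numpy as np

def newton_schulz_orthogonalize(M, steps=5):
    """Approximately orthogonalize M while preserving its row/column space."""
    X = M / (np.linalg.norm(M) + 1e-12)      # normalize so the iteration converges
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X    # classical Newton-Schulz iteration
    return X

class AdaGOSketch:
    def __init__(self, shape, lr=0.02, beta=0.9, eps=1e-8):
        self.m = np.zeros(shape)             # momentum buffer
        self.v = 0.0                         # accumulated squared gradient norms (AdaGrad-type)
        self.lr, self.beta, self.eps = lr, beta, eps

    def step(self, W, grad):
        self.m = self.beta * self.m + (1 - self.beta) * grad
        self.v += np.linalg.norm(grad) ** 2              # accumulate past gradient norms
        O = newton_schulz_orthogonalize(self.m)          # orthogonalized update direction
        return W - (self.lr / np.sqrt(self.v + self.eps)) * O

# Toy usage on a random least-squares problem.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
X, Y = rng.standard_normal((32, 3)), rng.standard_normal((32, 4))
opt = AdaGOSketch(W.shape)
for _ in range(100):
    grad = (W @ X.T - Y.T) @ X / len(X)      # gradient of 0.5*||X W^T - Y||^2 / n
    W = opt.step(W, grad)

In this toy run the accumulated gradient norm gradually shrinks the stepsize, while the orthogonalized direction keeps the update well conditioned across the singular directions of the momentum matrix.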
Depth plays a central role in modern deep learning, yet its probabilistic effects are subtle and are not fully captured by classical theories that focus primarily on the infinite-width limit. This talk explores how jointly scaling depth and width shapes the signal-propagation statistics of wide neural networks under two contrasting regimes: fully connected feedforward networks with independent weights across layers, and recurrent networks with shared weights. In feedforward networks, standard infinite-width analyses make it possible to stabilize the forward and backward variance, ensuring a well-behaved initialization. However, finite-width fluctuations accumulate with depth, breaking convergence to the Neural Tangent Kernel (NTK) regime. In contrast, in linear recurrent networks, finite-width effects already destabilize the forward-propagation variance, rendering conventional initialization schemes inadequate for long input sequences. Together, these results show that depth affects feedforward and recurrent architectures in qualitatively distinct ways that cannot be captured by infinite-width approximations.
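As a rough numerical illustration of this contrast (not taken from the talk), the Python snippet below tracks the average squared activation across depth in a finite-width linear feedforward network with fresh Gaussian weights at every layer, versus a linear recurrent network that reuses a single shared weight matrix. The width, depth, and 1/sqrt(n) scaling are arbitrary demo choices. With independent weights the mean forward variance stays near one (with fluctuations that accumulate with depth), while with shared weights it typically blows up, reflecting why conventional initializations struggle on long sequences.

# Forward-variance propagation at random initialization: independent weights per layer
# (feedforward) versus one shared weight matrix (recurrent). Illustrative demo only.
import numpy as np

rng = np.random.default_rng(0)
n, depth, trials = 128, 50, 200               # width, layers / time steps, Monte Carlo trials

ff_var, rnn_var = np.zeros(depth), np.zeros(depth)
for _ in range(trials):
    x0 = rng.standard_normal(n)
    W_shared = rng.standard_normal((n, n)) / np.sqrt(n)   # one matrix reused at every step

    x_ff, x_rnn = x0.copy(), x0.copy()
    for t in range(depth):
        W_t = rng.standard_normal((n, n)) / np.sqrt(n)    # fresh, independent weights per layer
        x_ff = W_t @ x_ff
        x_rnn = W_shared @ x_rnn
        ff_var[t] += np.mean(x_ff ** 2) / trials
        rnn_var[t] += np.mean(x_rnn ** 2) / trials

print("feedforward variance, first/last layer:", ff_var[0], ff_var[-1])
print("recurrent   variance, first/last step :", rnn_var[0], rnn_var[-1])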