Applied and Computational Mathematics Seminar
Department of Mathematics and Statistics
Date and time: Sep 05 at 2:00 pm (Parker 328)
Title: Exploiting Low-Dimensional Data Structures and Understanding Neural Scaling Laws of Transformers
Abstract: When training deep neural networks, a model’s generalization error is often observed to follow a power-law scaling in the model size and the data size. A prominent example is transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens. A question of theoretical interest for LLMs is why transformer scaling laws emerge. In this talk, we exploit low-dimensional structures in language datasets by estimating their intrinsic dimension, and we establish statistical estimation and mathematical approximation theories for transformers to predict the scaling laws. This perspective shows that transformer scaling laws can be explained in a manner consistent with the underlying data geometry. We further validate our theory with empirical observations of LLMs and find strong agreement between the observed scaling laws and our theoretical predictions. Finally, we turn to in-context learning, analyzing its scaling behavior by uncovering a connection between the attention mechanism in transformers and classical kernel methods in machine learning.
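For readers unfamiliar with how a power scaling law is read off from measurements, the following is a minimal sketch, not the speaker's method. It assumes the error follows error ≈ a * size^(-alpha) and fits the exponent by least squares in log-log space. The function name `fit_power_law` and the synthetic values (a = 5.0, alpha = 0.3) are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit errors ≈ a * sizes**(-alpha) by linear least squares in log-log space.

    Returns the pair (a, alpha). This is an illustrative helper, not an
    API from the talk.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free data; these numbers are assumptions, not results.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
errors = 5.0 * sizes ** (-0.3)

a, alpha = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On real measurements the fit would be noisy, and the exponent is the quantity that theories of the kind described in the abstract aim to predict.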
Date and time: Sep 19 at 2:00 pm (Parker 328)
Title: On some foundational issues in feedback control
Abstract: The remarkable success of closed-loop control in mitigating the effect of uncertainty on a system’s performance has undoubtedly enabled much of the technological world around us. Indeed, feedback regulation can be found “under the hood” in the functioning of engines, the workings of biological organisms, interplanetary navigation, GPS tracking, robotics, and more. While the mitigation of uncertainty has been at the heart of control theory since its inception, explicit control of uncertainty is a relatively recent development that has garnered much attention. Here, a main object of study is the Liouville (continuity) equation, the PDE governing the evolution of the probability distribution of the state of a dynamical system. While it was widely believed that the basic question of controllability of the Liouville equation had been resolved, it escaped the community’s attention for almost two decades that early investigations on the subject fell short of providing a satisfactory answer, even for linear systems. In this talk, we revisit and address this topic and develop a theory for Collective Steering, the endeavor to shepherd an ensemble of dynamical systems between desired configurations using a common feedback law. Our investigation sheds light on a topological obstruction at the heart of the issue that limits the ability to design feedback control laws that are globally continuous with respect to the specifications. Along the way, we touch upon an elegant geometric framework at the intersection of optimal transport, geometric hydrodynamics, and quantum mechanics.
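For reference, the Liouville (continuity) equation mentioned in the abstract can be written in its standard form as follows. The notation is an assumption and does not come from the abstract: $x$ is the state, $f$ the dynamics, $u(t,x)$ a feedback law, and $\rho(t,x)$ the probability density of the state.

```latex
% State dynamics under a feedback law u (notation assumed for illustration)
\dot{x}(t) = f\bigl(x(t),\, u(t, x(t))\bigr)

% Liouville (continuity) equation for the density \rho of the state
\partial_t \rho(t,x) + \nabla_x \cdot \bigl( \rho(t,x)\, f(x, u(t,x)) \bigr) = 0
```

In this notation, steering an ensemble between configurations amounts to choosing a single $u$ that carries an initial density $\rho_0$ to a target density $\rho_1$.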
Date and time: Sep 26 at 2:00 pm (Parker 328)
Title: Information Gamma Calculus: Convexity Analysis for Stochastic Differential Equations
Abstract: We study Lyapunov convergence analysis for degenerate and non-reversible stochastic differential equations (SDEs). We apply the Lyapunov method to the Fokker–Planck equation, in which the Lyapunov functional is chosen as a weighted relative Fisher information functional. We derive a structure condition and formulate the Lyapunov constant explicitly. Given a positive Lyapunov constant, we prove exponential convergence of the probability density function towards its invariant distribution in the L1 norm. Several examples are presented: underdamped Langevin dynamics with variable diffusion matrices, quantum SDEs in Lie groups (Heisenberg group, displacement group, and Martinet sub-Riemannian structure), three oscillator chain models with nearest-neighbor couplings, and underdamped mean field Langevin dynamics (weakly self-consistent Vlasov–Fokker–Planck equations). If time allows, some extensions to time-inhomogeneous SDEs will be discussed.