Boumediene Hamzi (Caltech / The Alan Turing Institute)
Title: Bridging Machine Learning, Dynamical Systems, and Algorithmic Information Theory: Insights from Sparse Kernel Flows, Poincaré Normal Forms and PDE Simplification
Abstract: This presentation explores the intersection of Machine Learning, Dynamical Systems, and Algorithmic Information Theory (AIT) and the connections between these areas. In the first part, we focus on Machine Learning and the problem of learning kernels from data using Sparse Kernel Flows. We draw parallels between Minimum Description Length (MDL) and Regularization in Machine Learning (RML), showing that Sparse Kernel Flows offer a natural approach to kernel learning. By considering code lengths and complexities rooted in AIT, we demonstrate that data-adaptive kernel learning can be achieved through the MDL principle, bypassing the need for cross-validation as a statistical method.
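As a rough illustration of the kernel-learning step, the sketch below assumes a kernel formed as a sparse non-negative combination of RBF base kernels and learns the mixture weights by minimizing the Kernel Flows loss with an L1 penalty; the kernel family, loss variant, and hyperparameters are illustrative assumptions rather than the exact Sparse Kernel Flows formulation.

```python
# Hedged sketch: kernel-flow-style learning of a sparse combination of base
# kernels. Kernel choices, L1 weight, and optimizer settings are assumptions.
import torch

def rbf(X, Y, lengthscale):
    d2 = torch.cdist(X, Y).pow(2)
    return torch.exp(-d2 / (2 * lengthscale**2))

def kernel(X, Y, betas, lengthscales):
    # Non-negative combination of RBF base kernels with different lengthscales.
    return sum(b**2 * rbf(X, Y, l) for b, l in zip(betas, lengthscales))

def rho(X, y, betas, lengthscales, jitter=1e-6):
    # Kernel Flows loss: relative loss of accuracy when keeping half of the data.
    n = X.shape[0]
    idx = torch.randperm(n)[: n // 2]
    K  = kernel(X, X, betas, lengthscales) + jitter * torch.eye(n)
    Kb = kernel(X[idx], X[idx], betas, lengthscales) + jitter * torch.eye(n // 2)
    num = y[idx] @ torch.linalg.solve(Kb, y[idx])
    den = y @ torch.linalg.solve(K, y)
    return 1 - num / den

# Toy regression data and a sparsity-promoting (L1) penalty on the weights,
# standing in for the MDL / code-length point of view discussed in the talk.
X = torch.randn(64, 1)
y = torch.sin(3 * X[:, 0]) + 0.1 * torch.randn(64)
lengthscales = torch.tensor([0.1, 0.3, 1.0, 3.0])
betas = torch.ones(4, requires_grad=True)
opt = torch.optim.Adam([betas], lr=1e-2)
for step in range(200):
    loss = rho(X, y, betas, lengthscales) + 1e-2 * betas.abs().sum()
    opt.zero_grad(); loss.backward(); opt.step()
```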
In the second part of the presentation, we turn to the task of simplifying Partial Differential Equations (PDEs) using kernel methods. Here, we use kernel methods to learn the Cole-Hopf transformation, which maps the Burgers equation to the heat equation. We argue that PDE simplification can also be seen as an MDL and compression problem, aiming to make complex PDEs more tractable for analysis and solution. While these two segments may initially seem distinct, they collectively exemplify the multifaceted nature of research at the intersection of Machine Learning, Dynamical Systems, and AIT, offering preliminary insights into the synergies that arise when these fields converge.
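For reference, the classical Cole-Hopf transformation that the kernel method is asked to recover linearizes the viscous Burgers equation into the heat equation:

```latex
\begin{align}
  u_t + u\,u_x &= \nu\, u_{xx}
  && \text{(viscous Burgers equation)} \\
  u &= -2\nu\,\partial_x \log \varphi \;=\; -\frac{2\nu\,\varphi_x}{\varphi}
  && \text{(Cole--Hopf substitution)} \\
  \varphi_t &= \nu\, \varphi_{xx}
  && \text{(heat equation for } \varphi\text{)}
\end{align}
```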
Masaaki Imaizumi (The University of Tokyo / RIKEN AIP)
Title: Learning with Dynamics: Neural Network and High-Dimensional Inference
Abstract: We introduce several topics related to the connection between statistics, machine learning, and dynamical systems. The first topic concerns learning the XOR function with a neural network whose layers are trained simultaneously. Feature learning, where the first layer of a multilayer neural network learns important structures from the data, has been recognized as a key advantage of deep networks. However, demonstrating this theoretically requires specific techniques, such as sequential learning algorithms. This study shows that it is possible to learn the XOR function even when both layers of a two-layer neural network are updated simultaneously. To establish this result, we carry out a fine-grained tracking of the variability of individual neurons, which differs from conventional dynamical analyses based on optimization. The second topic discusses statistical inference for high-dimensional parameters, specifically the evaluation of the uncertainty of estimators. Inference for high-dimensional parameters often employs a framework that derives distributions using limit theorems for dynamical algorithms. In this study, we extend this approach to single-index models, a representative example of nonlinear models, and demonstrate that statistical inference for high-dimensional parameters can be performed in this setting.
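A minimal sketch of the first topic's setting, assuming a 2-D Gaussian XOR task and a standard two-layer ReLU network with all parameters updated simultaneously by SGD; the study's exact data model, scaling regime, and analysis are not reproduced here.

```python
# Hedged sketch: both layers of a two-layer network are trained simultaneously
# on an XOR-type task. Data model and hyperparameters are illustrative.
import torch

torch.manual_seed(0)
n, width = 2000, 64
X = torch.randn(n, 2)
y = torch.sign(X[:, 0] * X[:, 1])          # XOR-like labels: sign of x1 * x2

model = torch.nn.Sequential(
    torch.nn.Linear(2, width),             # first layer: feature learning
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),             # second layer: linear readout
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)   # all parameters together

for step in range(2000):
    logits = model(X).squeeze(-1)
    loss = torch.nn.functional.softplus(-y * logits).mean()   # logistic loss
    opt.zero_grad(); loss.backward(); opt.step()

acc = ((model(X).squeeze(-1) > 0).float() * 2 - 1 == y).float().mean()
print(f"training accuracy: {acc:.3f}")
```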
Jeroen Lamb (Imperial College London / International Research Center for Neurointelligence (IRCN), The University of Tokyo*)
Title: Learning random dynamical systems
Abstract: Random dynamical systems (RDS) are dynamical systems driven by noise. They arise naturally in models of complex systems and have many potentially important practical applications. We report first results on the problem of learning random dynamical systems from (partial) observations. While learning differential equations or other autonomous dynamical systems from (partial) observations is by now well established, the same general task in the random setting is only just being initiated. We present first results for a special class of RDS called iterated function systems, consisting of the random composition of a finite number of maps. This is joint work with Emilia Gibson (Imperial College London).
*Supported by Aihara Moonshot project, JST Moonshot R&D Grant Number JPMJMS2021
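As a concrete toy example of the class of systems described above, the sketch below simulates an iterated function system built from two affine contractions chosen at random at each step; the maps and selection probabilities are illustrative, and the learning task is to recover them from the trajectory alone.

```python
# Hedged sketch: simulating an iterated function system (random composition of
# a finite set of maps). The two affine maps and probabilities are illustrative.
import numpy as np

rng = np.random.default_rng(0)

maps = [
    lambda x: 0.5 * x + 0.25,    # contraction with fixed point 0.5
    lambda x: 0.5 * x + 0.50,    # contraction with fixed point 1.0
]
probs = [0.5, 0.5]

def simulate(x0, n_steps):
    """Generate (state, map index) pairs from the random composition."""
    x, traj, idxs = x0, [], []
    for _ in range(n_steps):
        i = rng.choice(len(maps), p=probs)
        x = maps[i](x)
        traj.append(x); idxs.append(i)
    return np.array(traj), np.array(idxs)

xs, idxs = simulate(x0=0.0, n_steps=10_000)
# A learning task in this setting: recover the individual maps (and their
# probabilities) from the observed trajectory xs alone, without seeing idxs.
print(xs[:5], np.bincount(idxs) / len(idxs))
```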
Takaharu Yaguchi (Kobe University)
Title: Model Reduction of Neural Operators by Infinite-Dimensional Singular Value Decomposition
Abstract: Neural operators are infinite-dimensional extensions of neural networks and are mainly used for learning solution operators of differential equations. For neural networks, model reduction methods based on the singular value decomposition are well known. The linear operators in the architectures of neural operators are Hilbert-Schmidt operators, which admit an infinite-dimensional singular value decomposition. This fact makes it possible to apply the singular value decomposition to derive simpler models for neural operators.
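A finite-dimensional analogue of this idea, assuming a discretized integral (Hilbert-Schmidt) operator and a plain truncated SVD; the kernel below is an arbitrary smooth choice for illustration, not a trained neural-operator layer.

```python
# Hedged sketch: reduce a discretized Hilbert-Schmidt (integral) operator by
# truncating its SVD, mirroring the infinite-dimensional decomposition.
import numpy as np

n = 256
x = np.linspace(0, 1, n)
h = x[1] - x[0]
K = np.exp(-np.abs(x[:, None] - x[None, :]))      # kernel k(x, y) on the grid

def apply_op(K, u):
    return K @ u * h                              # (T u)(x) ~ \int k(x, y) u(y) dy

# Truncated SVD: keep only the r largest singular values / singular functions.
U, s, Vt = np.linalg.svd(K)
r = 10
K_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

u = np.sin(2 * np.pi * x)
err = np.linalg.norm(apply_op(K, u) - apply_op(K_r, u)) / np.linalg.norm(apply_op(K, u))
print(f"relative error of the rank-{r} reduced operator: {err:.2e}")
```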
Takashi Matsubara (Hokkaido University)
Title: Deep Learning-based Modeling Inspired by Geometric Mechanics
Abstract: Recent remarkable advances in deep learning have been achieved not only through increased data and computational resources, but also by integrating various forms of prior knowledge. A notable example is the convolutional neural network, which is invariant to translations of an object's position in an image. In the context of modeling, such an approach is known as "gray-box modeling." Accordingly, when modeling dynamical systems with deep learning, incorporating insights from analytical mechanics, particularly geometric mechanics, can yield highly accurate and interpretable models. In this talk, beginning with Hamiltonian neural networks, I will summarize my research findings to date.
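As a minimal sketch of the starting point, the code below implements a generic Hamiltonian neural network: a scalar H(q, p) parameterized by a small network, with the vector field obtained from Hamilton's equations by automatic differentiation; the architecture and the toy training target are illustrative assumptions.

```python
# Hedged sketch of a Hamiltonian neural network: learn H(q, p) and derive the
# dynamics from its symplectic gradient. Details are illustrative assumptions.
import torch

class HNN(torch.nn.Module):
    def __init__(self, dim=1, width=64):
        super().__init__()
        self.H = torch.nn.Sequential(
            torch.nn.Linear(2 * dim, width), torch.nn.Tanh(),
            torch.nn.Linear(width, 1),
        )

    def forward(self, q, p):
        # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq.
        qp = torch.cat([q, p], dim=-1).requires_grad_(True)
        H = self.H(qp).sum()
        grad = torch.autograd.grad(H, qp, create_graph=True)[0]
        dHdq, dHdp = grad.chunk(2, dim=-1)
        return dHdp, -dHdq

# Training matches predicted (dq/dt, dp/dt) to observed time derivatives,
# e.g. finite differences of measured trajectories; here a harmonic oscillator.
model = HNN()
q, p = torch.randn(128, 1), torch.randn(128, 1)
dq_true, dp_true = p.clone(), -q.clone()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    dq, dp = model(q, p)
    loss = ((dq - dq_true)**2 + (dp - dp_true)**2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```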
Cristopher Salvi (Imperial College London)
Title: Parallelizing Transformers via Flow Discretization
Abstract: I will discuss a novel framework for analyzing linear attention models through matrix-valued state space models (SSMs). The approach, dubbed Parallel Flows, provides a mathematically principled way to decouple temporal dynamics from implementation constraints, enabling independent analysis of critical algorithmic components: chunking, parallelization, and information aggregation. Central to this framework is the reinterpretation of chunking procedures as computations of the discretized flows governing system dynamics. As a concrete application, I will discuss the case of DeltaNet in a generalized low-rank setting. I will explain how Parallel Flows enable the design of simple, streamlined generalizations of hardware-efficient algorithms present in the literature, and provide new ones, inspired by rough paths techniques, with provably lower complexity.
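To make the recurrence-plus-chunking viewpoint concrete, the sketch below treats plain (decay-free) linear attention as a matrix-valued state recurrence and checks that a chunk-wise computation reproduces the sequential one; DeltaNet's low-rank updates and the rough-path constructions are not shown, and this is only the simplest special case.

```python
# Hedged sketch: linear attention as a matrix-valued recurrence, with chunking
# as a block-wise evaluation of the same flow (plain cumulative-sum case only).
import torch

def sequential(Q, K, V):
    S = torch.zeros(Q.shape[1], V.shape[1])
    out = []
    for t in range(Q.shape[0]):
        S = S + K[t:t+1].T @ V[t:t+1]      # state update  S_t = S_{t-1} + k_t v_t^T
        out.append(Q[t:t+1] @ S)           # readout       o_t = q_t^T S_t
    return torch.cat(out)

def chunked(Q, K, V, chunk=16):
    S = torch.zeros(Q.shape[1], V.shape[1])
    out = []
    for s in range(0, Q.shape[0], chunk):
        q, k, v = Q[s:s+chunk], K[s:s+chunk], V[s:s+chunk]
        mask = torch.tril(torch.ones(len(q), len(q)))
        intra = (q @ k.T * mask) @ v       # causal attention inside the chunk
        inter = q @ S                      # contribution of all previous chunks
        out.append(intra + inter)
        S = S + k.T @ v                    # advance the state by a whole chunk
    return torch.cat(out)

Q, K, V = torch.randn(64, 8), torch.randn(64, 8), torch.randn(64, 4)
print(torch.allclose(sequential(Q, K, V), chunked(Q, K, V), atol=1e-5))
```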
Yusuke Tanaka (NTT)
Title: Neural Operator Learning for Hamiltonian and Dissipative PDEs
Abstract: Neural operators have gained much attention for accelerating physics simulations. However, they often struggle to capture the laws of physics from a finite amount of data. We present a framework that can use energy conservation and dissipation laws as inductive biases for training neural operators. We introduce a regularizer inspired by the energy-based theory of physics, in which energy gradient flows are represented by functional derivatives. The use of functional calculus allows us to obtain the functional derivatives via automatic differentiation. We also provide experimental results for some Hamiltonian and dissipative PDEs.
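A heavily simplified sketch of this type of inductive bias, assuming a discretized Dirichlet-type energy functional and a penalty tying the predicted time derivative to the corresponding gradient flow; the energy, discretization, and penalty form are illustrative assumptions rather than the proposed regularizer itself.

```python
# Hedged sketch: obtain a discretized functional derivative by automatic
# differentiation and penalize deviation from the energy-dissipating flow.
import torch

dx = 1.0 / 128

def energy(u):
    # E[u] = \int (1/2) u_x^2 + (1/2) u^2 dx, finite differences, periodic grid.
    ux = (torch.roll(u, -1) - u) / dx
    return (0.5 * ux**2 + 0.5 * u**2).sum() * dx

def functional_derivative(u):
    # delta E / delta u on the grid: autodiff of the discretized functional,
    # rescaled by the cell size so it approximates the continuous derivative.
    u = u.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(energy(u), u)
    return g / dx

def dissipative_regularizer(u, u_t_pred):
    # Penalize deviation of the predicted dynamics from the gradient flow
    # u_t = -delta E / delta u (energy dissipation as an inductive bias).
    return ((u_t_pred + functional_derivative(u))**2).mean()

u = torch.sin(2 * torch.pi * torch.linspace(0, 1, 128))
u_t_pred = torch.randn(128)                 # stand-in for a neural-operator output
print(dissipative_regularizer(u, u_t_pred))
```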
Matthew Levine (Broad Institute of MIT/Harvard)
Title: CD-Dynamax: fast and flexible Bayesian inference of dynamical systems from noisy, irregular, and partially-observed time-series data
Abstract: We present a classical Bayesian framework for inferring stochastic differential equations from noisy, irregularly sampled, and partially observed trajectories. We show how combining advances in data assimilation, auto-differentiation, and Bayesian inference creates new opportunities for improvement in system identification. We show that we can leverage the same inference pipeline to obtain posterior distributions over drift functions, regardless of their model class (we present numerics using parametric models, semi-parametric models, neural networks, Gaussian processes, and sparse dictionaries). We will then discuss challenges in uncertainty quantification arising from large model equivalence classes, and point to new work that can eliminate these equivalences in linear systems; time permitting, we will investigate how these ideas can extend to nonlinear systems.
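As a generic illustration of the ingredients (a filtering-based marginal likelihood combined with Bayesian sampling over drift parameters), and not the CD-Dynamax interface itself, the following sketch assumes a partially observed linear SDE, a Kalman-filter likelihood, and a random-walk Metropolis sampler; the drift model, prior, and step sizes are illustrative.

```python
# Hedged sketch: data-assimilation likelihood for a partially observed,
# irregularly sampled linear SDE, plus Metropolis sampling of a drift parameter.
import numpy as np

rng = np.random.default_rng(0)

def kalman_loglik(theta, times, ys, q=0.1, r=0.05):
    # x' = A(theta) x + process noise; only the first component is observed.
    A = np.array([[0.0, 1.0], [-theta, -0.2]])
    H = np.array([[1.0, 0.0]])
    m, P = np.zeros(2), np.eye(2)
    ll, t_prev = 0.0, times[0]
    for t, y in zip(times, ys):
        dt = t - t_prev
        F = np.eye(2) + A * dt                      # Euler discretization of the flow
        m, P = F @ m, F @ P @ F.T + q * dt * np.eye(2)
        S = H @ P @ H.T + r                         # innovation covariance (1x1)
        K = P @ H.T / S
        ll += -0.5 * (np.log(2 * np.pi * S) + (y - H @ m)**2 / S).item()
        m = m + (K * (y - H @ m)).ravel()
        P = (np.eye(2) - K @ H) @ P
        t_prev = t
    return ll

# Irregularly sampled, noisy, partial observations of a damped oscillation.
times = np.sort(rng.uniform(0, 20, 200))
ys = np.cos(times) * np.exp(-0.1 * times) + 0.05 * rng.standard_normal(200)

# Random-walk Metropolis over the drift parameter theta (flat prior on (0, 5)).
theta, ll = 2.0, kalman_loglik(2.0, times, ys)
samples = []
for _ in range(2000):
    prop = theta + 0.1 * rng.standard_normal()
    if 0 < prop < 5:
        ll_prop = kalman_loglik(prop, times, ys)
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop
    samples.append(theta)
print(np.mean(samples[500:]), np.std(samples[500:]))
```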