BIRS 2024: 5-day Workshop (24w5301)

Structured Machine Learning and Time-Stepping for Dynamical Systems

February 19-23, 2024

The BIRS 5-day workshop is a satellite event to this PIMS-CRG. The goal is to bring together experts to exchange research ideas and foster collaboration on structure-preserving discretizations and machine learning techniques for dynamical systems.

Organizers:

BIRS Workshop 24w5301

Detailed Talk Schedule: (All in-person talks are held in TCPL 201)

Monday, February 19

09:00-09:45   Emil Constantinescu (Argonne National Laboratory)

Subgrid-Scale Operators with Neural Ordinary Differential Equations

We discuss a new approach, based on neural ordinary differential equations (NODEs), to learning subgrid-scale model effects when simulating partial differential equations (PDEs) solved by the method of lines and their representation in chaotic ordinary differential equations. Solving systems with fine temporal and spatial grid scales is an ongoing computational challenge, and closure models are generally difficult to tune. Machine learning approaches have increased the accuracy and efficiency of computational fluid dynamics solvers. In this approach, neural networks are used to learn the coarse- to fine-grid map, which can be viewed as a subgrid-scale parameterization. We propose a strategy that uses the NODE and partial knowledge to learn the source dynamics at a continuous level. Our method inherits the advantages of NODEs and can be used to parameterize subgrid scales, approximate coupling operators, and improve the efficiency of low-order solvers. Numerical results using the two-scale Lorenz 96 equation, the convection-diffusion equation, and 2D/3D Navier–Stokes equations are used to illustrate this approach.
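
As a rough illustration of the general idea, not of the speaker's implementation, the sketch below learns a neural source term added to a known coarse-grid right-hand side by differentiating through an ODE solve. It assumes PyTorch and the torchdiffeq package; all names are illustrative.

import torch
import torch.nn as nn
from torchdiffeq import odeint

class ClosedCoarseModel(nn.Module):
    def __init__(self, coarse_rhs, dim, width=64):
        super().__init__()
        self.coarse_rhs = coarse_rhs          # known physics on the coarse grid
        self.closure = nn.Sequential(         # learned subgrid-scale source term
            nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, dim))

    def forward(self, t, u):
        return self.coarse_rhs(t, u) + self.closure(u)

def train_step(model, optimizer, u0, t, u_ref):
    # u_ref: coarse-grained reference trajectory from a fine-grid solver
    optimizer.zero_grad()
    u_pred = odeint(model, u0, t)             # differentiate through the ODE solve
    loss = torch.mean((u_pred - u_ref) ** 2)
    loss.backward()
    optimizer.step()
    return loss.item()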

09:45-10:30   Takaharu Yaguchi (Kobe University)

Numerical integrators for learning neural ordinary differential equation models

In recent years, neural network models for learning differential equation models from observed data of physical phenomena have been attracting attention. To train these neural network models, it is often necessary to discretize them using numerical integrators. In this talk, the effect of discretization in such cases is investigated. For example, I will explain that models become unidentifiable when general Runge-Kutta methods are used.

11:00-11:45   Lisa Kreusser (University of Bath)

Dynamical systems in deep generative modelling

Generative models have become very popular over the last few years in the machine learning community. These are generally based on likelihood-based models (e.g. variational autoencoders), implicit models (e.g. generative adversarial networks), and score-based models. As part of this talk, I will provide insights into our recent research in this field, focussing on score-based diffusion models, which have emerged as one of the most promising frameworks for deep generative modelling due to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, link it to an associated Fokker–Planck equation, and provide a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker–Planck residual. We also show numerically that reducing the Fokker–Planck residual by adding it as an additional regularisation term leads to closing the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, although this can come at the cost of degraded SDE sample quality.
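
For orientation, in the standard notation of score-based diffusion models (the precise residual used in the talk may differ), the forward noising process dX_t = f(X_t, t)\,dt + g(t)\,dW_t has marginals p_t satisfying the Fokker–Planck equation

\partial_t p_t = -\nabla \cdot (f\, p_t) + \tfrac{1}{2} g(t)^2 \Delta p_t,

and samples are generated either with the reverse-time SDE dX_t = [f(X_t, t) - g(t)^2 s_\theta(X_t, t)]\,dt + g(t)\,d\bar{W}_t or with the probability flow ODE

\dot{x} = f(x, t) - \tfrac{1}{2} g(t)^2 s_\theta(x, t),

where the learned score s_\theta approximates \nabla_x \log p_t. The two dynamics agree only when s_\theta is the true score; the Fokker–Planck residual measures the mismatch and, as stated above, bounds the Wasserstein 2-distance between the ODE- and SDE-induced distributions.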

13:30-14:15   HongKun Zhang (University of Massachusetts Amherst)

Machine learning of conservation laws for dynamical systems

This talk presents some recent work, joint with my colleagues Panayotis Kevrekidis and Wei Zhu. I will discuss machine learning techniques that we have successfully designed to learn the number of conservation laws, with applications to models including nonlinear lattice systems.

14:15-15:00   Christian Offen (Paderborn University)

Learning Lagrangian dynamics from data with UQ

I will show how to use Gaussian process regression to learn variational dynamical systems from data. The method fits into the GP framework of Chen et al. for solving PDEs, so that uncertainty quantification and convergence of the method can be derived.

15:30-16:15   Brynjulf Owren (Norwegian University of Science and Technology)

Stability of numerical methods set on Euclidean spaces and manifolds with applications to neural networks

Stability of numerical integrators plays a crucial role in approximating the flow of differential equations. Issues related to convergence and step size limitations have been successfully resolved by studying the stability properties of numerical schemes. Stability also plays a role in the existence and uniqueness of the solution of the nonlinear algebraic equations that need to be solved in each time step for an implicit method. However, very little has up to now been known about the stability properties of numerical methods on manifolds, such as Lie group integrators. An interest in these questions has recently been sparked by efforts to construct ODE-based neural networks that are robust against adversarial attacks. In this talk we shall discuss a new framework for B-stability on Riemannian manifolds. A method is B-stable if it exhibits non-expansive behaviour in the Riemannian distance measure when applied to problems which have non-expansive solutions. We shall in particular see how the sectional curvature of the manifold plays a role, and show some surprising results regarding the non-uniqueness of geodesic implicit integrators for positively curved spaces. If time permits, we shall also discuss how to make use of the results in neural networks where the data belong to a Riemannian manifold.
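
As an illustrative reminder (standard Euclidean background, not part of the abstract), B-stability concerns problems whose vector field satisfies the one-sided Lipschitz (monotonicity) condition

\langle f(u) - f(v), u - v \rangle \le 0,

which guarantees non-expansive exact solutions, \|u(t) - v(t)\| \le \|u(0) - v(0)\|; a method is B-stable if its numerical solutions inherit this bound. The framework discussed in the talk replaces the Euclidean norm by the Riemannian distance, which is where the sectional curvature of the manifold enters.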

16:15-17:00   Simone Brugiapaglia (Concordia University)

Practical existence theorems for deep learning approximation in high dimensions

Deep learning is having a profound impact on industry and scientific research. Yet, while this paradigm continues to show impressive performance in a wide variety of applications, its mathematical foundations are far from being well understood. Motivated by deep learning methods for scientific computing, I will present new practical existence theorems that aim at bridging the gap between theory and practice in this area. Combining universal approximation results for deep neural networks with sparse high-dimensional polynomial approximation theory, these theorems identify sufficient conditions on the network architecture, the training strategy, and the size of the training set that guarantee a target accuracy. I will illustrate practical existence theorems in the contexts of high-dimensional function approximation via feedforward networks, reduced order modeling based on convolutional autoencoders, and physics-informed neural networks for high-dimensional PDEs.

Tuesday, February 20

09:00-09:45   Davide Murari (Norwegian University of Science and Technology)

Improving the robustness of Graph Neural Networks with coupled dynamical systems

Graph Neural Networks (GNNs) have established themselves as a key component in addressing diverse graph-based tasks, like node classification. Despite their notable successes, GNNs remain susceptible to input perturbations in the form of adversarial attacks. In this talk, we present a new approach to fortify GNNs against adversarial perturbations through the lens of coupled contractive dynamical systems.
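
As a brief reminder of the underlying notion (not specific to the speakers' construction), a dynamical system \dot{x} = f(x, t) is contractive if any two trajectories satisfy

\|x(t) - y(t)\| \le e^{-ct} \|x(0) - y(0)\| \quad \text{for some } c > 0,

so a perturbation of the input, adversarial or otherwise, can only shrink as it propagates through network layers built from such dynamics.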

09:45-10:30   Eldad Haber (University of British Columbia)

Time dependent graph neural networks

Graph Neural Networks (GNNs) have demonstrated remarkable success in modeling complex relationships in graph-structured data. A recent innovation in this field is the family of Differential Equation-Inspired Graph Neural Networks (DE-GNNs), which leverage principles from continuous dynamical systems to model information flow on graphs with built-in properties such as feature smoothing or preservation. However, existing DE-GNNs rely on first or second-order temporal dependencies. In this talk, we propose a neural extension to those pre-defined temporal dependencies. We show that our model, called TDE-GNN, can capture a wide range of temporal dynamics that go beyond typical first or second-order methods, and provide use cases where existing temporal models are challenged. We demonstrate the benefit of learning the temporal dependencies using our method rather than using pre-defined temporal dynamics on several graph benchmarks.

11:00-11:45   Melanie Weber (Harvard University) - Virtual

Representation Trade-Offs in Geometric Machine Learning

The utility of encoding geometric structure, such as known symmetries, into machine learning architectures has been demonstrated empirically, in domains ranging from biology to computer vision. However, rigorous analysis of its impact on the learnability of neural networks is largely missing. A recent line of learning theoretic research has demonstrated that learning shallow, fully-connected neural networks, which are agnostic to data geometry, has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this talk, we ask whether knowledge of data geometry is sufficient to alleviate the fundamental hardness of learning neural networks. We discuss learnability in several geometric settings, including equivariant neural networks, a class of geometric machine learning architectures that explicitly encode symmetries. Based on joint work with Bobak Kiani, Jason Wang, Thien Le, Hannah Lawrence, and Stefanie Jegelka.

13:30-14:15   Geoffrey McGregor (University of Toronto)

Conservative Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a prominent Markov Chain Monte Carlo algorithm often used to generate samples from a target distribution by evolving an associated Hamiltonian system using symplectic integrators. HMC’s improved sampling efficacy over traditional Gaussian random walk algorithms is primarily due to its higher acceptance probability on distant proposals, thereby reducing the correlation between successive samples more effectively and thus requiring fewer samples overall. Yet, thin high density regions can occur in high dimensional target distributions, which can lead to a significant decrease in the acceptance probability of HMC proposals when symplectic integrators are used. Instead, we introduce a variant of HMC called Conservative Hamiltonian Monte Carlo (CHMC), which utilizes a symmetric R-reversible second-order energy-preserving integrator to generate distant proposals with high probability of acceptance. We show that CHMC satisfies approximate stationarity with an error proportional to the integrator’s accuracy order. We also highlight numerical examples, with improvements in convergence over HMC, persisting even for large step sizes and narrowing widths of high density regions. This work is in collaboration with Andy Wan.
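
As a schematic illustration of why the integrator's energy error matters (not the authors' code; `integrate` is a placeholder for whatever numerical integrator is used), a single HMC step in plain NumPy:

import numpy as np

def hmc_step(q, log_prob, grad_log_prob, integrate, n_steps, dt, rng):
    p = rng.standard_normal(q.shape)              # resample momentum
    H0 = 0.5 * p @ p - log_prob(q)                # initial energy
    q_new, p_new = integrate(q, p, grad_log_prob, n_steps, dt)
    H1 = 0.5 * p_new @ p_new - log_prob(q_new)    # proposed energy
    if rng.uniform() < np.exp(min(0.0, H0 - H1)): # Metropolis correction
        return q_new
    return q

The acceptance probability is exp(min(0, H0 - H1)), so an integrator that (approximately) preserves the Hamiltonian, as in CHMC, keeps acceptance close to one even for large step sizes and thin high-density regions.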

14:15-15:00   Wu Lin (Vector Institute)

(Lie-group) Structured Inverse-free Second-order Optimization for Large Neural Nets

Optimization is an essential ingredient of machine learning. Many optimization problems can be formulated from a probabilistic perspective to exploit the Fisher-Rao geometric structure of a probability family. By leveraging the structure, we can design new optimization methods. A classic approach to exploiting the Fisher-Rao structure is natural-gradient descent (NGD). In this talk, we show that performing NGD on a Gaussian manifold recovers Newton's method for unconstrained optimization, where the inverse covariance matrix is viewed as a preconditioning matrix. This connection allows us to develop (Lie-group) structured second-order methods by reparameterizing a preconditioning matrix and exploiting the parameterization invariance of natural gradients. We show applications where we propose structured matrix-inverse-free second-order optimizers and use them to train large-scale neural nets with millions of parameters in half precision settings.
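
One way to see the Newton connection mentioned above, written in a standard form from the variational-optimization literature (details are in the speaker's papers): minimizing \mathbb{E}_{q}[f(\theta)] over Gaussians q = \mathcal{N}(\mu, S^{-1}) by natural-gradient descent with step size \beta gives

S_{k+1} = (1 - \beta) S_k + \beta\, \mathbb{E}_{q_k}[\nabla^2 f(\theta)], \qquad \mu_{k+1} = \mu_k - \beta\, S_{k+1}^{-1} \mathbb{E}_{q_k}[\nabla f(\theta)],

so the precision matrix S acts as a preconditioner; with \beta = 1 and q_k collapsed to a point mass at \mu_k, this is exactly Newton's method \mu_{k+1} = \mu_k - [\nabla^2 f(\mu_k)]^{-1} \nabla f(\mu_k).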

15:30-16:15   Molei Tao (Georgia Institute of Technology)

Optimization and Sampling in Non-Euclidean Spaces

Machine learning in non-Euclidean spaces has been rapidly attracting attention in recent years, and this talk will give some examples of progress on its mathematical and algorithmic foundations. I will begin with variational optimization, which, together with delicate interplays between continuous- and discrete-time dynamics, enables the construction of momentum-accelerated algorithms that optimize functions defined on manifolds. Selected applications, namely a generic improvement of the Transformer and a low-dimensional approximation of high-dimensional optimal transport distance, will be described. Then I will turn the optimization dynamics into an algorithm that samples probability distributions on Lie groups. If time permits, the efficiency and accuracy of the sampler will also be quantified via a new, non-asymptotic error analysis.

16:15-17:00   Melvin Leok (University of California San Diego)

The Connections Between Discrete Geometric Mechanics, Information Geometry, Accelerated Optimization and Machine Learning

Geometric mechanics describes Lagrangian and Hamiltonian mechanics geometrically, and information geometry formulates statistical estimation, inference, and machine learning in terms of geometry. A divergence function is an asymmetric distance between two probability densities that induces differential geometric structures and yields efficient machine learning algorithms that minimize the duality gap. The connection between information geometry and geometric mechanics will yield a unified treatment of machine learning and structure-preserving discretizations. In particular, the divergence function of information geometry can be viewed as a discrete Lagrangian, which is a generating function of a symplectic map, as arises in discrete variational mechanics. This identification allows the methods of backward error analysis to be applied, and the symplectic map generated by a divergence function can be associated with the exact time-h flow map of a Hamiltonian system on the space of probability distributions. We will also discuss how time-adaptive Hamiltonian variational integrators can be used to discretize the Bregman Hamiltonian, whose flow generalizes the differential equation that describes the dynamics of the Nesterov accelerated gradient descent method.
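
For reference (standard definitions rather than new material from the talk), the Bregman divergence of a convex function h is

D_h(y, x) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle,

and the Bregman Lagrangian of Wibisono, Wilson, and Jordan,

\mathcal{L}(x, v, t) = e^{\alpha_t + \gamma_t} \bigl( D_h(x + e^{-\alpha_t} v, x) - e^{\beta_t} f(x) \bigr),

generates a family of accelerated-gradient flows under their ideal scaling conditions; with h(x) = \tfrac{1}{2}\|x\|^2 and a particular scaling one recovers the ODE \ddot{X} + \tfrac{3}{t} \dot{X} + \nabla f(X) = 0 associated with Nesterov's accelerated gradient method. The Bregman Hamiltonian mentioned above is obtained from this Lagrangian by a Legendre transform, and the time-adaptive variational integrators discretize its flow.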

Wednesday, February 21

09:00-09:45   Elena Celledoni (Norwegian University of Science and Technology)

Deep neural networks on diffeomorphism groups for optimal shape reparameterization

TBA

09:45-10:30   Giacomo Dimarco (University of Ferrara)

Control and neural network uncertainty quantification for plasma simulation

We will consider the development of numerical methods for simulating plasmas in magnetic confinement nuclear fusion reactors. In particular, we focus on the Vlasov–Maxwell equations describing out-of-equilibrium plasmas influenced by an external magnetic field, and we approximate this model through the use of particle methods. We will additionally set up an optimal control problem aimed at minimizing the temperature at the boundaries of the fusion device or, alternatively, the number of particles hitting the boundary. Our goal is then to confine the plasma in the center of the physical domain. In this framework, we consider the construction of multifidelity methods based on neural network architectures for estimating the uncertainties due to the lack of knowledge of all the physical aspects arising in the modeling of plasma.

11:00-11:45   Bethany Lusch (Argonne National Laboratory)

Computationally Efficient Data-Driven Discovery and Linear Representation of Nonlinear Systems For Control

Linear dynamics are desirable for control, and Koopman theory offers hope of a globally linear (albeit infinite-dimensional) representation of nonlinear dynamics. However, it is challenging to find a good finite-dimensional approximation of the theoretical representation. I will present a deep learning approach with recursive learning that reduces error accumulation. The resulting linear system is controlled using a linear quadratic regulator. An illustrative example using a pendulum system will be presented with simulations on noisy data. We show that our proposed method is trained more efficiently and is more accurate than an autoencoder baseline.
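
A rough sketch of this kind of architecture (illustrative only, not the speaker's implementation; assumes PyTorch, and all names are made up): an autoencoder whose latent state evolves under a learned linear map K, trained with reconstruction, one-step prediction, and latent-linearity losses.

import torch
import torch.nn as nn

class KoopmanAutoencoder(nn.Module):
    def __init__(self, state_dim, latent_dim, width=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, width), nn.ReLU(),
                                     nn.Linear(width, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, width), nn.ReLU(),
                                     nn.Linear(width, state_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # linear latent dynamics

    def loss(self, x_t, x_next):
        z_t = self.encoder(x_t)
        recon = torch.mean((self.decoder(z_t) - x_t) ** 2)              # reconstruction
        pred = torch.mean((self.decoder(self.K(z_t)) - x_next) ** 2)    # one-step prediction
        linear = torch.mean((self.K(z_t) - self.encoder(x_next)) ** 2)  # latent linearity
        return recon + pred + linear

Once trained, the matrix K (together with a learned or identified input matrix) can be handed to a standard linear quadratic regulator acting on the latent coordinates.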

Thursday, February 22

09:00-09:45   Chris Budd / Teo Deveney (University of Bath)

Adaptivity and expressivity in neural network approximations

We consider the training of a Free Knot Spline (FKS) and a ReLU-based NN to approximate regular and singular functions of a scalar variable over a fixed interval. Neural networks have a high theoretical expressivity, but training them with a natural choice of loss function leads to non-convex problems in which the resulting architecture is far from optimal, and the ReLU NN with the usual loss function can give a poor approximation. Similar issues arise with an FKS, but here the training is more robust, with a crucial role played by the best interpolating FKS. The latter can be trained more easily and then acts as a good starting point for training the FKS. This also gives insight into a better choice of loss function, based on knot equidistribution, which allows a better calculation of the knots of the FKS. We look at the interplay between the training of an FKS and its expressivity, and at the training of formally equivalent shallow and deep ReLU NNs. The fact that we can train an FKS to achieve high expressivity, even for a singular function, by making use of the optimal interpolating FKS (with associated optimal knots) gives insight into how we can train a ReLU NN.
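
For context, the equidistribution principle referred to above chooses the knots a = k_0 < k_1 < \dots < k_N = b so that

\int_{k_i}^{k_{i+1}} M(x)\, dx = \frac{1}{N} \int_a^b M(x)\, dx \quad \text{for all } i,

where M is a monitor function built from the target function u (for piecewise-linear interpolation a standard choice is a power of |u''|); a loss based on this criterion drives the knots toward their optimal, possibly strongly non-uniform, positions.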

09:45-10:30   Yen-Hsi Tsai (University of Texas Austin)

Efficient gradient descent algorithms for learning from multiscale data

We will discuss a gradient descent based multiscale algorithm for minimizing loss functions arising from multiscale data distributions.

11:00-11:45   Michael Graham (University of Wisconsin Madison)

Data-driven modeling of complex chaotic dynamics on invariant manifolds

Fluid flows often exhibit chaotic or turbulent dynamics and require a large number of degrees of freedom for accurate simulation. Nevertheless, because of the fast damping of small scales by viscosity, these flows can in principle be characterized with a much smaller number of dimensions, as their long-time dynamics relax in state space to a finite-dimensional invariant manifold. We describe a data-driven reduced order modeling method, “Data-driven Manifold Dynamics” (DManD), that finds a nonlinear coordinate representation of the manifold using an autoencoder, then learns a system of ordinary differential equations for the dynamics on the manifold. Exploitation of symmetries substantially improves performance. We apply DManD to a range of systems including transitional turbulence, where we accurately represent the dynamics with 25 degrees of freedom, as compared to the 10^5 degrees of freedom of the direct simulation. We then use the model to efficiently train a reinforcement learning control policy that is highly effective in laminarizing the flow. We also introduce an autoencoder architecture that yields an explicit estimate of manifold dimension. DManD can be combined with a clustering algorithm to represent the invariant manifold as an atlas of overlapping local representations (charts). This approach, denoted CANDyMan (Charts and atlases for nonlinear data-driven dynamics on manifolds) enables minimal-dimensional representation of the dynamics and is particularly useful for systems with discrete symmetries or intermittent dynamics.

13:30-14:15   Seth Taylor (McGill University)

A spatiotemporal discretization for diffeomorphism approximation

We present a characteristic mapping method for the approximation of diffeomorphisms arising in fluid mechanics. The method utilizes a spatiotemporal discretization defined by a composition of sub-interval flows represented by spline interpolants. By leveraging the composite structure, exponentially fine scale fluid motions can be captured using only a linear increase in computational resources. We will explain how certain unique resolution properties of the method result from this discretization and the preservation of a relabelling symmetry for transported quantities. Numerical examples showcasing the ability to resolve the direct energy cascade at sub-grid scales and capture some associated inverse cascade phenomena for barotropic flows on the sphere will be given. This is joint work with Jean-Christophe Nave and Xi-Yuan Yin.
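
Schematically, and up to the choice of forward or backward characteristics, the map over [0, t_n] is stored as a composition of sub-interval maps,

X_{[0, t_n]} = X_{[t_{n-1}, t_n]} \circ \cdots \circ X_{[t_1, t_2]} \circ X_{[0, t_1]},

with each factor represented on a modest spline grid; fine scales emerge from the composition rather than from any single factor, which is why resolution can grow exponentially while storage and work grow only linearly in the number of sub-intervals.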

14:15-15:00   James Jackaman (Norwegian University of Science and Technology)

Limited area weather modelling: interpolating with neural networks

In this talk we discuss the use of neural networks as operators to interpolate finite element functions between mesh resolutions. These operators aim to improve regional weather predictions by capturing underresolved features near the model boundary from coarse global information.

15:30-16:15   Daisuke Furihata (Osaka University)

A particle method based on Voronoi decomposition for the Cahn–Hilliard equation

We want to perform appropriate and fast numerical calculations using machine learning of the time-evolution operator of the Cahn–Hilliard equation. However, in the context of FDM and FEM, the amount of machine learning required becomes enormous. Therefore, we consider applying a Voronoi-based particle method and calculating the particle behaviour using machine learning.

16:15-17:00   David Ketcheson (King Abdullah University of Science and Technology)

Explicit time discretizations that preserve dissipative or conservative energy dynamics

Many systems modeled by differential equations possess a conserved energy or dissipated entropy functional, and preservation of this qualitative structure is essential to accurately capture their dynamics. Traditional methods that enforce such structure are usually implicit and expensive. I will describe a class of relatively cheap and explicit methods that can be used to accomplish this. I’ll also describe the extension to conserving multiple functionals and show several illustrative examples.
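
One representative example of such a cheap explicit correction (a sketch of the relaxation idea; the talk may cover a broader class of methods) modifies a Runge–Kutta step u_{n+1} = u_n + \Delta t \sum_i b_i f(y_i) to

u_{n+1}^{\gamma} = u_n + \gamma_n \Delta t \sum_i b_i f(y_i),

where the scalar \gamma_n \approx 1 is chosen at every step as the root of a scalar equation enforcing either exact conservation, \eta(u_{n+1}^{\gamma}) = \eta(u_n), or the correct dissipation of the energy/entropy functional \eta. The extra cost is a single scalar root-find per step, and the scheme remains explicit.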

Friday, February 23

09:00-09:45   Kyriakos Flouris (ETH Zürich)

Geometry aware neural operators for hemodynamics

The standard approach for estimating hemodynamic parameters involves running CFD simulations on patient-specific models extracted from medical images. Personalization of these models can be performed by integrating further data from MRA and flow MRI. While this technique can provide estimates of crucial parameters such as wall shear stress (WSS), as well as its time- and space-averaged summaries, its implementation in clinical practice is hindered by the heavy computational load of CFD simulations and by the manual intervention required to obtain accurate geometric models on which the simulations can be computed. We aim to estimate hemodynamic parameters from flow and anatomical MRI, which can be routinely acquired in clinical practice. The flow information and the geometry can be combined in a computational mesh. Working directly on the wall, a geometry-aware graph convolutional neural network can be trained to predict the WSS given a computational domain and some flow information near the wall. However, for 4D MRI clinical data the resolution can be prohibitively low for capturing WSS accurately enough. An ideal model will be able to faithfully upscale a Navier–Stokes solution from the anatomical and flow clinical MRI. We investigate how neural operators can be coupled to geometry-aware graph neural networks, and the potential of geometry-informed and novel covariant neural operators for predicting hemodynamic parameters from clinical data.

09:45-10:30   Yolanne Lee (University College London)

Learning PDEs from image data using invariant features

Model discovery techniques which retrieve governing equations from data and a small amount of physical knowledge have emerged, including those based on genetic algorithms and symbolic regression, sparse regression, and, more recently, deep learning. However, many complex systems may be inherently hard to simulate or measure experimentally, which results in limited data. We consider examples where the governing model is frame-independent, i.e. invariant under translation, rotation, and reflection, since knowledge of such invariances can be used to learn more efficiently. We propose a set of invariant features inspired by a classical multi-scale image feature analysis approach, which can be implemented as a data pre-processing step, and investigate its impact on learning invariant models using PySR, SINDy, and neural ODEs. Using these invariant features is shown to improve the stability, learning time, and generalisability of the learned models.
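
Illustrative examples of such invariant features (the specific feature set used in the talk may differ): for a scalar field u, the quantities u, \|\nabla u\|, \Delta u, and the eigenvalues (or trace and determinant) of the Hessian \nabla^2 u do not reference absolute position and are unchanged by rotations and reflections of the coordinates, so a model library built from them is frame-independent by construction.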

11:00-11:45   Juntao Huang (Texas Tech University) - Virtual

Hyperbolic machine learning moment closures for kinetic equations

In this talk, we present our work on hyperbolic machine learning (ML) moment closure models for kinetic equations. Most existing ML closure models cannot guarantee stability, which directly causes blow-up in long-time simulations. In our work, with carefully designed neural network architectures, the ML closure model can guarantee stability (hyperbolicity). Moreover, other mathematical properties, such as physical characteristic speeds, are also discussed. Extensive benchmark tests show the accuracy, long-time stability, and good generalizability of our ML closure model.