Operator Learning: From Theory to Practice
Abstract:
We present a general framework for approximating non-linear maps between infinite-dimensional Banach spaces from observations. Our approach follows the "discretize last" philosophy: approximation architectures are designed directly on the function spaces of interest, without tying parameters to any finite-dimensional discretization. Such architectures exhibit an approximation error that is independent of the discretization of the training data and can utilize data sources with diverse discretizations, as is common in many engineering problems. We review the infinite-dimensional approximation theory for such architectures, establishing the universal approximation property and a manifestation of the curse of dimensionality that turns algebraic rates in finite dimensions into exponential rates in infinite dimensions. We discuss efficient approximation of certain operators arising from parametric partial differential equations (PDEs) and show that efficient parametric approximation implies efficient approximation from data. We demonstrate the utility of our framework numerically on a variety of large-scale problems arising in fluid dynamics, porous media flow, weather modeling, and crystal plasticity. Our results show that, at fixed accuracy, data-driven methods can provide orders of magnitude of computational speed-up over classical numerical methods, and hold immense promise for modeling complex physical phenomena across multiple scales.
Bio: Nik Kovachki is a research scientist at NVIDIA in the Learning and Perception group. His work focuses on understanding the connections between machine learning and scientific computing. Nik received a B.Sc. in mathematics from Caltech in 2016, and a Ph.D. in applied and computational mathematics from Caltech in 2022 under the supervision of Prof. Andrew Stuart. In 2025, Nik will join the mathematics faculty at the Courant Institute of Mathematical Sciences at New York University.
Summary
Focus: Operator learning and how it is useful for scientific computing
Motivation:
Goal is modeling fluids, materials, weather
Operator learning applications:
Speed up expensive simulations
Model unknown dynamics from data
Setting:
Map between separable Banach spaces
Function -> Function
A function represents a mapping from some coordinate space to the state of the world at points in that space
Goal is to approximate the function->function mapping to minimize approximation error
Example:
Semi-discrete heat equation (discretized in time, not space)
Forward Euler loses stability when viewed as a function transform: the time step must shrink with the spatial mesh
Backward Euler is unconditionally stable; the choice of approximation parameters (the time step) is independent of the parameters of the spatial discretization
Design of operator learning architecture has same type of design choice
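The stability contrast above can be seen in a few lines of numpy. This is a minimal sketch, not from the talk: the 1D heat equation u_t = u_xx with zero boundary conditions, a standard second-difference Laplacian, and a time step chosen to violate the forward-Euler bound dt <= dx^2/2 while backward Euler remains stable at the same dt.

```python
import numpy as np

# 1D heat equation on [0, 1], n interior grid points, zero Dirichlet BCs.
n = 64
dx = 1.0 / (n + 1)
# Second-difference (tridiagonal) Laplacian.
L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

dt = dx**2  # violates the forward-Euler stability bound dt <= dx**2 / 2
rng = np.random.default_rng(0)
u0 = rng.standard_normal(n)  # rough initial data: energy in all modes

# Forward Euler: u_{k+1} = (I + dt L) u_k  -- only conditionally stable.
# Backward Euler: u_{k+1} = (I - dt L)^{-1} u_k  -- unconditionally stable.
A = np.linalg.inv(np.eye(n) - dt * L)
u_fwd, u_bwd = u0.copy(), u0.copy()
for _ in range(200):
    u_fwd = u_fwd + dt * (L @ u_fwd)
    u_bwd = A @ u_bwd

print(np.linalg.norm(u_fwd))  # blows up: high-frequency modes are amplified
print(np.linalg.norm(u_bwd))  # decays, as the true heat equation does
```

Forward Euler's amplification factor for the stiffest mode here is |1 + dt * lambda| = 3, so the iteration diverges; backward Euler damps every mode regardless of dt, which is exactly the discretization-independent parameter choice the talk points to.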
Goal: discretization invariance:
decouple cost from discretization
use information at different discretizations
transfer learn across discretizations
Architectures and Approximation Theory
Reduced order modeling:
Idea:
Encode original function into a low-dim approximation,
Transform this low-dim approximation to another function,
Then decode back into original representation
Goal: make the low-dim approximation/transform accurate: the encode-transform-decode composition approximately commutes with the original operator
Universal approximation is possible in this framework
Non-linear instantiation:
Encode original function using PCA -> leading eigenfunctions of the input data covariance
Decode via inverse PCA
Low-dim transform: neural net
Can prove that any level of accuracy is achievable given enough PCA eigenfunctions to decompose onto
Challenge: for this linear encoding the number of eigenfunctions needed grows exponentially; need a non-linear approximation
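The PCA encode/decode step can be sketched as follows. This is an illustrative toy (function names, dataset, and grid are my own, not the talk's): functions are sampled on a grid, the SVD of the centered data gives the leading eigenfunctions, and reconstruction error shrinks as more eigenfunctions are kept. In the full instantiation a neural net would map codes to codes between encode and decode.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 128)

# Toy dataset: random smooth functions (Fourier series with decaying coefficients).
def sample_function():
    coeffs = rng.standard_normal(10) / (1 + np.arange(10)) ** 2
    return sum(c * np.sin((k + 1) * np.pi * grid) for k, c in enumerate(coeffs))

X = np.stack([sample_function() for _ in range(200)])  # (n_samples, n_grid)
mean = X.mean(axis=0)
# Rows of Vt are the PCA eigenfunctions, ordered by explained variance.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

def encode(u, d):
    """Project a function onto the d leading eigenfunctions."""
    return Vt[:d] @ (u - mean)

def decode(z):
    """Reconstruct a function from its d-dimensional code (inverse PCA)."""
    return mean + Vt[: z.shape[0]].T @ z

u = sample_function()
err = lambda d: np.linalg.norm(u - decode(encode(u, d)))
print(err(2), err(8))  # error shrinks as more eigenfunctions are kept
```

Because the eigenfunctions are orthonormal and nested, the reconstruction error is monotone non-increasing in d; the talk's point is that for some operators the d needed for a given accuracy grows exponentially, motivating non-linear layers.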
Non-linear instantiation:
Sequence of neural layers
Kernel transforms data into reduced form
Apply linear weights
Push through non-linear operator (e.g. sigmoid, tanh, as in normal neural nets)
Approximation is more non-linear and efficient
Challenge: choice of kernel
Many kernels directly imply a data representation,
E.g. CNNs impose a specific grid
More flexible:
Transforms: Fourier, circle harmonics, wavelets, Laplace-Beltrami
Adaptive meshing / multipole
Allow selective discretization that uses different levels of approximation in different spatial regions
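A single Fourier-type kernel layer, and the discretization invariance it buys, can be sketched in numpy. Assumptions are mine (1D periodic functions, real FFT, k_max retained modes, scalar weights): because the learned weights act on a fixed number of Fourier modes rather than on grid values, the same weights apply to a function sampled at any resolution.

```python
import numpy as np

k_max = 8
rng = np.random.default_rng(0)
# Learned spectral weights: one complex multiplier per retained mode.
W = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)
b = 0.1  # pointwise linear (skip) term

def fourier_layer(u):
    n = u.shape[0]
    u_hat = np.fft.rfft(u, norm="forward")       # grid values -> Fourier modes
    out_hat = np.zeros_like(u_hat)
    out_hat[:k_max] = W * u_hat[:k_max]          # kernel acts on retained modes
    v = np.fft.irfft(out_hat, n=n, norm="forward")  # back to the grid
    return np.tanh(v + b * u)                    # pointwise non-linearity

# Same function, two discretizations, same weights.
f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
coarse = fourier_layer(f(np.linspace(0, 1, 64, endpoint=False)))
fine = fourier_layer(f(np.linspace(0, 1, 256, endpoint=False)))
# The coarse output agrees with the fine output restricted to the coarse grid.
print(np.max(np.abs(coarse - fine[::4])))
```

This is the sense in which cost and parameters decouple from the discretization: the grid only enters through the FFT, and outputs at different resolutions are consistent samplings of one underlying function.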
Approximation
Can show that for each architecture there is some bad map that requires exponentially many parameters
So worst case is bad but what can we approximate successfully?
For each approximation method try to find the space of functions the method can approximate efficiently (polynomially many parameters)
Hard to characterize this space but can show it is non-empty
E.g. Navier-Stokes model of incompressible fluids
Can prove that approximating this operator requires only polynomially many parameters
Data complexity
Instantiate the framework with an encoder that uses a differentiable function to sample the input, encode the data, then decode it
E.g. finite sampler
Can prove that in the worst case the number of samples required for an approximation grows exponentially
But can show that if the operator approximation requires only polynomially many parameters, the data-driven approximator needs only polynomially many samples
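A toy version of the finite-sampler encoder, with illustrative names of my own choosing: encode a function by its values at m points, decode by piecewise-linear interpolation back onto a fine reference grid. For smooth inputs the reconstruction error shrinks as m grows; the worst-case exponential sample complexity comes from functions engineered to defeat any fixed sampling scheme.

```python
import numpy as np

ref = np.linspace(0, 1, 1000)            # fine reference grid
f = lambda x: np.sin(2 * np.pi * x) * np.exp(-x)  # a smooth test function
u = f(ref)

def sample_and_decode(m):
    pts = np.linspace(0, 1, m)           # encoder: m point evaluations
    return np.interp(ref, pts, f(pts))   # decoder: piecewise-linear interpolation

err = lambda m: np.max(np.abs(u - sample_and_decode(m)))
print(err(4), err(10), err(40))  # error decreases as samples are added
```

For this smooth f the error decays at the standard O(h^2) interpolation rate, mirroring the positive result: when the operator is efficiently approximable, polynomially many samples suffice.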
Applications
3D RANS Simulations
Training: 500 converged simulations at Reynolds number 5,000,000
Map: inlet velocity to wall shear stress
Used Geometry-Informed Neural Operator to approximate simulation efficiently
Weather modeling
Used ERA5 Reanalysis from ECMWF
1979-2018 at 1-hour intervals
721x1440 equiangular grid
Parameterized using spherical harmonics
Matches accuracy of physics model but with lower cost