Operator Learning: From Theory to Practice
Abstract:
We present a general framework for approximating non-linear maps between infinite-dimensional Banach spaces from observations. Our approach follows the "discretize last" philosophy: approximation architectures are designed directly on the function spaces of interest, without tying parameters to any finite-dimensional discretization. Such architectures exhibit an approximation error that is independent of the discretization of the training data and can utilize data sources with diverse discretizations, as is common in many engineering problems. We review the infinite-dimensional approximation theory for such architectures, establishing the universal approximation property and a manifestation of the curse of dimensionality that turns algebraic rates in finite dimensions into exponential rates in infinite dimensions. We discuss efficient approximation of certain operators arising from parametric partial differential equations (PDEs) and show that efficient parametric approximation implies efficient approximation from data. We demonstrate the utility of our framework numerically on a variety of large-scale problems arising in fluid dynamics, porous media flow, weather modeling, and crystal plasticity. Our results show that, at fixed accuracy, data-driven methods can provide orders of magnitude of computational speed-up over classical numerical methods, and hold immense promise for modeling complex physical phenomena across multiple scales.
Bio: Nik Kovachki is a research scientist at NVIDIA in the Learning and Perception group. His work focuses on understanding the connections between machine learning and scientific computing. Nik received a B.Sc. in mathematics from Caltech in 2016, and a Ph.D. in applied and computational mathematics from Caltech in 2022 under the supervision of Prof. Andrew Stuart. In 2025, Nik will join the mathematics faculty at the Courant Institute of Mathematical Sciences at New York University.
Summary
Focus: Operator learning and how it is useful for scientific computing
Motivation:
Goal is modeling fluids, materials, weather
Operator learning applications:
Speed up expensive simulations
Model unknown dynamics from data
Setting:
Map between separable Banach spaces
Function -> Function
A function represents a mapping from some coordinate space to the state of the world at points in that space
Goal is to approximate the function->function mapping to minimize approximation error
Example:
Semi-discrete heat equation (discretized in time, not space)
Forward Euler loses stability when viewed as a function transform: the time step must shrink with the spatial mesh
Backward Euler is unconditionally stable; the choice of approximation parameters (the time step) is independent of the parameters of the spatial discretization
Design of operator learning architecture has same type of design choice
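The stability contrast above can be seen in a few lines of numpy. This is a minimal sketch, not from the talk: the 1D heat equation u_t = u_xx with zero boundary conditions, a standard second-difference Laplacian, and a time step chosen to violate the forward-Euler bound dt <= dx^2/2 while backward Euler remains stable at the same dt.

```python
import numpy as np

# 1D heat equation on [0, 1], n interior grid points, zero Dirichlet BCs.
n = 64
dx = 1.0 / (n + 1)
# Second-difference (tridiagonal) Laplacian.
L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

dt = dx**2  # violates the forward-Euler stability bound dt <= dx**2 / 2
rng = np.random.default_rng(0)
u0 = rng.standard_normal(n)  # rough initial data: energy in all modes

# Forward Euler: u_{k+1} = (I + dt L) u_k  -- only conditionally stable.
# Backward Euler: u_{k+1} = (I - dt L)^{-1} u_k  -- unconditionally stable.
A = np.linalg.inv(np.eye(n) - dt * L)
u_fwd, u_bwd = u0.copy(), u0.copy()
for _ in range(200):
    u_fwd = u_fwd + dt * (L @ u_fwd)
    u_bwd = A @ u_bwd

print(np.linalg.norm(u_fwd))  # blows up: high-frequency modes are amplified
print(np.linalg.norm(u_bwd))  # decays, as the true heat equation does
```

Forward Euler's amplification factor for the stiffest mode here is |1 + dt * lambda| = 3, so the iteration diverges; backward Euler damps every mode regardless of dt, which is exactly the discretization-independent parameter choice the talk points to.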
Goal: discretization invariance:
decouple cost from discretization
use information at different discretizations
transfer learn across discretizations
Architectures and Approximation Theory
Reduced order modeling:
Idea:
Encode original function into a low-dim approximation,
Transform this low-dim approximation to another function,
Then decode back into original representation
Goal: make the low-dim approximation/transform accurate: the encode-transform-decode composition approximately commutes with the original operator
Universal approximation is possible in this framework
Non-linear instantiation:
Encode original function using PCA -> leading eigenfunctions of the input data covariance
Decode via inverse PCA
Low-dim transform: neural net
Can prove that any level of accuracy is achievable given enough PCA eigenfunctions to decompose onto
Challenge: for this linear encoding the number of eigenfunctions needed grows exponentially; need a non-linear approximation
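The PCA encode/decode step can be sketched as follows. This is an illustrative toy (function names, dataset, and grid are my own, not the talk's): functions are sampled on a grid, the SVD of the centered data gives the leading eigenfunctions, and reconstruction error shrinks as more eigenfunctions are kept. In the full instantiation a neural net would map codes to codes between encode and decode.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 128)

# Toy dataset: random smooth functions (Fourier series with decaying coefficients).
def sample_function():
    coeffs = rng.standard_normal(10) / (1 + np.arange(10)) ** 2
    return sum(c * np.sin((k + 1) * np.pi * grid) for k, c in enumerate(coeffs))

X = np.stack([sample_function() for _ in range(200)])  # (n_samples, n_grid)
mean = X.mean(axis=0)
# Rows of Vt are the PCA eigenfunctions, ordered by explained variance.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

def encode(u, d):
    """Project a function onto the d leading eigenfunctions."""
    return Vt[:d] @ (u - mean)

def decode(z):
    """Reconstruct a function from its d-dimensional code (inverse PCA)."""
    return mean + Vt[: z.shape[0]].T @ z

u = sample_function()
err = lambda d: np.linalg.norm(u - decode(encode(u, d)))
print(err(2), err(8))  # error shrinks as more eigenfunctions are kept
```

Because the eigenfunctions are orthonormal and nested, the reconstruction error is monotone non-increasing in d; the talk's point is that for some operators the d needed for a given accuracy grows exponentially, motivating non-linear layers.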
Non-linear instantiation:
Sequence of neural layers
Kernel transforms data into reduced form
Apply linear weights
Push through non-linear operator (e.g. sigmoid, tanh, as in normal neural nets)
Approximation is more non-linear and efficient
Challenge: choice of kernel
Many kernels directly imply a data representation,
E.g. CNNs impose a specific grid
More flexible:
Transforms: Fourier, circle harmonics, wavelets, Laplace-Beltrami
Adaptive meshing / multipole
Allow selective discretization that uses different levels of approximation in different spatial regions
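A single Fourier-type kernel layer, and the discretization invariance it buys, can be sketched in numpy. Assumptions are mine (1D periodic functions, real FFT, k_max retained modes, scalar weights): because the learned weights act on a fixed number of Fourier modes rather than on grid values, the same weights apply to a function sampled at any resolution.

```python
import numpy as np

k_max = 8
rng = np.random.default_rng(0)
# Learned spectral weights: one complex multiplier per retained mode.
W = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)
b = 0.1  # pointwise linear (skip) term

def fourier_layer(u):
    n = u.shape[0]
    u_hat = np.fft.rfft(u, norm="forward")       # grid values -> Fourier modes
    out_hat = np.zeros_like(u_hat)
    out_hat[:k_max] = W * u_hat[:k_max]          # kernel acts on retained modes
    v = np.fft.irfft(out_hat, n=n, norm="forward")  # back to the grid
    return np.tanh(v + b * u)                    # pointwise non-linearity

# Same function, two discretizations, same weights.
f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
coarse = fourier_layer(f(np.linspace(0, 1, 64, endpoint=False)))
fine = fourier_layer(f(np.linspace(0, 1, 256, endpoint=False)))
# The coarse output agrees with the fine output restricted to the coarse grid.
print(np.max(np.abs(coarse - fine[::4])))
```

This is the sense in which cost and parameters decouple from the discretization: the grid only enters through the FFT, and outputs at different resolutions are consistent samplings of one underlying function.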
Approximation
Can show that for each architecture there is some bad map that requires exponentially many parameters
So worst case is bad but what can we approximate successfully?
For each approximation method try to find the space of functions the method can approximate efficiently (polynomially many parameters)
Hard to characterize this space but can show it is non-empty
E.g. Navier-Stokes model of incompressible fluids
Can prove that approximating this operator requires only polynomially many parameters
Data complexity
Instantiate the framework with an encoder that uses a differentiable function to sample the input, encode the data, then decode it
E.g. finite sampler
Can prove that in the worst case the number of samples required for an approximation grows exponentially
But can show that if the operator approximation requires only polynomially many parameters, the data-driven approximator needs only polynomially many samples
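A toy version of the finite-sampler encoder, with illustrative names of my own choosing: encode a function by its values at m points, decode by piecewise-linear interpolation back onto a fine reference grid. For smooth inputs the reconstruction error shrinks as m grows; the worst-case exponential sample complexity comes from functions engineered to defeat any fixed sampling scheme.

```python
import numpy as np

ref = np.linspace(0, 1, 1000)            # fine reference grid
f = lambda x: np.sin(2 * np.pi * x) * np.exp(-x)  # a smooth test function
u = f(ref)

def sample_and_decode(m):
    pts = np.linspace(0, 1, m)           # encoder: m point evaluations
    return np.interp(ref, pts, f(pts))   # decoder: piecewise-linear interpolation

err = lambda m: np.max(np.abs(u - sample_and_decode(m)))
print(err(4), err(10), err(40))  # error decreases as samples are added
```

For this smooth f the error decays at the standard O(h^2) interpolation rate, mirroring the positive result: when the operator is efficiently approximable, polynomially many samples suffice.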
Applications
3D RANS Simulations
Training: 500 converged simulations at Reynolds number 5,000,000
Map: inlet velocity to wall shear stress
Used Geometry-Informed Neural Operator to approximate simulation efficiently
Weather modeling
Used ERA5 Reanalysis from ECMWF
1979-2018 at 1-hour intervals
721x1440 equiangular grid
Parameterized using spherical harmonics
Matches accuracy of physics model but with lower cost