Evaluating the Credible use of Scientific Machine Learning for High Consequence Applications
Erin C.S. Acquesta, Sandia National Lab
Video Recording
Abstract:
Machine-learned models are increasingly being used in lieu of, to complement, or as surrogates for classic computational models. The emerging field of scientific machine learning (SciML) seeks to fuse traditional mathematical modeling with advances in machine learning to handle challenges such as the implementation of numerical solvers, model-form error estimation, and the computational expense of high-fidelity models. SciML models balance mechanistic equations with data-driven inference, resulting in computational models that preserve scientific knowledge while readily adapting to the unknown through data-driven discovery. The practical adoption of SciML is evident in its impact on a variety of domains, including climate, epidemiology, turbulence, quantum mechanics, and biology. The integration of data-driven methods with mechanistic models has highlighted the need to evaluate the credibility of the resulting SciML models used for high-consequence decisions.
In this presentation we will articulate why we care about credibility and review the Sandia framework known as the Predictive Capability Maturity Model (PCMM), currently used for evaluating classic computational models. We will further the discussion of SciML credibility with an example of neural network (NN) function approximations of model-form error, with an application to epidemiology. While a valuable tool for reducing model discrepancy, NN approximations of model-form error introduce new challenges in evaluating the core mathematical modeling principles of verification, validation, and uncertainty quantification (VVUQ). To mitigate these risks, we propose that SciML models may be credibly applied as surrogates for high-fidelity heterogeneous stochastic models (e.g., agent-based models), where we preserve scientific knowledge while calibrating chaotic systems of human behavior through data-driven discovery.
Ultimately, this work seeks to motivate the scientific modeling community and decision makers to ask the question: When do the benefits of using the SciML framework outweigh the challenges of evaluating its credibility?
Bio:
Erin C.S. Acquesta is a Mathematician and Principal Member of the Technical Staff with the Applied Information Sciences Center at Sandia National Laboratories. She received her MS and PhD from North Carolina State University in the field of applied mathematics. Her research areas of interest include scientific machine learning, uncertainty quantification, sensitivity analysis, and machine learning explainability; with an emphasis on providing credible, adaptive, and interpretable modeling capabilities for enhanced situational awareness in support of national security decision-making. Her primary domain area of expertise focuses on the mathematical properties of infectious disease models. As an area of professional service, she is a member of the ASME VVUQ standards committee, writing definitions for verification, validation, uncertainty quantification, and credibility for machine-learned models. She also volunteers her time for educational outreach as a mentor and head referee for the Albuquerque VEX VRC Robotics League and NM State VEX VRC Robotics Competitions.
Summary:
Credibility for computational models
Ways of establishing credibility
Expert experience
Conservative margins
High quality models
Long track record of success
Verification: Are we solving the equations correctly?
Code (buggy?)
Solution (numerical/discretization error?)
Validation: Are we solving the right equation?
Compare model to experiment
Uncertainty quantification: How large is the uncertainty in the result?
Epistemic: lack of knowledge (reducible)
Aleatory: inherent randomness (irreducible)
PCMM: Predictive Capability Maturity Model (used at Sandia)
Representation and Geometric Fidelity (how accurately is the physical object represented within the model?)
Physics Models
Code Verification/Quality Assurance
Solution Verification
Validation
Uncertainty Quantification
Maturity ranked at different levels 0-3
Higher ranks involve more rigorous checks, appropriate for higher-consequence decisions where the model is the primary or only source of information for the decision
UQ is just one step of the process
Entire process needs to meet the respective maturity level associated with the application context
UQ doesn’t address: Right problem? Correct physics? Code bugs? Numerical errors?
Adapting Credibility process for Scientific ML
Examples
Operator Learning (e.g. PINN)
ML System Identification (e.g. Neural ODEs)
Model-Form Error Correction (e.g. Universal Differential Equations)
PCMM for ML
Data Representation
Domain Awareness
Code/Solution Evaluation
Interpretability/Explainability
Validation (unchanged)
Uncertainty Quantification (unchanged)
SciML for Epidemiology
Classic Compartmental Model
SIR Model: System of ODEs
S: Susceptible population
I: Infected population
R: Recovered population
Rates at which members of one population transition to another
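The SIR system above can be sketched as a small ODE integration. This is a minimal illustration; the parameter values and population sizes below are my own choices, not from the talk.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma):
    # Classic SIR compartmental model:
    #   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    S, I, R = y
    N = S + I + R
    new_infections = beta * S * I / N
    recoveries = gamma * I
    return [-new_infections, new_infections - recoveries, recoveries]

# Illustrative parameters: transmission rate beta, recovery rate gamma
beta, gamma = 0.3, 0.1
y0 = [990.0, 10.0, 0.0]  # nearly everyone susceptible, a few infected
sol = solve_ivp(sir_rhs, (0.0, 160.0), y0, args=(beta, gamma))
```

Because the right-hand sides sum to zero, the total population S + I + R is conserved, which serves as a cheap solution-verification check of the kind PCMM asks for.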
Challenge: model form error
SIR model is simple but misses many details of real infection processes
Notional ground truth model (unknown, possibly many unmeasurable parameters)
Can presumably represent as some ODE/PDE
Can use the difference between our actual model's prediction and data as an estimate of model-form error
Datasets are a finite-size sample of the real system’s true behavior
Need many samples to get a good estimate of the mean behavior of the system
Samples can themselves be biased by the way they're selected
e.g. different sub-populations come into the hospital at different rates when they have the same symptoms
Need to adjust for such systematic biases and be very clear about it
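A minimal sketch of the estimation idea (all numbers are synthetic and hypothetical): average the discrepancy between finite samples of the real system and the model prediction; the estimate tightens as the sample count grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values on a shared time grid (illustrative only)
model_pred = np.array([10.0, 25.0, 60.0, 120.0, 180.0])  # e.g. an SIR prediction
true_mean = np.array([12.0, 30.0, 70.0, 140.0, 200.0])   # unknown ground truth

# Finite-size samples of the real system: noisy draws around the true mean
samples = true_mean + rng.normal(scale=5.0, size=(200, 5))

# Model-form-error estimate: mean discrepancy between data and model
est_discrepancy = samples.mean(axis=0) - model_pred
```

Note that a systematic sampling bias (e.g. one sub-population over-represented in hospital data) would shift every draw identically and survive this averaging, which is why such biases must be adjusted for explicitly.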
UDEs for Epidemiology Compartmental Models
Adding Q (Quarantine) state to SIR model
Original ODE was incomplete
This captures key dynamics
Use of ML to capture these creates new challenges for credibility
Use a densely connected neural network for the dynamics of the Q term
How do we understand our confidence in the Q(t) model?
Observational data is limited
Approach using synthetic data:
Generated synthetic data from pre-specified NN with nominal parameter values
Learn optimal parameters from subsets of observable states
Evaluate whether this was actually successful (whether learning process can work with limited data)
Evaluation was successful: correct values of model parameters were recovered with limited datasets
Extreme case: train model using only the terminal state of the simulation (the recovered population at the end)
This was not successful because there are too many ways to get to the same final state
The uncertainty range of possible model parameters is wide and sometimes multi-modal
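A minimal numerical sketch of the UDE idea: embed a small dense network for the quarantine flux inside an otherwise mechanistic SIRQ system. The architecture is my assumption, and the weights are random rather than trained, purely for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
# Tiny dense network mapping infected fraction -> quarantine rate.
# In a real UDE these weights are trained; random values here for illustration.
W1, b1 = 0.1 * rng.normal(size=(8, 1)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(1, 8)), np.zeros(1)

def nn_quarantine_rate(i_frac):
    h = np.tanh(W1 @ np.array([i_frac]) + b1)
    return abs(float((W2 @ h + b2)[0]))  # keep the rate nonnegative

def sirq_rhs(t, y, beta, gamma):
    S, I, R, Q = y
    N = S + I + R + Q
    q = nn_quarantine_rate(I / N)  # learned term inside the mechanistic ODE
    return [-beta * S * I / N,
            beta * S * I / N - gamma * I - q * I,
            gamma * I + gamma * Q,  # both infected and quarantined recover
            q * I - gamma * Q]

sol = solve_ivp(sirq_rhs, (0.0, 100.0), [990.0, 10.0, 0.0, 0.0], args=(0.3, 0.1))
```

The structural terms keep the total population conserved no matter what the network outputs, which is one way a UDE preserves scientific knowledge while delegating the hard-to-model dynamics to data.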
Bayesian UDE study:
Synthetic data is from SIRQ model
Complex posterior structure: different parameters are correlated with each other
Forecasting validation study highlights the need for likelihood estimates represented by stochastic processes
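The parameter-correlation point can be seen even without a full Bayesian UDE: a brute-force grid posterior for a plain SIR model fit to synthetic data (all settings below are my illustrative choices) already exhibits coupled beta/gamma structure.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_infected(beta, gamma, t_eval):
    # Infected trajectory of an SIR model with fixed population N = 1000
    rhs = lambda t, y: [-beta * y[0] * y[1] / 1000.0,
                        beta * y[0] * y[1] / 1000.0 - gamma * y[1],
                        gamma * y[1]]
    sol = solve_ivp(rhs, (0.0, t_eval[-1]), [990.0, 10.0, 0.0], t_eval=t_eval)
    return sol.y[1]

t = np.linspace(0.0, 60.0, 13)
rng = np.random.default_rng(2)
sigma = 10.0  # assumed observation-noise scale
data = sir_infected(0.3, 0.1, t) + rng.normal(scale=sigma, size=t.size)

# Brute-force posterior on a (beta, gamma) grid: flat prior, Gaussian likelihood
betas = np.linspace(0.27, 0.33, 21)
gammas = np.linspace(0.08, 0.12, 21)
loglik = np.array([[-0.5 * np.sum((data - sir_infected(b, g, t)) ** 2) / sigma**2
                    for g in gammas] for b in betas])
post = np.exp(loglik - loglik.max())
post /= post.sum()

# Posterior means and the beta-gamma correlation: correlated parameters
# show up as off-diagonal posterior structure
B, G = np.meshgrid(betas, gammas, indexing="ij")
mb, mg = float((post * B).sum()), float((post * G).sum())
cov = float((post * (B - mb) * (G - mg)).sum())
corr = cov / np.sqrt(float((post * (B - mb) ** 2).sum()) *
                     float((post * (G - mg) ** 2).sum()))
```

A grid posterior like this scales poorly with dimension, which is why the talk's Bayesian UDE study needs proper sampling machinery; the sketch only makes the correlation structure visible.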
UDEs as Approximations to Stochastic Models
Sandia’s Adaptive Recovery Model (ARM): https://www.osti.gov/biblio/1684646
Network-based model of disease flow
More detailed than ODE, less than agent-based
Captures variability across communities, household types
Model has many states to capture: infection, recovery, symptomatic, etc.
Adding a neural Quarantine term to model
Optimal parameters of model shift as a result of the correction
Calibration becomes hierarchical: base model and the ML-based correction
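A sketch of the two-stage idea, with a scalar correction standing in for the ML term (the models and numbers below are illustrative, not Sandia's ARM): calibrate the base mechanistic parameters first, then calibrate the correction against the remaining residual.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def infected(beta, gamma, c, t):
    # SIR with an extra nonlinear removal c*I^2/N that a plain SIR cannot absorb
    def rhs(tt, y):
        S, I, R = y
        extra = c * I * I / 1000.0
        return [-beta * S * I / 1000.0,
                beta * S * I / 1000.0 - gamma * I - extra,
                gamma * I + extra]
    return solve_ivp(rhs, (0.0, t[-1]), [990.0, 10.0, 0.0], t_eval=t).y[1]

t = np.linspace(0.0, 60.0, 13)
data = infected(0.3, 0.1, 0.05, t)  # synthetic "truth" containing the extra term

def sse(pred):
    return float(np.sum((data - pred) ** 2))

# Stage 1: calibrate the base model with no correction
res1 = minimize(lambda p: sse(infected(p[0], p[1], 0.0, t)),
                x0=[0.25, 0.12], method="Nelder-Mead")

# Stage 2: calibrate the correction with the base parameters held fixed
res2 = minimize(lambda c: sse(infected(res1.x[0], res1.x[1], c[0], t)),
                x0=[0.0], method="Nelder-Mead")
```

Because stage 1 already bent beta and gamma to partially absorb the missing term, adding the correction shifts where the overall optimum lies, which is the hierarchical-calibration effect noted above.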