Evaluating the Credible use of Scientific Machine Learning for High Consequence Applications
Erin C.S. Acquesta, Sandia National Lab
Video Recording
Abstract:
Machine-learned models are increasingly being used in lieu of, to complement, or as surrogates for classic computational models. The emerging field of scientific machine learning (SciML) seeks to fuse traditional mathematical modeling with advances in machine learning to handle challenges such as the implementation of numerical solvers, model-form error estimation, and the computational expense of high-fidelity models. SciML models balance mechanistic equations with data-driven inference, resulting in computational models that preserve scientific knowledge while readily adapting to the unknown through data-driven discovery. The practical adoption of SciML is evident in its impact on a variety of domains, including climate, epidemiology, turbulence, quantum mechanics, and biology. The integration of data-driven methods with mechanistic models has highlighted the need to evaluate the credibility of the resulting SciML models used for high-consequence decisions.
In this presentation we will articulate why we care about credibility and review the Sandia framework known as the Predictive Capability Maturity Model (PCMM), currently used for evaluating classic computational models. We will further the discussion of SciML credibility with an example of neural network (NN) function approximations of model-form error, with an application to epidemiology. While a valuable tool for reducing model discrepancy, NN approximations of model-form error introduce new challenges in evaluating the core mathematical modeling principles of verification, validation, and uncertainty quantification (VVUQ). To mitigate these risks, we propose that SciML models may be credibly applied as surrogates for high-fidelity heterogeneous stochastic models (e.g., agent-based models), where we preserve scientific knowledge while calibrating chaotic systems of human behavior through data-driven discovery.
Ultimately, this work seeks to motivate the scientific modeling community and decision makers to ask the question: When do the benefits of using the SciML framework outweigh the challenges of evaluating its credibility?
Bio:
Erin C.S. Acquesta is a Mathematician and Principal Member of the Technical Staff with the Applied Information Sciences Center at Sandia National Laboratories. She received her MS and PhD from North Carolina State University in the field of applied mathematics. Her research areas of interest include scientific machine learning, uncertainty quantification, sensitivity analysis, and machine learning explainability; with an emphasis on providing credible, adaptive, and interpretable modeling capabilities for enhanced situational awareness in support of national security decision-making. Her primary domain area of expertise focuses on the mathematical properties of infectious disease models. As an area of professional service, she is a member of the ASME VVUQ standards committee, writing definitions for verification, validation, uncertainty quantification, and credibility for machine-learned models. She also volunteers her time for educational outreach as a mentor and head referee for the Albuquerque VEX VRC Robotics League and NM State VEX VRC Robotics Competitions.
Summary:
Credibility for computational models
Ways of establishing credibility
Expert experience
Conservative margins
High quality models
Long track record of success
Verification: Are we solving the equations correctly?
Code (buggy?)
Solution (numerical/discretization error?)
Validation: Are we solving the right equation?
Compare model to experiment
Uncertainty quantification: How large is the uncertainty in the result?
Epistemic: lack of knowledge (reducible)
Aleatory: inherent randomness (irreducible)
PCMM: Predictive Capability Maturity Model (used at Sandia)
Representation and Geometric Fidelity (how accurately is the physical object represented within the model?)
Physics Models
Code Verification/Quality Assurance
Solution Verification
Validation
Uncertainty Quantification
Maturity ranked at different levels 0-3
Higher ranks involve more rigorous checks, appropriate for higher-consequence decisions where the model is the primary or only source of information for the decision
UQ is just one step of the process
Entire process needs to meet the respective maturity level associated with the application context
UQ doesn’t address: Right problem? Correct physics? Code bugs? Numerical errors?
Adapting Credibility process for Scientific ML
Examples
Operator Learning (e.g. PINN)
ML System Identification (e.g. Neural ODEs)
Model-Form Error Correction (e.g. Universal Differential Equations)
PCMM for ML
Data Representation
Domain Awareness
Code/Solution Evaluation
Interpretability/Explainability
Validation (unchanged)
Uncertainty Quantification (unchanged)
SciML for Epidemiology
Classic Compartmental Model
SIR Model: System of ODEs
S: Susceptible population
I: Infected population
R: Recovered population
Rates at which members of one population transition to another
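The SIR system above can be sketched as a small ODE integration. This is a minimal illustration; the parameter values and population sizes below are my own choices, not from the talk.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma):
    # Classic SIR compartmental model:
    #   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    S, I, R = y
    N = S + I + R
    new_infections = beta * S * I / N
    recoveries = gamma * I
    return [-new_infections, new_infections - recoveries, recoveries]

# Illustrative parameters: transmission rate beta, recovery rate gamma
beta, gamma = 0.3, 0.1
y0 = [990.0, 10.0, 0.0]  # nearly everyone susceptible, a few infected
sol = solve_ivp(sir_rhs, (0.0, 160.0), y0, args=(beta, gamma))
```

Because the right-hand sides sum to zero, the total population S + I + R is conserved, which serves as a cheap solution-verification check of the kind PCMM asks for.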
Challenge: model form error
SIR model is simple but misses many details of real infection processes
Notional ground truth model (unknown, possibly many unmeasurable parameters)
Can presumably represent as some ODE/PDE
Can use the difference between our actual model's prediction and data as an estimate of model-form error
Datasets are a finite-size sample of the real system’s true behavior
Need many samples to get a good estimate of the mean behavior of the system
Samples can themselves be biased by the way they're selected
e.g. different sub-populations come into the hospital at different rates when they have the same symptoms
Need to adjust for such systematic biases and be very clear about it
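A minimal sketch of the estimation idea (all numbers are synthetic and hypothetical): average the discrepancy between finite samples of the real system and the model prediction; the estimate tightens as the sample count grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values on a shared time grid (illustrative only)
model_pred = np.array([10.0, 25.0, 60.0, 120.0, 180.0])  # e.g. an SIR prediction
true_mean = np.array([12.0, 30.0, 70.0, 140.0, 200.0])   # unknown ground truth

# Finite-size samples of the real system: noisy draws around the true mean
samples = true_mean + rng.normal(scale=5.0, size=(200, 5))

# Model-form-error estimate: mean discrepancy between data and model
est_discrepancy = samples.mean(axis=0) - model_pred
```

Note that a systematic sampling bias (e.g. one sub-population over-represented in hospital data) would shift every draw identically and survive this averaging, which is why such biases must be adjusted for explicitly.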
UDEs for Epidemiology Compartmental Models
Adding Q (Quarantine) state to SIR model
Original ODE was incomplete
This captures key dynamics
Use of ML to capture these creates new challenges for credibility
Use a densely connected neural network for the dynamics of the Q term
How do we understand our confidence in the Q(t) model?
Observational data is limited
Approach using synthetic data:
Generated synthetic data from pre-specified NN with nominal parameter values
Learn optimal parameters from subsets of observable states
Evaluate whether this was actually successful (whether learning process can work with limited data)
Evaluation was successful: correct values of model parameters were recovered with limited datasets
Extreme case: train model using only the terminal state of the simulation (the recovered population at the end)
This was not successful because there are too many ways to get to the same final state
The uncertainty range of possible model parameters is wide and sometimes multi-modal
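A minimal numerical sketch of the UDE idea: embed a small dense network for the quarantine flux inside an otherwise mechanistic SIRQ system. The architecture is my assumption, and the weights are random rather than trained, purely for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
# Tiny dense network mapping infected fraction -> quarantine rate.
# In a real UDE these weights are trained; random values here for illustration.
W1, b1 = 0.1 * rng.normal(size=(8, 1)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(1, 8)), np.zeros(1)

def nn_quarantine_rate(i_frac):
    h = np.tanh(W1 @ np.array([i_frac]) + b1)
    return abs(float((W2 @ h + b2)[0]))  # keep the rate nonnegative

def sirq_rhs(t, y, beta, gamma):
    S, I, R, Q = y
    N = S + I + R + Q
    q = nn_quarantine_rate(I / N)  # learned term inside the mechanistic ODE
    return [-beta * S * I / N,
            beta * S * I / N - gamma * I - q * I,
            gamma * I + gamma * Q,  # both infected and quarantined recover
            q * I - gamma * Q]

sol = solve_ivp(sirq_rhs, (0.0, 100.0), [990.0, 10.0, 0.0, 0.0], args=(0.3, 0.1))
```

The structural terms keep the total population conserved no matter what the network outputs, which is one way a UDE preserves scientific knowledge while delegating the hard-to-model dynamics to data.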
Bayesian UDE study:
Synthetic data is from SIRQ model
Complex posterior structure: different parameters are correlated with each other
Forecasting validation study highlights the need for likelihood estimates represented by stochastic processes
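The parameter-correlation point can be seen even without a full Bayesian UDE: a brute-force grid posterior for a plain SIR model fit to synthetic data (all settings below are my illustrative choices) already exhibits coupled beta/gamma structure.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_infected(beta, gamma, t_eval):
    # Infected trajectory of an SIR model with fixed population N = 1000
    rhs = lambda t, y: [-beta * y[0] * y[1] / 1000.0,
                        beta * y[0] * y[1] / 1000.0 - gamma * y[1],
                        gamma * y[1]]
    sol = solve_ivp(rhs, (0.0, t_eval[-1]), [990.0, 10.0, 0.0], t_eval=t_eval)
    return sol.y[1]

t = np.linspace(0.0, 60.0, 13)
rng = np.random.default_rng(2)
sigma = 10.0  # assumed observation-noise scale
data = sir_infected(0.3, 0.1, t) + rng.normal(scale=sigma, size=t.size)

# Brute-force posterior on a (beta, gamma) grid: flat prior, Gaussian likelihood
betas = np.linspace(0.27, 0.33, 21)
gammas = np.linspace(0.08, 0.12, 21)
loglik = np.array([[-0.5 * np.sum((data - sir_infected(b, g, t)) ** 2) / sigma**2
                    for g in gammas] for b in betas])
post = np.exp(loglik - loglik.max())
post /= post.sum()

# Posterior means and the beta-gamma correlation: correlated parameters
# show up as off-diagonal posterior structure
B, G = np.meshgrid(betas, gammas, indexing="ij")
mb, mg = float((post * B).sum()), float((post * G).sum())
cov = float((post * (B - mb) * (G - mg)).sum())
corr = cov / np.sqrt(float((post * (B - mb) ** 2).sum()) *
                     float((post * (G - mg) ** 2).sum()))
```

A grid posterior like this scales poorly with dimension, which is why the talk's Bayesian UDE study needs proper sampling machinery; the sketch only makes the correlation structure visible.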
UDEs as Approximations to Stochastic Models
Sandia’s Adaptive Recovery Model (ARM): https://www.osti.gov/biblio/1684646
Network-based model of disease flow
More detailed than ODE, less than agent-based
Captures variability across communities, household types
Model has many states to capture: infection, recovery, symptomatic, etc.
Adding a neural Quarantine term to model
Optimal parameters of model shift as a result of the correction
Calibration becomes hierarchical: base model and the ML-based correction
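A sketch of the two-stage idea, with a scalar correction standing in for the ML term (the models and numbers below are illustrative, not Sandia's ARM): calibrate the base mechanistic parameters first, then calibrate the correction against the remaining residual.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def infected(beta, gamma, c, t):
    # SIR with an extra nonlinear removal c*I^2/N that a plain SIR cannot absorb
    def rhs(tt, y):
        S, I, R = y
        extra = c * I * I / 1000.0
        return [-beta * S * I / 1000.0,
                beta * S * I / 1000.0 - gamma * I - extra,
                gamma * I + extra]
    return solve_ivp(rhs, (0.0, t[-1]), [990.0, 10.0, 0.0], t_eval=t).y[1]

t = np.linspace(0.0, 60.0, 13)
data = infected(0.3, 0.1, 0.05, t)  # synthetic "truth" containing the extra term

def sse(pred):
    return float(np.sum((data - pred) ** 2))

# Stage 1: calibrate the base model with no correction
res1 = minimize(lambda p: sse(infected(p[0], p[1], 0.0, t)),
                x0=[0.25, 0.12], method="Nelder-Mead")

# Stage 2: calibrate the correction with the base parameters held fixed
res2 = minimize(lambda c: sse(infected(res1.x[0], res1.x[1], c[0], t)),
                x0=[0.0], method="Nelder-Mead")
```

Because stage 1 already bent beta and gamma to partially absorb the missing term, adding the correction shifts where the overall optimum lies, which is the hierarchical-calibration effect noted above.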