Bayesian Simulation-based Learning: using numerical models to quantify uncertainty in the subsurface

Abstract:
Our ability to predict the evolution of complex groundwater systems in Earth sciences is fundamental to a variety of disciplines relevant for society, ranging from groundwater resources and environmental issues to sustainable energy. For decades, such problems have been solved by calibrating or stochastically inverting a conceptual model of the subsurface to fit the datasets. Considerable efforts have been made to address uncertainty quantification in those predictions, which is vital for researchers and decision makers. Unfortunately, model calibration does not allow a realistic uncertainty quantification, whereas stochastic inversion is often computationally prohibitive. In this talk, I illustrate the use of Bayesian Simulation-based Learning as an adapted methodology to overcome those main shortcomings and provide predictions of subsurface systems under large prior uncertainty. Although machine learning applications are growing in all domains of science, their application to dynamic modelling of subsurface systems remains a challenge because of the lack of training samples. In this framework, a group of prior models is used to generate both the data and the predictions of interest in order to derive, through machine learning, a direct relationship between the two types of variables. The relationship is then used to directly forecast the prediction for field-observed data. I will illustrate this methodology with various examples related to hydrogeophysical investigations, groundwater systems and deep geothermal systems.

Bio:
Thomas Hermans obtained a master's degree in civil engineering in Mining and Geology from the University of Liege (Belgium) in 2010. He started a PhD in Applied Geophysics at the same university under the supervision of Prof. Frédéric Nguyen, during which he developed a stochastic geostatistical framework to integrate near-surface geophysical data into groundwater models. After obtaining his PhD in 2014, he started working on the monitoring and prediction of shallow geothermal systems. He joined Stanford University in 2015 with a scholarship of the Belgian American Educational Foundation to pursue his research focussing on geophysical data integration and uncertainty quantification. During this period, he started working on machine learning approaches to elucidate complex relationships between data and predictions in subsurface systems. In 2017, he became a Professor at Ghent University in Hydrogeology and Applied Geophysics, where he built a research group in hydrogeophysics with a strong focus on uncertainty quantification. He is also director of the Master in Science in Sustainable Land Management.

Summary:

Focus: subsurface hydrology
Wellhead Protection Area (WPA)
- Wells produce ~50% of drinking water around the world
- Drill wells into groundwater
- Pumping creates a cone of depression that draws water from the nearby subsurface into the well's collection area
- Authorities establish a Wellhead Protection Area where activities are restricted to protect water quality
- Requires a way to identify a well's drainage area
Must represent subsurface geology
- Drill boreholes to identify the layer structure at each depth
- Interpolate structure between boreholes
  - Leverage models of distribution of subsurface properties
  - Invert to infer unobserved subsurface structure given observed boreholes
- Inferred state is uncertain and not unique
- Models often fail to predict the range of uncertainty
- Additional datasets
  - Electrical resistivity of subsurface (affected by water content and quality)
  - Travel time of tracer materials from different surface locations to sensors in wells
Uncertainty quantification
- We are uncertain about unobserved parameters of the subsurface
- How does this uncertainty affect error/noise in model predictions?
- Markov Chain Monte Carlo
  - Sampling technique that links uncertainty in the parameters to uncertainty in the predictions
  - But requires many model runs
  - Doesn’t scale to many parameters
  - Requires a tight handle on the space of unknown parameters and their constraints
- Alternative: Bayesian Simulation-Based Learning (BaSiL)
  - Goal: train a direct relationship between a predictor and a target
  - Bayesian and based on simulations
- Approach:
  - Use subsurface models to predict both observable quantities (tracer experiments) and unobservable outcomes (WPA areas)
  - Run with many different subsurface structures + random variations
  - Simulate forward to predict both outcomes for each model
    - Tracer experiments (observable)
    - WPA (unobservable)
  - Train a statistical model to relate observables and unobserved outcomes
  - First reduce dimensionality via PCA
  - Apply Canonical Correlation Analysis (CCA) to establish non-linear pair-wise correlations
  - Outcome: distribution that relates observable measurements and unobservable predictions; can use this to create a probability distribution for the WPA given tracer experiments
- Challenge: Depends on an accurate distribution of subsurface parameters (geologic layers)
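The approach above can be sketched on a toy problem. This is a minimal illustration, not the speaker's implementation: the forward model `simulate`, the prior ranges, the ensemble size and the Gaussian conditional in canonical space are all assumptions made for the example. PCA is applied only to the data here because the toy prediction is already low-dimensional.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.1, 5.0, 40)  # hypothetical measurement times

def simulate(m):
    """Hypothetical forward model: returns (data curve, prediction) for parameters m."""
    d = m[0] * np.exp(-m[1] * t) + 0.3 * m[2] * t   # stand-in for a tracer curve
    h = np.array([m[0] / m[1], m[1] + m[2]])        # stand-in for WPA descriptors
    return d, h

def pca(X, k):
    """Return the mean and the first k principal directions of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def cca(X, Y):
    """Canonical weights A, B and correlations rho for centered X and Y."""
    n = X.shape[0]
    Lx = np.linalg.cholesky(X.T @ X / (n - 1))
    Ly = np.linalg.cholesky(Y.T @ Y / (n - 1))
    M = np.linalg.solve(Lx, X.T @ Y / (n - 1)) @ np.linalg.inv(Ly).T
    U, rho, Vt = np.linalg.svd(M, full_matrices=False)
    return np.linalg.solve(Lx.T, U), np.linalg.solve(Ly.T, Vt.T), rho

# 1. Sample prior models and simulate both data and prediction for each.
n_models = 500
prior = np.column_stack([rng.uniform(1, 2, n_models),
                         rng.uniform(0.5, 1.5, n_models),
                         rng.uniform(0, 1, n_models)])
sims = [simulate(m) for m in prior]
D = np.array([d for d, _ in sims]) + rng.normal(0, 0.01, (n_models, t.size))
H = np.array([h for _, h in sims])

# 2. Reduce the data with PCA, then relate scores and predictions with CCA.
d_mean, d_comp = pca(D, 3)
h_mean = H.mean(axis=0)
A, B, rho = cca((D - d_mean) @ d_comp, H - h_mean)

# 3. Project the "field" data and sample the prediction given that data,
#    assuming each canonical pair is jointly Gaussian.
d_obs, h_true = simulate(np.array([1.5, 1.0, 0.5]))
u_obs = (d_obs - d_mean) @ d_comp @ A
spread = np.sqrt(np.clip(1 - rho**2, 0, None))
v = rho * u_obs + spread * rng.standard_normal((2000, rho.size))
h_post = v @ np.linalg.inv(B) + h_mean

print("canonical correlations:", rho.round(3))
print("prior std:", H.std(axis=0).round(3), "posterior std:", h_post.std(axis=0).round(3))
```

The rows of `h_post` play the role of the probability distribution for the WPA given the tracer data. As the challenge above notes, they are only as trustworthy as the prior ensemble that produced `D` and `H`.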
Experimental Design
- Goal: identify the experiments required to minimize uncertainty about unobservable quantities (WPA)
- Given a set of possible experiments:
  - For each one we can use the above model to predict how much collecting this data would reduce uncertainty
  - Choose the most informative experiment
  - Can extend to various combinations of experiments to find the synergistic ones
- Challenge: hard to apply to sequential inference tasks because model needs to be retuned when new data is introduced
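The ranking of candidate experiments can be illustrated with a small sketch. It assumes a simplified proxy for "uncertainty reduction": the variance of the prediction left unexplained by a linear fit on the simulated experiment outcomes. The experiment names, the toy prediction `h` and the noise levels are hypothetical, and the talk's method relies on the learned data-prediction relationship rather than on this linear shortcut.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 2000
k = rng.normal(size=(n, 3))       # hypothetical prior samples of three subsurface parameters
h = k[:, 0] + 0.5 * k[:, 1]       # stand-in for the unobservable prediction (e.g. WPA extent)

# Simulated outcome of each candidate experiment for every prior sample.
experiments = {
    "tracer_A": k[:, 0] + rng.normal(0, 0.2, n),
    "tracer_B": k[:, 1] + rng.normal(0, 0.2, n),
    "tracer_C": k[:, 2] + rng.normal(0, 0.2, n),
}

def remaining_variance(names):
    """Variance of h left unexplained by a linear fit on the named experiments."""
    X = np.column_stack([np.ones(n)] + [experiments[e] for e in names])
    coef, *_ = np.linalg.lstsq(X, h, rcond=None)
    return np.var(h - X @ coef)

# Most informative single experiment.
single = {e: remaining_variance([e]) for e in experiments}
best_single = min(single, key=single.get)

# Most informative combination of two experiments.
pairs = {p: remaining_variance(p) for p in itertools.combinations(experiments, 2)}
best_pair = min(pairs, key=pairs.get)

print(best_single, best_pair)
```

Because everything is computed from the prior ensemble, the ranking is available before any field data is collected.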
Geophysical Inversion
- Goal: solve for unknown model parameters (what’s underground?)
- Typically done via various model inversion algorithms
- Here replaced with statistical learning
- Example: Surface Nuclear Magnetic Resonance
  - Sensitive to water content of subsurface
  - 12k dimensions in data space but highly correlated
  - Can be reduced to 5 dimensions via PCA (retaining 90% of the variability)
  - As above, use CCA to correlate the unknown model parameters with the 5-dimensional measurements
  - UQ analysis results are close to MCMC but not perfect
    - Primary cause is that the inferred CCA model is not a perfect match for the dynamics of the original geological model on which it was trained
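The dimension-reduction step in the example above can be sketched as follows. The data here is synthetic and much smaller than the 12k-dimensional SNMR data: a handful of latent factors mixed into many correlated channels, with sizes and noise level chosen purely for illustration. The sketch shows how the number of components is picked from the cumulative explained variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_models, n_data, n_latent = 300, 1000, 5   # toy sizes, not the real SNMR dimensions

# Synthetic, strongly correlated data: a few latent factors mixed into many channels.
latent = rng.normal(size=(n_models, n_latent))
mixing = rng.normal(size=(n_latent, n_data))
data = latent @ mixing + rng.normal(0, 0.05, (n_models, n_data))

# PCA via SVD of the centered data matrix.
centered = data - data.mean(axis=0)
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)

# Smallest number of components whose cumulative variance reaches 90%.
k = int(np.searchsorted(explained, 0.90) + 1)
scores = centered @ Vt[:k].T    # reduced representation that would be passed on to CCA

print(k, scores.shape)
```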
Sequential Optimization
- Iterative addition of experiments that reduce the uncertainty in unobserved parameters
- Optimum sequence of experiments is different for each situation and depends on the outcome of each experiment
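A toy sketch of why the best next experiment depends on earlier outcomes. Everything here is hypothetical: the three experiments, the regime-switching prediction `h`, the linear variance proxy and the tolerance-based conditioning of the ensemble are assumptions chosen to keep the example short, not the procedure presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
s, k0, k1 = rng.normal(size=(3, n))      # hypothetical prior samples
h = np.where(s > -0.5, k0, k1)           # prediction driven by k0 or k1 depending on regime s

# Simulated (noise-free) outcome of each candidate experiment for every prior sample.
data = {"regime_test": s, "tracer_A": k0, "tracer_B": k1}

def remaining_variance(d, h):
    """Variance of h left after a linear fit on a single experiment outcome d."""
    c = np.cov(d, h)
    return c[1, 1] - c[0, 1] ** 2 / c[0, 0]

def run_sequence(field_outcomes, steps=2, tol=0.3):
    """Greedily pick experiments, conditioning the ensemble on each observed outcome."""
    keep = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(steps):
        options = [e for e in data if e not in chosen]
        best = min(options, key=lambda e: remaining_variance(data[e][keep], h[keep]))
        chosen.append(best)
        # Keep only prior models whose simulated outcome is close to the field outcome.
        keep &= np.abs(data[best] - field_outcomes[best]) < tol
    return chosen

# Two sites that differ only in the outcome of the first tracer test.
site_1 = run_sequence({"regime_test": 1.0, "tracer_A": 2.0, "tracer_B": 0.0})
site_2 = run_sequence({"regime_test": 1.0, "tracer_A": 0.1, "tracer_B": 0.0})
print(site_1, site_2)
```

Both sites start with the same experiment, but the second choice differs once the first outcome is known, which is the situation-dependence described above.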