Abstract:
Our ability to predict the evolution of complex groundwater systems in Earth sciences is fundamental to a variety of disciplines relevant to society, ranging from groundwater resources and environmental issues to sustainable energy. For decades, such problems have been solved by calibrating or stochastically inverting a conceptual model of the subsurface to fit the datasets. Considerable effort has been devoted to quantifying the uncertainty in those predictions, which is vital for researchers and decision makers. Unfortunately, model calibration does not allow realistic uncertainty quantification, whereas stochastic inversion is often computationally prohibitive. In this talk, I illustrate the use of Bayesian Simulation-based Learning as an adapted methodology to overcome these main shortcomings and provide predictions of subsurface systems under large prior uncertainty. While machine learning applications are growing in all scientific domains, their application to dynamic modelling of subsurface systems remains a challenge because of the lack of training samples. In this framework, a set of prior models is used to generate both the data and the prediction of interest, in order to derive a direct relationship between the two types of variables through machine learning. The relationship is then used to forecast the prediction directly from field-observed data. I will illustrate this methodology with various examples related to hydrogeophysical investigations, groundwater systems and deep geothermal systems.
Bio:
Thomas Hermans obtained a master's degree in civil engineering (Mining and Geology) from the University of Liège (Belgium) in 2010. He then started a PhD in Applied Geophysics at the same university under the supervision of Prof. Frédéric Nguyen, during which he developed a stochastic geostatistical framework to integrate near-surface geophysical data into groundwater models. After obtaining his PhD in 2014, he started working on the monitoring and prediction of shallow geothermal systems. He joined Stanford University in 2015 with a scholarship from the Belgian American Educational Foundation to pursue his research, focusing on geophysical data integration and uncertainty quantification. During this period, he started working on machine learning approaches to elucidate complex relationships between data and predictions in subsurface systems. In 2017, he became a Professor of Hydrogeology and Applied Geophysics at Ghent University, where he built a research group in hydrogeophysics with a strong focus on uncertainty quantification. He is also director of the Master of Science in Sustainable Land Management.
Summary:
Focus: subsurface hydrology
Wellhead Protection Area (WPA)
Wells produce ~50% of drinking water around the world
Drill wells into groundwater
Pumping creates a cone of depression that draws water from the surrounding subsurface into the well
Authorities establish a Wellhead Protection Area where activities are restricted to protect water quality
Requires a way to identify the well's drainage (capture) area
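For comparison, the simplest WPA delineations ignore subsurface structure altogether. The calculated fixed radius method, for instance, sizes the protection area as the cylinder around the well that holds the volume pumped during a chosen travel time, r = sqrt(Q t / (pi n H)), assuming a homogeneous aquifer with no regional flow. The sketch below only evaluates that textbook formula; the function name and input values are illustrative, and this is not the delineation approach discussed in the talk.

```python
import math


def fixed_radius_wpa(pumping_rate_m3_per_day, travel_time_days, porosity, screen_length_m):
    """Radius (m) of the cylinder holding the volume pumped over travel_time_days.

    Volume balance Q * t = pi * r^2 * n * H, solved for r. Assumes a homogeneous
    aquifer and ignores regional flow and geological heterogeneity.
    """
    if porosity <= 0 or screen_length_m <= 0:
        raise ValueError("porosity and screen length must be positive")
    pumped_volume = pumping_rate_m3_per_day * travel_time_days
    return math.sqrt(pumped_volume / (math.pi * porosity * screen_length_m))


radius = fixed_radius_wpa(1000.0, 50.0, 0.25, 20.0)  # roughly 56 m
```

Because the radius depends only on bulk parameters, two aquifers with very different layering get the same protection area, which is why the notes stress representing the subsurface geology.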
Must represent subsurface geology
Drill boreholes that identify the layer structure at each depth
Interpolate structure between boreholes
Leverage models of the distribution of subsurface properties
Invert to infer unobserved subsurface structure given observed boreholes
Inferred state is uncertain and not unique
Models often fail to predict the range of uncertainty
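To illustrate why the inferred structure is not unique, the sketch below generates several profiles of a layer-boundary depth that all match the same boreholes exactly yet differ between them. It is a deliberately crude stand-in, not a geostatistical method such as kriging or sequential Gaussian simulation: the function name, the linear trend, the distance-scaled smoothed noise, and the borehole positions and depths are all hypothetical.

```python
import numpy as np


def layer_depth_realizations(bh_x, bh_depth, grid_x, n_real, sigma_per_m, smooth=15, seed=0):
    """Random 1D profiles of a layer-boundary depth that honor every borehole.

    bh_x must be increasing. The deterministic part is a linear interpolation
    between boreholes; the random part is smoothed white noise whose amplitude
    grows with distance to the nearest borehole, so it vanishes at the boreholes.
    """
    rng = np.random.default_rng(seed)
    bh_x = np.asarray(bh_x, dtype=float)
    trend = np.interp(grid_x, bh_x, bh_depth)
    dist = np.min(np.abs(grid_x[:, None] - bh_x[None, :]), axis=1)
    kernel = np.ones(smooth) / smooth  # moving-average smoother for lateral continuity
    noise = np.array([np.convolve(row, kernel, mode="same")
                      for row in rng.standard_normal((n_real, grid_x.size))])
    return trend + sigma_per_m * dist * noise


grid = np.linspace(0.0, 250.0, 251)  # 1 m spacing
real = layer_depth_realizations([0, 100, 250], [10.0, 14.0, 12.0], grid,
                                n_real=20, sigma_per_m=0.02)
```

Every row of real reproduces the three boreholes, but the rows disagree between them; an inversion that returns only one of these profiles hides that spread.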
Additional datasets
Electrical resistivity of subsurface (affected by water content and quality)
Travel time of tracer materials from different surface locations to sensors in wells
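Tracer travel times are informative because, to a first approximation, they scale inversely with hydraulic conductivity along the flow path. Assuming one-dimensional, purely advective transport in a homogeneous aquifer (no dispersion or sorption), the seepage velocity is v = K i / n_e and the travel time over a distance L is L / v. The hypothetical helper below evaluates only that relation; it is not a transport model from the talk.

```python
def advective_travel_time(distance_m, conductivity_m_s, gradient, effective_porosity):
    """Travel time (s) of a conservative tracer under 1D advection in a homogeneous aquifer."""
    darcy_flux = conductivity_m_s * gradient            # q = K i (m/s)
    seepage_velocity = darcy_flux / effective_porosity  # v = q / n_e (m/s)
    return distance_m / seepage_velocity


t = advective_travel_time(20.0, 1e-4, 0.01, 0.25)  # about 5e6 s, roughly 58 days
```

Halving K doubles the travel time, so measured arrivals at several wells carry information about the conductivity structure between the injection points and the wells.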
Uncertainty quantification
We are uncertain about unobserved parameters of the subsurface
How does this uncertainty propagate into errors in model predictions?
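In its most basic form, the propagation question can be answered by Monte Carlo: sample the uncertain parameter, run the forward model for each sample, and summarise the spread of the outputs. The sketch below does this for the one-dimensional advective travel time t = L n_e / (K i), with a hypothetical log-normal prior on K; all numbers are illustrative.

```python
import numpy as np


def travel_time_percentiles(n_samples=10_000, seed=42):
    """Monte Carlo propagation of uncertain K into advective travel time (days)."""
    rng = np.random.default_rng(seed)
    log10_k = rng.normal(loc=-4.0, scale=0.5, size=n_samples)  # hypothetical prior on K (m/s)
    distance, gradient, porosity = 20.0, 0.01, 0.25             # illustrative, held fixed
    t_days = distance * porosity / (10.0 ** log10_k * gradient) / 86_400.0
    return np.percentile(t_days, [5, 50, 95])


p5, p50, p95 = travel_time_percentiles()
```

A spread of half an order of magnitude in K already translates into a 5-95% travel-time interval spanning more than a factor of ten, which is the kind of prediction uncertainty a calibrated single model does not reveal.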
Markov Chain Monte Carlo
Sampling technique to propagate parameter uncertainty into predictions
But requires many model runs
Doesn’t scale to many parameters
Requires a tight handle on the space of unknown parameters and their constraints
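A minimal random-walk Metropolis sampler makes the cost argument concrete: every iteration calls the forward model once, so tens of thousands of iterations mean tens of thousands of simulations, and the number needed grows with the number of parameters. Here the forward model is a one-parameter toy (advective travel time as a function of log10 K) with a hypothetical Normal(-4, 0.5) prior and Gaussian observation noise; with a real groundwater simulator each call would be far more expensive.

```python
import numpy as np


def travel_time_days(log10_k, distance=20.0, gradient=0.01, porosity=0.25):
    """Toy forward model: advective travel time (days) for K = 10**log10_k m/s."""
    return distance * porosity / (10.0 ** log10_k * gradient) / 86_400.0


def metropolis(observed_days, noise_sd_days, n_steps=20_000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for log10 K; one forward run per iteration."""
    rng = np.random.default_rng(seed)

    def log_posterior(theta):
        log_prior = -0.5 * ((theta + 4.0) / 0.5) ** 2      # hypothetical Normal(-4, 0.5) prior
        misfit = (travel_time_days(theta) - observed_days) / noise_sd_days
        return log_prior - 0.5 * misfit ** 2                # Gaussian likelihood

    theta = -4.0
    lp = log_posterior(theta)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal()
        lp_new = log_posterior(proposal)
        if np.log(rng.uniform()) < lp_new - lp:              # accept with probability min(1, ratio)
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain


chain = metropolis(observed_days=travel_time_days(-4.0), noise_sd_days=5.0)
```

The 20,000 iterations above are cheap only because the toy model is a one-line formula; with one parameter per grid cell, both the chain length and the cost of each call become prohibitive.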
Alternative: Bayesian Simulation-Based Learning (BaSiL)
Goal: train a direct relationship between a predictor and a target
Bayesian and Based on Simulations
Approach:
Use subsurface models to predict both observable quantities (tracer experiments) and unobservable outcomes (WPA areas)
Run with many different subsurface structures plus random variations
Simulate forward to predict observable outcomes
Tracing experiments
WPA
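Train a statistical model to relate observables and unobserved outcomes
First reduce the dimensionality of both via PCA
Apply Canonical Correlation Analysis (CCA) to find pairs of maximally correlated linear combinations of the reduced variables; the relation within each pair need not be strictly linear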
Outcome: a joint distribution relating observable measurements and unobservable predictions; it can be used to derive a probability distribution for the WPA given the tracer experiment results
Challenge: depends on an accurate prior distribution of subsurface parameters (geologic layers)
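The BaSiL workflow described above (prior simulations, PCA on both data and prediction, CCA, then conditioning on field data) can be sketched with NumPy as follows. This is a minimal linear-Gaussian version written for illustration: it assumes each canonical pair is jointly Gaussian, so the prediction variate given the data is N(r d_c, 1 - r^2), whereas practical implementations may handle non-linear relations differently (for example with kernel density estimation). The function names, component counts and the synthetic prior ensemble are assumptions, not the talk's code.

```python
import numpy as np


def pca(x, n_comp):
    """Center x and project it onto its leading principal components."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    basis = vt[:n_comp]                                  # (n_comp, n_features)
    return (x - mean) @ basis.T, mean, basis


def inv_sqrt(cov):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(cov)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def cca(xs, ys):
    """CCA on centered score matrices xs (n, p) and ys (n, q), with q <= p."""
    n = xs.shape[0]
    kx = inv_sqrt(xs.T @ xs / (n - 1))
    ky = inv_sqrt(ys.T @ ys / (n - 1))
    u, r, vt = np.linalg.svd(kx @ (xs.T @ ys / (n - 1)) @ ky, full_matrices=False)
    return kx @ u, ky @ vt.T, r                          # weights a, b and correlations r


def basil_posterior(d_prior, h_prior, d_obs, n_d=5, n_h=3, n_samples=1000, seed=0):
    """Posterior samples of the prediction h given observed data d_obs."""
    assert n_h <= n_d, "need at least as many data components as prediction components"
    rng = np.random.default_rng(seed)
    d_sc, d_mean, d_basis = pca(d_prior, n_d)
    h_sc, h_mean, h_basis = pca(h_prior, n_h)
    a, b, r = cca(d_sc, h_sc)
    d_obs_c = ((d_obs - d_mean) @ d_basis.T) @ a         # field data in canonical space
    # Linear-Gaussian assumption: each canonical prediction variate is N(r * d_c, 1 - r^2)
    sd = np.sqrt(np.clip(1.0 - r ** 2, 0.0, None))
    h_c = r * d_obs_c + sd * rng.standard_normal((n_samples, r.size))
    h_sc_post = h_c @ np.linalg.inv(b)                   # canonical space -> PCA scores
    return h_sc_post @ h_basis + h_mean                  # PCA scores -> prediction space


# Synthetic prior ensemble: 3 latent "subsurface" factors drive both data and prediction.
rng = np.random.default_rng(1)
m = rng.standard_normal((500, 3))
A, B = rng.standard_normal((3, 40)), rng.standard_normal((3, 20))
d_prior = m @ A + 0.05 * rng.standard_normal((500, 40))
h_prior = m @ B + 0.05 * rng.standard_normal((500, 20))
m_true = np.array([1.0, -0.5, 0.3])
post = basil_posterior(d_prior, h_prior, d_obs=m_true @ A)
```

No inversion is run for the field data: once the prior ensemble has been simulated, conditioning on a new observation only requires a few matrix products, which is where the savings over MCMC come from. The flip side is the stated challenge: if the prior ensemble misrepresents the geology, the learned relation is wrong everywhere.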
Experimental Design
Goal: identify the experiments required to minimize uncertainty about unobservable quantities (WPA)
Given a set of possible experiments:
For each one, use the above model to predict how much collecting this data would reduce uncertainty
Choose the most informative experiment
Can extend to various combinations of experiments to find the synergistic ones
Challenge: hard to apply to sequential inference tasks because the model must be retrained whenever new data are introduced
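Scoring candidate experiments before they are run can be sketched with the prior ensemble alone. Assuming a linear-Gaussian relation between simulated data and prediction, the residual variance of a regression of the prediction on an experiment's simulated data equals its expected posterior variance, whatever value is eventually observed; the experiment with the smallest value is the most informative, and stacking the simulated data of several experiments scores their combination. The function names and the synthetic ensemble below are hypothetical, and this linear score is a simplification rather than the exact criterion used in the talk.

```python
import numpy as np


def expected_posterior_variance(d, h):
    """Residual variance of a linear regression of h on simulated data d.

    Under a linear-Gaussian assumption this equals the posterior variance of h
    for any observed value of d, so it can score an experiment before it is run.
    """
    x = np.column_stack([np.ones(len(h)), d])  # intercept + data
    coef, *_ = np.linalg.lstsq(x, h, rcond=None)
    return float(np.var(h - x @ coef))


def rank_experiments(candidates, h):
    """candidates: dict of experiment name -> simulated data, shape (n_models,) or (n_models, k)."""
    scores = {name: expected_posterior_variance(d, h) for name, d in candidates.items()}
    return sorted(scores, key=scores.get), scores


# Hypothetical prior ensemble: the prediction depends on two subsurface parameters.
rng = np.random.default_rng(0)
m = rng.standard_normal((2000, 2))
h = m[:, 0] + 0.5 * m[:, 1]
tracer_a = m[:, 0] + 0.1 * rng.standard_normal(2000)  # senses parameter 0
tracer_b = m[:, 1] + 0.1 * rng.standard_normal(2000)  # senses parameter 1
order, scores = rank_experiments(
    {"tracer_A": tracer_a, "tracer_B": tracer_b,
     "both": np.column_stack([tracer_a, tracer_b])}, h)
```

The ranking is computed once, from the prior; it cannot react to what the first experiment actually reveals, which is the limitation noted for sequential tasks.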
Geophysical Inversion
Goal: Solving for unknown model parameters (what’s underground?)
Typically done via various model inversion algorithms
Here replaced with statistical learning
Example: Surface Nuclear Magnetic Resonance
Sensitive to water content of subsurface
12k dimensions in data space, but highly correlated
Can be reduced to 5 dimensions via PCA (retaining 90% of the variability)
As above, use CCA to relate the unknown model parameters to the 5-dimensional measurements
UQ results are close to those of MCMC but not identical
Primary cause is that the inferred CCA model is not a perfect match for the dynamics of the original geological model on which it was trained
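Choosing how many principal components to keep, as in the surface NMR example where 12k correlated data dimensions reduce to 5 components retaining 90% of the variability, amounts to a cumulative explained-variance threshold. The sketch below applies that rule to a synthetic, highly correlated dataset (300 hypothetical soundings of 2,000 values driven by 5 latent factors), much smaller than the real case so that it runs quickly.

```python
import numpy as np


def n_components_for_variance(x, threshold=0.90):
    """Smallest number of principal components whose cumulative explained variance reaches threshold."""
    s = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(explained, threshold) + 1)


# Synthetic, highly correlated data: 300 soundings x 2,000 values from 5 latent factors.
rng = np.random.default_rng(0)
factors = rng.standard_normal((300, 5))
loadings = rng.standard_normal((5, 2000))
x = factors @ loadings + 0.1 * rng.standard_normal((300, 2000))
k = n_components_for_variance(x)
```

The discarded 10% of variability is one place where the learned relation can depart from the physics, alongside the imperfect CCA fit noted above.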
Sequential Optimization
Iterative addition of experiments that reduce the uncertainty in unobserved parameters
The optimal sequence of experiments differs for each situation because it depends on the outcome of each experiment
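Outcome dependence can be illustrated with a greedy loop that, after each experiment, conditions the prior ensemble on the observed value and re-ranks the remaining experiments on what is left. In the sketch below, the linear residual-variance score, the nearest-neighbour conditioning (keeping the 30% of models whose simulated data are closest to the observation) and the experiment names are all hypothetical simplifications, not the method presented in the talk. The synthetic prediction is built so that which parameter matters depends on the sign of the first one.

```python
import numpy as np


def residual_variance(d, h):
    """Expected posterior variance of h under a linear-Gaussian fit on d."""
    x = np.column_stack([np.ones(len(h)), d])
    coef, *_ = np.linalg.lstsq(x, h, rcond=None)
    return float(np.var(h - x @ coef))


def sequential_design(sim_data, h, observe, n_steps, keep_frac=0.3):
    """Greedy sequential design that re-ranks experiments after each observed outcome.

    sim_data: dict of experiment name -> simulated data (n_models,)
    h: prediction of interest for each prior model (n_models,)
    observe: callable returning the field value of an experiment (hypothetical)
    """
    idx = np.arange(len(h))  # models still consistent with the field data
    remaining, chosen = set(sim_data), []
    for _ in range(n_steps):
        best = min(sorted(remaining), key=lambda e: residual_variance(sim_data[e][idx], h[idx]))
        value = observe(best)
        # Crude conditioning: keep the models whose simulated data are closest to the observation.
        dist = np.abs(sim_data[best][idx] - value)
        idx = idx[np.argsort(dist)[: max(10, int(keep_frac * idx.size))]]
        remaining.remove(best)
        chosen.append(best)
    return chosen, idx


# Hypothetical prior: which parameter matters after E0 depends on the sign of parameter 0.
rng = np.random.default_rng(0)
m = rng.standard_normal((2000, 3))
h = m[:, 0] + np.where(m[:, 0] > 0, m[:, 1], m[:, 2])
sim = {"E0": m[:, 0], "E1": m[:, 1], "E2": m[:, 2]}
plan_pos, _ = sequential_design(sim, h, {"E0": 1.0, "E1": 0.0, "E2": 0.0}.get, n_steps=2)
plan_neg, _ = sequential_design(sim, h, {"E0": -1.0, "E1": 0.0, "E2": 0.0}.get, n_steps=2)
```

Both plans start with E0, but a positive first outcome leads to E1 and a negative one to E2, so no single fixed sequence is optimal for every field situation.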