Minisymposium-42 Uncertainty Quantification for Nonparametric Inverse Problems, Tuesday 09 July 2019 at 16:30 Room 1 and Friday 12 July 2019 at 08:00 Room 12
Minisymposium-54 Accelerating sampling strategies for large-scale Bayesian inverse problems, Tuesday 09 July 2019 at 14:00 Room 13 and Tuesday 09 July 2019 at 16:30 Room 13
Talks by Botond Szabo, Jean-Bernard Salomond, Julyan Arbel, Olivier Zahm, and others.
14:00, Salle 106 - Batiment IMAG.
Uncertainty estimation in seismic tomography with ensemble Data Assimilation
Full Waveform Inversion (FWI) seeks to estimate subsurface properties by solving an ill-posed and computationally challenging inverse problem. The method minimizes a data misfit between a synthetic wavefield, computed from a prior subsurface estimate, and sparse, indirectly recorded waveform data, generally acquired at the surface. Although it is possible to reach a reasonable data fit and obtain high-resolution subsurface models, the intrinsic properties of FWI complicate the process: because it relies on quasi-Newton local optimization schemes, any claim that the recovered solution is unique is unsound. It is therefore crucial that FWI depart from the deterministic, single-solution framework and move toward a more statistical approach, with uncertainty quantification at the heart of the process. To that end, we propose a new methodological development that recasts the problem in the Bayesian inference framework, borrowing ensemble methods from the Data Assimilation (DA) community. We investigate these methodologies on synthetic and real data tests, combining a quasi-Newton FWI optimization scheme with the well-known Ensemble Transform Kalman Filter (ETKF) from the DA community. Beyond proposing an ETKF-FWI scheme to estimate uncertainty, we also study the importance of prior knowledge and the challenge of ensemble representativity for high-dimensional state-estimation problems such as FWI.
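As a sketch of the kind of ensemble update involved (not the authors' ETKF-FWI code), the following Python snippet implements one ETKF analysis step in the formulation of Hunt et al. (2007), assuming for simplicity a linear(ized) observation operator H; in the FWI setting the observation map would be the wave-equation solver and the state vector the subsurface model:

```python
# Minimal ETKF analysis step (Hunt et al., 2007 formulation); a sketch only,
# not the authors' ETKF-FWI implementation. H is assumed linear here.
import numpy as np

def etkf_analysis(Xf, y, H, R):
    """One ETKF update.
    Xf : (n, Ne) forecast ensemble of state vectors (e.g. velocity models)
    y  : (m,)    observed data (e.g. recorded waveforms)
    H  : (m, n)  linearized observation operator
    R  : (m, m)  observation-error covariance
    """
    n, Ne = Xf.shape
    x_mean = Xf.mean(axis=1)
    Xp = Xf - x_mean[:, None]               # state anomalies
    Yf = H @ Xf
    y_mean = Yf.mean(axis=1)
    Yp = Yf - y_mean[:, None]               # observation-space anomalies
    C = Yp.T @ np.linalg.inv(R)             # (Ne, m)
    # analysis covariance in ensemble (weight) space
    Pa = np.linalg.inv((Ne - 1) * np.eye(Ne) + C @ Yp)
    w_mean = Pa @ C @ (y - y_mean)          # mean update weights
    # symmetric square root via eigendecomposition
    vals, vecs = np.linalg.eigh((Ne - 1) * Pa)
    W = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    return x_mean[:, None] + Xp @ (w_mean[:, None] + W)
```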
14:00, Salle 106 - Batiment IMAG.
Towards rigorous Variational Bayesian Inference (arxiv, slides)
Sampling methods are often considered to be the gold standard for Bayesian inference because of the attractive promise of their error tending to zero in the limit of infinite computational resources. A much cheaper alternative is provided by modern Variational Inference methods (Blei et al., 2017) or even the Laplace approximation. Empirical tests reveal that such approximations of the posterior are often very accurate, but a key limitation is that computable guarantees on the approximation quality are generally infeasible to obtain.
I will present a result aimed at solving this conundrum. I will show that, in the large data limit, the Kullback-Leibler divergence between a probability distribution f(θ) and its Laplace approximation g(θ) can be accurately approximated as:
KL( g(θ), f(θ) ) = E_g [ log g(θ) - log f(θ) ] ≈ 0.5 Var_g [ log g(θ) - log f(θ) ]
Critically, this approximation does not require knowledge of the normalization constant of f(θ) and is straightforward to estimate by sampling from g(θ). This result enables us to accurately measure the size of the error of the Laplace approximation and is critical to ensuring that Variational Inference is not only cheap but rigorous.
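For concreteness, here is a minimal sketch (not the speaker's code) of how this error estimate can be computed by sampling from the Laplace approximation g; `log_f_unnorm` stands for a user-supplied unnormalized log-posterior, and the unknown normalizing constant of f cancels inside the variance:

```python
# Sketch: estimating KL(g, f) ≈ 0.5 * Var_g[log g - log f] by Monte Carlo.
# `log_f_unnorm` is a hypothetical unnormalized log-posterior; its missing
# normalizing constant only shifts (log g - log f) and drops out of the variance.
import numpy as np

def laplace_kl_estimate(mode, cov, log_f_unnorm, n_samples=10_000, rng=None):
    rng = np.random.default_rng(rng)
    d = len(mode)
    theta = rng.multivariate_normal(mode, cov, size=n_samples)
    # log-density of the Laplace (Gaussian) approximation g
    diff = theta - mode
    prec = np.linalg.inv(cov)
    log_g = (-0.5 * np.einsum('ij,jk,ik->i', diff, prec, diff)
             - 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1]))
    r = log_g - np.array([log_f_unnorm(t) for t in theta])
    return 0.5 * np.var(r)    # ≈ KL(g, f) in the large-data limit
```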
14:00, F107 - Inria.
Stochastic sampling machine for Bayesian inference
Compared to conventional processors, stochastic computing architectures have strong potential to speed up computation and reduce power consumption. This talk presents such an architecture, called the Bayesian Machine (BM), dedicated to solving Bayesian inference problems. The BM uses stochastic computing to evaluate the inference of a given probabilistic model, computing the posterior over the searched variable in parallel. The machine will be explained using the example of sound source localization (SSL). Several optimizations of the BM that speed up computation, and hence reduce power consumption, will also be presented.
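To illustrate the stochastic-computing principle behind such machines (a software analogy, not the BM hardware), probabilities can be encoded as random bitstreams whose mean equals the probability, so that a product of likelihoods reduces to a bitwise AND of streams; the numbers below are invented:

```python
# Illustration of the stochastic-computing principle (not the BM hardware):
# probabilities are encoded as random bitstreams whose mean is the probability,
# and products of likelihoods reduce to bitwise AND of the streams.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                                    # bitstream length

def bitstream(p):                              # encode probability p
    return rng.random(N) < p

# unnormalized posterior over 3 candidate source directions,
# combining two sensors' likelihoods (hypothetical numbers)
lik_sensor1 = [0.9, 0.3, 0.2]
lik_sensor2 = [0.8, 0.4, 0.1]
counts = [np.sum(bitstream(a) & bitstream(b))  # AND = product of probabilities
          for a, b in zip(lik_sensor1, lik_sensor2)]
posterior = np.array(counts) / sum(counts)     # normalize bit counts
print(posterior)                               # ≈ [0.84, 0.14, 0.02] up to noise
```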
14:00, Salle 106 - Batiment IMAG.
Nonlinear filtering and smoothing with transport maps
We consider the Bayesian filtering problem for high dimensional non-Gaussian state-space models with challenging nonlinear dynamics, and sparse observations in space and time. While the ensemble Kalman filter (EnKF) yields robust ensemble approximations of the filtering distribution, it is limited by linear forecast-to-analysis transformations. To generalize the EnKF, we propose a methodology that transforms the non-Gaussian forecast ensemble at each assimilation step into samples from the current filtering distribution via a sequence of local nonlinear couplings. These couplings are based on transport maps that can be computed quickly using convex optimization, and that can be enriched in complexity to reduce the intrinsic bias of the EnKF. We discuss the low-dimensional structure inherited by the transport maps from the filtering problem, including decay of correlations, conditional independence, and local likelihoods. We then exploit this structure to regularize the estimation of the maps in high dimensions and with a limited ensemble size.
We also present variational methods---again based on transport maps---for filtering, smoothing, and sequential parameter estimation in non-Gaussian state-space models. These methods rely on results linking the Markov properties of a target measure to the existence of low-dimensional couplings, induced by transport maps that are decomposable. The resulting algorithms can be understood as a generalization, to the non-Gaussian case, of the square-root Rauch--Tung--Striebel Gaussian smoother.
This is joint work with Ricardo Baptista, Daniele Bigoni, and Alessio Spantini.
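The following minimal sketch (a strong simplification, not the authors' code) illustrates the underlying variational transport idea in one dimension: a monotone map T pushing a standard normal onto an unnormalized target is found by minimizing the sample average of -log π(T(z)) - log T'(z), which equals the KL divergence up to a constant; the cubic parametrization and the skewed target density are illustrative choices:

```python
# Sketch of the variational transport-map idea in 1-D: fit a monotone map
# T(z) = c0 + c1*z + c2*z^3 (monotone for c1, c2 > 0) pushing N(0,1) onto an
# unnormalized target by minimizing KL(T_# N(0,1) || target).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
z = rng.standard_normal(2000)                 # reference samples, held fixed

log_target = lambda x: x - np.exp(x)          # skewed target (log of Exp(1))

def kl_objective(c):
    c0, c1, c2 = c
    Tz = c0 + c1 * z + c2 * z**3
    dT = c1 + 3 * c2 * z**2                   # T'(z) > 0 for c1, c2 > 0
    # KL(T_# rho || pi) up to a constant
    return np.mean(-log_target(Tz) - np.log(dT))

res = minimize(kl_objective, x0=[0.0, 1.0, 0.1],
               bounds=[(None, None), (1e-6, None), (1e-6, None)])
samples = res.x[0] + res.x[1] * z + res.x[2] * z**3   # ≈ target samples
```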
11:00, F107, Inria Grenoble Rhône-Alpes
On the Pitman-Yor process with spike and slab base measure (pdf)
For the most popular discrete nonparametric models, beyond the Dirichlet process, the prior guess at the shape of the data-generating distribution, also known as the base measure, is assumed to be diffuse. Such a specification greatly simplifies the derivation of analytical results, allowing for a straightforward implementation of Bayesian nonparametric inferential procedures. However, in several applied problems the available prior information leads naturally to the incorporation of an atom into the base measure, and then the Dirichlet process is essentially the only tractable choice for the prior. In this paper we fill this gap by considering the Pitman–Yor process with an atom in its base measure. We derive computable expressions for the distribution of the induced random partitions and for the predictive distributions. These findings allow us to devise an effective generalized Pólya urn Gibbs sampler. Applications to density estimation, clustering and curve estimation, with both simulated and real data, serve as an illustration of our results and allow comparisons with existing methodology. In particular, we tackle a functional data analysis problem concerning basal body temperature curves.
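As a naive illustration (it does not reproduce the paper's exact partition probabilities or its generalized Pólya urn Gibbs sampler), one can simulate from a Pitman-Yor process with spike-and-slab base measure G0 = w δ_a + (1 - w) N(0, 1) by running the standard PY urn and drawing new dishes from G0; all parameter values below are arbitrary:

```python
# Naive generative sketch of Pitman-Yor sampling with a spike-and-slab base
# measure G0 = w*delta_atom + (1-w)*N(0,1).  The standard PY urn is run and
# new "dishes" are drawn from G0, so ties at the atom occur across tables;
# the paper's exact partition/predictive formulas are not reproduced here.
import numpy as np

def py_spike_slab_sample(n, d=0.25, theta=1.0, w=0.3, atom=0.0, rng=None):
    rng = np.random.default_rng(rng)
    dishes, counts = [], []            # table values and occupancies
    out = []
    for i in range(n):
        k = len(dishes)
        # PY urn: existing table j w.p. (n_j - d)/(theta + i),
        #         new table      w.p. (theta + d*k)/(theta + i)
        probs = np.array([c - d for c in counts] + [theta + d * k])
        probs /= theta + i
        j = rng.choice(k + 1, p=probs)
        if j == k:                     # new table: draw its dish from G0
            dish = atom if rng.random() < w else rng.standard_normal()
            dishes.append(dish); counts.append(1)
        else:
            dish = dishes[j]; counts[j] += 1
        out.append(dish)
    return np.array(out)
```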
14:00, Salle 106 - Batiment IMAG. Joint with DATA Seminar
A Bayesian method for estimating the optimal threshold of a marker used to choose patients' treatment
Abstract: Using a quantitative companion marker (also called a predictive factor) to choose between two treatment options requires estimating an optimal threshold beyond which one of the two treatments is preferred. In this presentation, the expression of the optimal threshold relies on a utility function intended to quantify the mean utility of the studied population (e.g., life expectancy, quality of life, ...) while taking into account both the efficacy (success or failure) and the toxicity of each treatment option. The optimal threshold is thus the marker value that maximizes the mean utility of the population. A method modeling the distribution of the marker in patient subgroups, defined by the treatment received and the outcome, is proposed to compute the parameters of the utility function, so as to estimate the optimal threshold, together with its credibility interval, using Bayesian inference. A simulation study demonstrated the method's low bias and a coverage probability close to 95% in many scenarios, but also the need for large samples to obtain a precise estimate of the threshold. The method was then applied to data from the PETACC-8 trial, which compares the efficacy of chemotherapy against a combination of chemotherapy plus an anti-EGFR in patients with stage III colorectal cancer.
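The following hypothetical sketch illustrates the threshold idea: patients with a marker value below a threshold s receive treatment A, the others treatment B, and s is chosen to maximize the mean utility. The success curves, utility values, and marker distribution below are invented; the talk estimates the corresponding quantities (and a credibility interval for the threshold) by Bayesian inference on subgroup marker distributions:

```python
# Hypothetical sketch of the optimal-threshold idea: choose the threshold s
# maximizing mean utility when patients with marker <= s receive treatment A
# and the others treatment B.  All numbers below are invented.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50_000)            # marker values in the population

p_succ_A = lambda x: 1 / (1 + np.exp(-(0.5 - 1.5 * x)))   # success prob., A
p_succ_B = lambda x: 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))  # success prob., B
# utilities of success/failure encode both efficacy and toxicity
u_A = p_succ_A(x) * 1.0 + (1 - p_succ_A(x)) * 0.2
u_B = p_succ_B(x) * 0.9 + (1 - p_succ_B(x)) * 0.1

grid = np.linspace(-3, 3, 601)
mean_utility = [np.mean(np.where(x <= s, u_A, u_B)) for s in grid]
s_opt = grid[int(np.argmax(mean_utility))]  # marker value maximizing utility
```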
14:00, Salle 106 - Batiment IMAG. Joint with DATA Seminar
Noisy Hamiltonian Monte Carlo for doubly intractable distributions
Abstract: Hamiltonian Monte Carlo (HMC) methods have emerged in recent years as an efficient alternative to Metropolis-Hastings algorithms when the latter fail to explore the target distribution effectively. These methods generate a Markov chain on an augmented space whose transitions rely on a system of differential equations derived from Hamiltonian mechanics. In practice, this system admits no explicit solution, and a discretization step via a symplectic integrator is required. This discretization of the Hamiltonian flow leads to an approximate solution that no longer preserves the target distribution as its invariant measure, so an accept-reject step is used within the algorithm to correct this bias. In the context of so-called doubly intractable distributions -- for example Markov random fields (e.g., the Potts model, exponential random graph models (ERGMs)) -- HMC faces two major computational difficulties: computing the gradient appearing in the Hamiltonian flow, and the accept-reject step. In this presentation, I will describe the behavior of the HMC algorithm in this particular setting when the various steps are approximated by Monte Carlo estimators, and briefly discuss tuning issues. I will illustrate this on two examples: the Potts model and an ERGM-type model.
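For reference, here is a standard exact-gradient HMC step with a leapfrog integrator (a generic sketch, not the speaker's noisy variant); in the doubly intractable setting, the gradient calls and the accept-reject ratio below are precisely the quantities that must be replaced by Monte Carlo estimators:

```python
# Standard HMC step with a leapfrog (symplectic) integrator.  In the noisy
# variant discussed in the talk, grad_log_target and the acceptance ratio
# would be replaced by Monte Carlo estimators.
import numpy as np

def hmc_step(q, log_target, grad_log_target, eps=0.1, n_leap=20, rng=None):
    rng = np.random.default_rng(rng)
    p = rng.standard_normal(q.shape)              # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_target(q_new)   # half step on momentum
    for _ in range(n_leap - 1):
        q_new += eps * p_new                      # full step on position
        p_new += eps * grad_log_target(q_new)     # full step on momentum
    q_new += eps * p_new
    p_new += 0.5 * eps * grad_log_target(q_new)   # final half step
    # accept-reject corrects the discretization bias of the integrator
    log_alpha = (log_target(q_new) - 0.5 * p_new @ p_new
                 - log_target(q) + 0.5 * p @ p)
    return q_new if np.log(rng.random()) < log_alpha else q
```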
At 14:00 in room F107, Inria Montbonnot Saint-Martin
Optimization-based Sampling Approaches for Hierarchical Bayesian Inference
Abstract: Markov chain Monte Carlo (MCMC) relies on efficient proposals to sample from a target distribution of interest. Recent optimization-based MCMC algorithms for Bayesian inference, e.g., randomize-then-optimize (RTO), repeatedly solve optimization problems to obtain proposal samples. We interpret RTO as an invertible map between two random functions and find that this mapping preserves the random functions along many directions. This leads to a dimension-independent formulation of the RTO algorithm for sampling the posterior of large-scale Bayesian inverse problems. We apply the new method to hierarchical Bayesian inverse problems.
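In the linear-Gaussian special case, each randomize-then-optimize solve returns an exact posterior sample, which the following sketch illustrates (identity covariances scaled by sigma and tau are an assumption here; for nonlinear forward maps, RTO requires a nonlinear solve per sample plus a Metropolis correction):

```python
# Linear-Gaussian randomize-then-optimize sketch: perturb the data and the
# prior mean, then solve the regularized least-squares problem; each solve
# yields an exact posterior draw in this special case.
import numpy as np

def rto_linear_sample(A, y, sigma=0.1, tau=1.0, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    y_pert = y + sigma * rng.standard_normal(m)   # randomize the data
    u_pert = tau * rng.standard_normal(n)         # randomize the prior mean (0)
    # optimize: min ||A u - y_pert||^2 / sigma^2 + ||u - u_pert||^2 / tau^2
    M = np.vstack([A / sigma, np.eye(n) / tau])
    b = np.concatenate([y_pert / sigma, u_pert / tau])
    return np.linalg.lstsq(M, b, rcond=None)[0]   # exact posterior draw
```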
At 14:00, Salle 106, IMAG building.
Parameter Estimation Through Structure-Preserving Approximate Bayesian Computation for Stochastic Neural Mass Models
Abstract: In this talk we perform statistical inference for a stochastic neural mass model using approximate Bayesian computation. Stochastic neural mass models describe the electrical activity of a whole population of neurons with average properties, and have been reported to reproduce, for example, EEG/MEG/SEEG data. Here we focus on a specific reformulation of the Jansen and Rit neural mass model [1] as a stochastic differential equation (JR-SDE) with additive noise [2]. This stochastic version of the model can be analysed through its dynamical and structural properties. In particular, we are interested in estimating parameters that have been shown to be relevant for the description of α-rhythmic and epileptic behaviour.
In [2], the authors showed that the JR-SDE can be reformulated as a stochastic Hamiltonian equation, which enabled them to prove its ergodicity. This guarantees that the distribution of the 6-dimensional solution process X(t) = (X_0(t), ..., X_5(t))^T, t ∈ [0, T], converges exponentially fast towards a unique invariant measure, and allows us to extract important statistical properties from single sample paths.
Here we perform statistical inference for this stochastic model, making use of a numerical splitting scheme that has been shown to preserve the structural model properties, unlike commonly used schemes such as the Euler-Maruyama method [2]. From an experimental point of view, the solution process (X(t))_{t ∈ [0, T]} is only partially observed, through the EEG-related stochastic process Y(t) = X_1(t) − X_2(t), t ∈ [0, T]. Two main difficulties arise. First, because the non-linear and multidimensional SDE cannot be solved explicitly, the dynamics of the signal process (Y(t))_{t ∈ [0, T]} can only be simulated through the numerical scheme. Second, the corresponding likelihood function is intractable.
We tackle the latter issue with the likelihood-free, simulation-based approximate Bayesian computation (ABC) approach [3], a Bayesian technique that requires a large number of synthetic datasets simulated from the original model.
In the proposed statistical analysis, the crucial part is to define reliable distance criteria for comparing the simulated synthetic signals with the observed reference data. Due to the large variability in the data generated by the JR-SDE, neither distances computed on the raw data nor standard summary statistics work, and even more sophisticated, commonly used distances for time series fail. To overcome this difficulty, we propose to transform the signals from the time domain to the frequency domain by considering the corresponding spectral density. The spectral density depends on parameters that directly affect the frequency as well as the amplitude and therefore carries significant dynamical and structural information.
By exploiting the parameter-dependent structural and dynamical properties of the system in this way, the ABC approach, combined with the adopted numerical splitting method, enables us to fit the model to real EEG data.
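A minimal sketch of the resulting ABC rejection step might look as follows (SciPy's Welch estimator stands in for whatever spectral estimator is actually used, and simulate_signal / prior_sampler are hypothetical stand-ins for the splitting-scheme JR-SDE simulator and the prior):

```python
# Sketch of an ABC rejection step with a spectral-density summary.
# `simulate_signal(theta)` abstracts the splitting-scheme JR-SDE simulation of
# Y(t); `prior_sampler()` draws model parameters -- both are hypothetical.
import numpy as np
from scipy.signal import welch

def abc_rejection(y_obs, simulate_signal, prior_sampler, n_iter, eps, fs=1000):
    f, S_obs = welch(y_obs, fs=fs)              # spectral density of the data
    accepted = []
    for _ in range(n_iter):
        theta = prior_sampler()
        y_sim = simulate_signal(theta)          # synthetic EEG-like signal
        _, S_sim = welch(y_sim, fs=fs)
        # L2 distance between log spectral densities
        dist = np.linalg.norm(np.log(S_sim) - np.log(S_obs))
        if dist < eps:
            accepted.append(theta)
    return accepted
```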
References
[1] B.H. Jansen and V.G. Rit. "Electroencephalogram and visual evoked potential generation in a mathematical model of coupled cortical columns." In: Biological Cybernetics, 73(4):357-366 (1995)
[2] M. Ableidinger, E. Buckwar, and H. Hinterleitner. "A Stochastic Version of the Jansen and Rit Neural Mass Model: Analysis and Numerics." In: The Journal of Mathematical Neuroscience 7(8) (2017)
[3] M.A. Beaumont, W. Zhang, and D.J. Balding. "Approximate Bayesian computation in population genetics." In: Genetics, 162(4):2025-2035 (2002)
At 14:00 in room F107, Inria Montbonnot Saint-Martin
Minibatch and incremental learning of exponential family mixtures, and the soft k-means clustering problem
Abstract: Mixtures of exponential family distributions are an important class of probabilistic models that form the basis of many model-based clustering approaches. The EM algorithm is typically used to learn the parameters of such models from data. When data are large in size and dimensionality, the performance of the EM algorithm can be impeded by memory issues and computational bottlenecks.
Recently, there has been a trend towards stochastic-approximation algorithms that circumvent the bottlenecks of traditional algorithms. In this talk, we present a stochastic EM framework that can be used for minibatch and incremental learning of exponential family mixtures. The algorithm is provably convergent and covers the important special case of Gaussian mixture models. We also demonstrate a modification of the algorithm that can be used to incrementally solve the soft k-means problem.
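The minibatch idea can be sketched, in the online-EM style of Cappé and Moulines (2009), for the special case of a one-dimensional Gaussian mixture (the talk's framework is more general; the step-size schedule and initialization below are illustrative):

```python
# Minibatch EM sketch for a 1-D Gaussian mixture: run E-steps on minibatches,
# update running expected sufficient statistics with a Robbins-Monro step
# size, and recover the parameters with an exact M-step.
import numpy as np

def minibatch_em_gmm(X, K, n_steps=2000, batch=256, rng=None):
    rng = np.random.default_rng(rng)
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(X, K)                    # crude initialization
    var = np.full(K, np.var(X))
    # running expected sufficient statistics (per data point)
    s0, s1, s2 = pi.copy(), pi * mu, pi * (var + mu**2)
    for t in range(1, n_steps + 1):
        xb = rng.choice(X, batch)            # minibatch
        # E-step on the minibatch: responsibilities r[i, k]
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (xb[:, None] - mu)**2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # stochastic-approximation update of the sufficient statistics
        g = (t + 10) ** -0.6                 # Robbins-Monro step size
        s0 = (1 - g) * s0 + g * r.mean(axis=0)
        s1 = (1 - g) * s1 + g * (r * xb[:, None]).mean(axis=0)
        s2 = (1 - g) * s2 + g * (r * xb[:, None]**2).mean(axis=0)
        # M-step from the running statistics
        pi, mu, var = s0, s1 / s0, np.maximum(s2 / s0 - (s1 / s0)**2, 1e-6)
    return pi, mu, var
```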