Schedule & Speaker Abstracts

Workshop schedule


7:55 - 8:00: Introductions

8:00 - 8:30: Chris Holmes. "How to train your model when it's wrong: Bayesian nonparametric learning in M-open"


8:30 - 8:35: Q&A with Chris Holmes [live]

8:35 - 9:05: Ilse Ipsen. "BayesCG: A probabilistic numerical linear solver"

Abstract: We present the probabilistic numerical solver BayesCG for solving linear systems with real symmetric positive definite coefficient matrices. BayesCG is an uncertainty-aware extension of the conjugate gradient (CG) method that performs solution-based inference with Gaussian distributions to capture the uncertainty in the solution due to early termination.

Under a structure-exploiting "Krylov" prior, BayesCG produces the same iterates as CG. The Krylov posterior covariances have low rank and are maintained in factored form to preserve symmetry and positive semi-definiteness. This allows efficient generation of accurate samples to probe uncertainty in subsequent computation.
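To make the factored form concrete, here is a minimal numerical sketch, assuming a plain CG recursion and a rank-d factor built from the d search directions following iteration m. The function name and the A-normalized scaling of the factors are our simplifications for illustration, not the speaker's implementation.

    import numpy as np

    def bayescg_krylov_sketch(A, b, x0, m, d):
        # Run m CG iterations (the m-th iterate plays the posterior mean),
        # then d more to collect A-normalized search directions as a rank-d
        # covariance factor F, so Sigma ~= F @ F.T stays symmetric PSD.
        x, r = x0.copy(), b - A @ x0
        p, rs = r.copy(), r @ r
        mean, factors = x0.copy(), []
        for k in range(m + d):
            Ap = A @ p
            pAp = p @ Ap
            if k >= m:
                factors.append(p / np.sqrt(pAp))  # directions CG has not used by step m
            alpha = rs / pAp
            x = x + alpha * p
            if k == m - 1:
                mean = x.copy()                   # CG iterate after m steps
            r = r - alpha * Ap
            rs_new = r @ r
            p, rs = r + (rs_new / rs) * p, rs_new
        return mean, np.column_stack(factors)

    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
    A = Q @ np.diag(np.linspace(1.0, 100.0, 50)) @ Q.T   # SPD test matrix
    b = rng.standard_normal(50)
    mean, F = bayescg_krylov_sketch(A, b, np.zeros(50), m=10, d=5)
    samples = mean[:, None] + F @ rng.standard_normal((5, 3))  # probe the uncertainty

Because the covariance is returned as a factor F with Sigma ~= F F^T, posterior samples reduce to mean + F z with standard normal z, which is how the last line probes the uncertainty left at iteration m.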


Bio: Ilse C.F. Ipsen received a BS from the University of Kaiserslautern in Germany and a Ph.D. from Penn State, both in Computer Science. She is a Professor of Mathematics at NC State, with affiliate appointments in Statistics and the Institute for Advanced Analytics. Her research interests include numerical linear algebra, randomized algorithms, and probabilistic numerics. She is a Fellow of the AAAS and SIAM.

9:05 - 9:10: Q&A with Ilse Ipsen [live]

9:10 - 9:45: Discussions in Gathertown [live]

Join us after the talks in Gathertown for a more casual discussion with the speakers and other participants of the workshop!

9:45 - 10:00: Michail Spitieris. "Bayesian calibration of imperfect computer models using physics-informed priors"

Abstract: In this work, we introduce a computationally efficient, data-driven framework for quantifying the uncertainty in the physical parameters of computer models represented by differential equations. We construct physics-informed priors for differential equations, which are multi-output Gaussian process (GP) priors that encode the model's structure in the covariance function. We extend this into a fully Bayesian framework that allows quantifying the uncertainty in physical parameters and model predictions. Since physical models are usually imperfect descriptions of the real process, we allow the model to deviate from the observed data by considering a discrepancy function. For inference, Hamiltonian Monte Carlo (HMC) sampling is used.

This work is motivated by the need for interpretable parameters for the hemodynamics of the heart in the personalized treatment of hypertension. The model used is the arterial Windkessel model, which represents the hemodynamics of the heart through differential equations with physically interpretable parameters of medical interest. Like most physical models, the Windkessel model is an imperfect description of the real process. To demonstrate our approach, we simulate noisy data from a more complex physical model with known mathematical connections to our modeling choice. We show that without accounting for discrepancy, the posterior of the physical parameters deviates from the true values, while accounting for discrepancy gives a reasonable quantification of the uncertainty in the physical parameters and reduces the uncertainty in subsequent model predictions.
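A stripped-down illustration of why the discrepancy term matters (our toy with a linear stand-in "computer model", not the talk's Windkessel equations): without a discrepancy, calibration reduces to ordinary least squares and is biased and overconfident; with a GP discrepancy, it becomes generalized least squares under C = K + sigma^2 I, with suitably inflated uncertainty.

    import numpy as np

    # Kennedy-O'Hagan-style toy: truth = x + sin(3x), imperfect model f(x, theta) = theta * x.
    rng = np.random.default_rng(1)
    x = np.linspace(0.05, 2.0, 60)
    y = x + np.sin(3 * x) + 0.05 * rng.standard_normal(x.size)

    # No discrepancy: ordinary least squares for theta.
    theta_ols = (x @ y) / (x @ x)
    se_ols = 0.05 / np.sqrt(x @ x)

    # GP discrepancy delta ~ GP(0, k): y ~ N(theta * x, K + sigma^2 I),
    # so theta comes from generalized least squares under C = K + sigma^2 I.
    def rbf(a, b, ls=0.4, var=1.0):
        return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

    C = rbf(x, x) + 0.05 ** 2 * np.eye(x.size)
    Ci_x = np.linalg.solve(C, x)
    theta_gls = (Ci_x @ y) / (Ci_x @ x)
    se_gls = 1.0 / np.sqrt(Ci_x @ x)

    print(f"no discrepancy:   theta = {theta_ols:.2f} +/- {se_ols:.3f} (biased, overconfident)")
    print(f"with discrepancy: theta = {theta_gls:.2f} +/- {se_gls:.3f} (wider, honest about misfit)")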

10:00 - 10:15: Masha Naslidnyk. "Invariant priors for Bayesian quadrature"

Abstract: Bayesian quadrature (BQ) is a model-based numerical integration method that increases sample efficiency by encoding and leveraging known structure of the integration task at hand. In this paper, we explore priors that encode invariance of the integrand under a set of bijective transformations of the input domain, in particular unitary transformations such as rotations, axis flips, or point symmetries. We show initial results demonstrating superior performance, in comparison to standard Bayesian quadrature, on several synthetic tasks and one real-world application.
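A hedged sketch of the construction in one dimension, assuming the group of sign flips G = {identity, x -> -x}: averaging a base RBF kernel over the group in both arguments yields a prior supported on even integrands, and the usual BQ estimate z^T K^{-1} y is then computed with the invariant kernel. Kernel means are estimated by Monte Carlo here; the paper's closed forms and benchmarks are not reproduced.

    import numpy as np

    rng = np.random.default_rng(2)

    def k_rbf(a, b, ls=0.3):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

    def k_inv(a, b):  # group-average in both arguments over G = {identity, flip}
        return 0.25 * (k_rbf(a, b) + k_rbf(-a, b) + k_rbf(a, -b) + k_rbf(-a, -b))

    f = lambda x: np.cos(4 * x) + x ** 2             # invariant integrand: f(x) = f(-x)
    X = rng.uniform(-1, 1, 8)                        # quadrature nodes
    y = f(X)
    U = rng.uniform(-1, 1, 20000)                    # MC samples for the kernel means
    truth = np.mean(f(np.linspace(-1, 1, 100001)))   # reference integral wrt U[-1, 1]

    for name, k in [("standard BQ ", k_rbf), ("invariant BQ", k_inv)]:
        K = k(X, X) + 1e-10 * np.eye(X.size)
        z = k(U, X).mean(axis=0)                     # z_i ~= E_x[k(x, X_i)]
        est = z @ np.linalg.solve(K, y)
        # error is typically smaller with the invariant prior on even integrands
        print(f"{name}: estimate {est:.4f}, |error| {abs(est - truth):.4f}")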

10:15 - 11:30: Poster session I [live]


11:30 - 12:35: Panel discussion [live]

Panelists:

David Dunson

Maria Kwiatkowska

Steve MacEachern

Jeff Miller

Briana Stephenson

12:35 - 1:30: Discussions in Gathertown + break [live]


1:30 - 1:45: Maria Cervera. "Uncertainty estimation under model misspecification in neural network regression"

Abstract: Although neural networks are powerful function approximators, the underlying modelling assumptions ultimately define the likelihood and thus the hypothesis class they are parameterizing. In classification, these assumptions are minimal, as the commonly employed softmax is capable of representing any categorical distribution. In regression, however, restrictive assumptions are typically placed on the type of continuous distribution to be realized, like the dominant choice of training via mean-squared error with its underlying Gaussianity assumption. Recently, modelling advances make it possible to be agnostic to the type of continuous distribution to be modelled, granting regression the flexibility of classification models. While past studies stress the benefit of such flexible regression models in terms of performance, here we study the effect of the model choice on uncertainty estimation. We highlight that under model misspecification, aleatoric uncertainty is not properly captured, and that a Bayesian treatment of a misspecified model leads to unreliable epistemic uncertainty estimates. Overall, our study provides an overview of how modelling choices in regression may influence uncertainty estimation and thus any downstream decision-making process.
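The aleatoric point can be seen in a few lines (our construction, not the paper's experiments): the MSE-optimal Gaussian description of skewed noise matches the first two moments, yet misallocates the tails of the predictive distribution.

    import numpy as np

    # Toy illustration: fit the Gaussian implied by MSE training to
    # skewed (shifted exponential) noise and check interval calibration.
    rng = np.random.default_rng(3)
    noise = rng.exponential(1.0, 100000) - 1.0   # skewed noise with mean 0
    mu, sd = noise.mean(), noise.std()           # MSE-optimal Gaussian fit

    lo, hi = mu - 1.645 * sd, mu + 1.645 * sd    # nominal central 90% interval
    print(f"coverage {np.mean((noise > lo) & (noise < hi)):.1%}")
    print(f"tail mass below/above: {np.mean(noise < lo):.1%} / {np.mean(noise > hi):.1%}"
          "  (a correct model puts 5% in each tail)")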

1:45 - 2:00: Jackson Killian. "Your bandit model is not perfect: Introducing robustness to restless bandits enabled by deep reinforcement learning"

Abstract: Restless multi-armed bandits (RMABs) are receiving renewed attention for their potential to model real-world planning problems under resource constraints. However, few RMAB models have surpassed theoretical interest, since they make the limiting assumption that model parameters are perfectly known. In the real world, model parameters often must be estimated via historical data or expert input, introducing uncertainty. In this light, we introduce a new paradigm, Robust RMABs, a challenging generalization of RMABs that incorporates interval uncertainty over the parameters of each arm's dynamic model. This uncovers several new challenges for RMABs and inspires new algorithmic techniques of general interest. Our contributions are:

(i) We introduce the Robust Restless Bandit problem with interval uncertainty and solve a minimax regret objective;


(ii) We tackle the complexity of the robust objective via a double oracle (DO) approach and analyze its convergence (see the sketch after this list);


(iii) To enable our DO approach, we introduce RMABPPO, a novel deep reinforcement learning (RL) algorithm for solving RMABs, of potential general interest;


(iv) We design the first adversary algorithm for RMABs, required to implement the notoriously difficult minimax regret adversary oracle and also of general interest, by formulating it as a multi-agent RL problem and solving with a multi-agent extension of RMABPPO.
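To keep the DO logic visible without the deep-RL machinery, here is a sketch on a finite zero-sum matrix game, where both oracles are exact best responses and each restricted game is solved by linear programming. In the paper, the payoffs come from RMAB regret and the oracles are RMABPPO and its multi-agent adversary extension; this toy is only meant to show the grow-the-strategy-sets loop.

    import numpy as np
    from scipy.optimize import linprog

    def solve_game(M):
        # Max-min mixed strategy for the row player of payoff matrix M via LP:
        # maximize v subject to sum_i x_i M[i, j] >= v for all j, x a distribution.
        m, n = M.shape
        res = linprog(c=np.r_[np.zeros(m), -1.0],
                      A_ub=np.c_[-M.T, np.ones(n)], b_ub=np.zeros(n),
                      A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
        return res.x[:m], -res.fun

    rng = np.random.default_rng(4)
    A = rng.standard_normal((40, 40))   # rows: agent strategies, cols: adversary choices
    R, C = [0], [0]                     # restricted strategy sets
    for _ in range(200):
        sub = A[np.ix_(R, C)]
        x, v = solve_game(sub)                      # agent equilibrium of restricted game
        y, _ = solve_game(-sub.T)                   # adversary equilibrium (column player)
        br_row = int(np.argmax(A[:, C] @ y))        # agent oracle: best response
        br_col = int(np.argmin(x @ A[R, :]))        # adversary oracle: best response
        if br_row in R and br_col in C:             # no oracle improves: converged
            break
        R, C = sorted(set(R) | {br_row}), sorted(set(C) | {br_col})
    print(f"value ~ {v:.3f} using |R|={len(R)}, |C|={len(C)} of 40 strategies each")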

2:00 - 2:30: Andrés Masegosa. "Bayesian model averaging is not model combination: A PAC-Bayesian analysis of deep ensembles"

Abstract: Almost twenty years ago, Thomas Minka nicely illustrated that Bayesian model averaging (BMA) is different from model combination. Model combination works by enriching the model space, because it considers all possible linear combinations of the models in the model class, while BMA represents our inability to know which single model is best when using a limited amount of data. Twenty years later, however, this distinction is not so clear in the context of ensembles of deep neural networks: are deep ensembles performing a crude approximation of a highly multi-modal Bayesian posterior? Or are they exploiting an enriched model space and, in consequence, should be interpreted in terms of model combination? In this talk, we will introduce recently published theoretical analyses that shed some light on these questions. As you will see, whether your model is wrong or not plays a crucial role in the answers to these questions.
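A toy numerical rendering of the distinction (our example, not the talk's analysis): when both candidate models are wrong, BMA weights still concentrate on a single model, while a fixed 50/50 combination, which lives in the enriched model space, can predict strictly better.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    train = rng.normal(0.1, 1.0, 2000)           # truth N(0.1, 1): outside the class
    test = rng.normal(0.1, 1.0, 100000)
    models = [norm(-1.0, 1.0), norm(1.0, 1.0)]   # two wrong candidate models

    log_ml = np.array([m.logpdf(train).sum() for m in models])
    w = np.exp(log_ml - log_ml.max()); w /= w.sum()     # BMA posterior weights

    dens = np.stack([m.pdf(test) for m in models])      # (2, n_test) predictive densities
    bma_ll = np.log(w @ dens).mean()                    # BMA: weights pick one wrong model
    ens_ll = np.log(0.5 * dens.sum(axis=0)).mean()      # fixed 50/50 model combination
    print(f"BMA weights {np.round(w, 3)}; test log-lik: BMA {bma_ll:.3f}, ensemble {ens_ll:.3f}")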

Bio: Andrés R. Masegosa is an associate professor in the Department of Computer Science at Aalborg University (Copenhagen campus, Denmark). Previously, he was an assistant professor at the University of Almería (Spain). He received his PhD in Computer Science from the University of Granada in 2009. He is broadly interested in modelling intelligent agents that learn from experience using a probabilistic approach. He has published more than sixty papers in international journals and conferences in the field of machine learning.

2:30 - 2:35: Q&A with Andrés Masegosa [live]


2:35 - 3:00: Discussions in Gathertown [live]


3:00 - 3:15: Alex Alemi. "PAC^m-Bayes: Narrowing the empirical risk gap in the misspecified Bayesian regime"

Abstract: The Bayesian posterior minimizes the "inferential risk," which itself bounds the "predictive risk." This bound is tight when the likelihood and prior are well specified. However, since misspecification induces a gap, the Bayesian posterior predictive distribution may have poor generalization performance. This work develops a multi-sample loss (PAC^m) which can close the gap by spanning a trade-off between the two risks. The loss is computationally favorable and offers PAC generalization guarantees. An empirical study demonstrates improvement to the predictive distribution.
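The multi-sample idea in code, under our own toy model (a fixed-scale Gaussian likelihood that is deliberately misspecified, with stand-in posterior draws): averaging the likelihood over m draws inside the logarithm scores the mixture predictive, and m = 1 recovers the usual single-draw objective.

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    y = rng.normal(0.0, 2.0, 200)             # data; likelihood below fixes scale 1 (misspecified)
    draws = rng.normal(0.0, 0.2, (1024, 1))   # stand-in approximate posterior over the mean

    def multi_sample_risk(draws, y, m):
        # -E[ log (1/m) sum_j p(y | theta_j) ], estimated over batches of m draws
        th = draws[: (len(draws) // m) * m].reshape(-1, m, 1)
        ll = norm.logpdf(y, loc=th, scale=1.0)          # shape (batches, m, n)
        return -(logsumexp(ll, axis=1) - np.log(m)).mean()

    for m in (1, 2, 8, 32):
        print(f"m={m:>2}: multi-sample risk {multi_sample_risk(draws, y, m):.3f}")
    # By Jensen's inequality the risk shrinks as m grows, narrowing the gap
    # between the single-draw objective and the predictive risk.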

3:15 - 3:30: Eli Weinstein. "Bayesian Data Selection"

Abstract: Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic, such as a subset of variables, that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing data selection, the "Stein volume criterion (SVC)", that does not require fitting a nonparametric model. The SVC takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback–Leibler divergence. We prove that the SVC is consistent for data selection. We apply the SVC to the analysis of single-cell RNA sequencing datasets using a spin glass model of gene regulation.
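For readers who have not met it, the kernelized Stein discrepancy at the heart of the SVC can be computed in a few lines. This is a generic 1-D V-statistic estimate with an RBF kernel and a closed-form target score, not the paper's estimator: it only needs the score of the model, never its normalizing constant.

    import numpy as np

    def ksd(x, score, ls=1.0):
        # V-statistic estimate of the kernelized Stein discrepancy in 1-D,
        # using the Stein kernel u_p(x, x') built from an RBF base kernel.
        d = x[:, None] - x[None, :]
        k = np.exp(-0.5 * d ** 2 / ls ** 2)
        dkx = -d / ls ** 2 * k                           # d/dx  k(x, x')
        dkxp = d / ls ** 2 * k                           # d/dx' k(x, x')
        dkxxp = (1.0 / ls ** 2 - d ** 2 / ls ** 4) * k   # d2/dx dx' k(x, x')
        s = score(x)
        u = (s[:, None] * s[None, :] * k
             + s[:, None] * dkxp + s[None, :] * dkx + dkxxp)
        return np.sqrt(u.mean())

    score = lambda x: -x                                 # target p = N(0, 1)
    rng = np.random.default_rng(7)
    good, off = rng.normal(0, 1, 500), rng.normal(0.5, 1, 500)
    print(f"KSD(matched) = {ksd(good, score):.3f}, KSD(shifted) = {ksd(off, score):.3f}")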

3:30 - 4:00: Jonathan Huggins. "Statistically robust inference with stochastic gradient algorithms"

Abstract: Stochastic gradient algorithms are widely used for large-scale learning and inference problems. However, their use in practice is typically guided by heuristics and trial-and-error rather than rigorous, generalizable theory. We take a step toward better understanding the effect of the tuning parameters of these algorithms by characterizing the large-sample behavior of iterates of a very general class of preconditioned stochastic gradient algorithms with fixed step size, including stochastic gradient descent with and without additional Gaussian noise, momentum, and/or acceleration. We show that near a local optimum, the iterates converge weakly to paths of an Ornstein–Uhlenbeck process, and provide sufficient conditions for the stationary distributions of the finite-sample processes to converge weakly to that of the limiting process. In particular, with appropriate choices of tuning parameters, the limiting stationary covariance can match either the Bernstein–von Mises limit of the posterior, adjustments to the posterior for model misspecification, or the asymptotic distribution of the maximum likelihood estimate; with a naive tuning, the limit corresponds to none of these. Moreover, we argue that, in the large-sample regime, an essentially independent sample from the stationary distribution can be obtained after a fixed number of passes over the dataset. Our results show that properly tuned stochastic gradient algorithms offer a practical approach to obtaining inferences that are computationally efficient and statistically robust.
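The simplest instance of this picture, as our illustration rather than the paper's theory: constant step-size SGD on a quadratic loss is an AR(1) chain, the discrete-time analogue of an Ornstein–Uhlenbeck process, and its stationary variance is an explicit function of the tuning parameters.

    import numpy as np

    rng = np.random.default_rng(8)
    h, sigma, n = 0.1, 1.0, 200000   # step size, gradient-noise scale, iterations
    theta, traj = 0.0, np.empty(n)
    for k in range(n):
        grad = theta + sigma * rng.standard_normal()  # noisy gradient of 0.5 * theta^2
        theta -= h * grad                             # theta_{k+1} = (1 - h) theta_k - h * noise
        traj[k] = theta

    print(f"empirical stationary var {traj[1000:].var():.4f}, "
          f"AR(1)/OU prediction h*sigma^2/(2-h) = {h * sigma**2 / (2 - h):.4f}")
    # The iterates decorrelate on a 1/h timescale, so thinning by O(1/h) steps
    # yields essentially independent draws from the stationary distribution.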


Bio: Jonathan Huggins is an Assistant Professor in the Department of Mathematics & Statistics, a Data Science Faculty Fellow, and a Founding Member of the Faculty of Computing & Data Sciences at Boston University. Prior to joining BU, he was a Postdoctoral Research Fellow in the Department of Biostatistics at Harvard. He completed his Ph.D. in Computer Science at the Massachusetts Institute of Technology in 2018. Previously, he received a B.A. in Mathematics from Columbia University and an S.M. in Computer Science from the Massachusetts Institute of Technology. His research centers on the development of fast, trustworthy machine learning and AI methods that balance the need for computational efficiency and the desire for statistical optimality with the inherent imperfections that come from real-world problems, large datasets, and complex models. His current applied work is focused on methods to enable more effective scientific discovery from high-throughput and multi-modal genomic data.

4:00 - 4:05: Q&A with Jonathan Huggins [live]


4:05 - 4:30: Discussions in Gathertown [live]


4:30 - 5:00: Lester Mackey. "Your model is wrong (but might still be useful)"

Abstract: To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, I'll describe how Stein's method, a tool developed to prove central limit theorems, can be adapted to assess and improve the quality of practical inference procedures. Along the way, I'll highlight applications to Markov chain Monte Carlo sampler selection, goodness-of-fit testing, and black-box importance sampling.


Bio: Lester Mackey is a Principal Researcher at Microsoft Research, where he develops machine learning methods, models, and theory for large-scale learning tasks driven by applications in climate forecasting, healthcare, and the social good. Lester moved to Microsoft from Stanford University, where he was an assistant professor of Statistics and (by courtesy) of Computer Science. He earned his PhD in Computer Science and MA in Statistics from UC Berkeley and his BSE in Computer Science from Princeton University. He co-organized the second-place team in the Netflix Prize competition for collaborative filtering, won the Prize4Life ALS disease progression prediction challenge, won prizes for temperature and precipitation forecasting in the yearlong real-time Subseasonal Climate Forecast Rodeo, and received best paper and best student paper awards from the ACM Conference on Programming Language Design and Implementation and the International Conference on Machine Learning.


5:00 - 5:05: Q&A with Lester Mackey [live]


5:05 - 5:35: Yixin Wang. "Statistical and Computational Tradeoffs in Variational Bayes"

Abstract: Variational inference has recently emerged as a popular alternative to Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. A core idea of variational inference is to trade statistical accuracy for computational efficiency: it aims to approximate the posterior, as opposed to targeting the exact posterior as in MCMC. Approximating the exact posterior by a restricted inferential model (a.k.a. variational approximating family) reduces computation costs but sacrifices statistical accuracy. In this work, we develop a theoretical characterization of this statistical-computational tradeoff in variational inference. We focus on a case study of Bayesian linear regression using inferential models with different degrees of flexibility. From a computational perspective, we find that less flexible variational families speed up computation: they reduce the variance in stochastic optimization and, in turn, accelerate convergence. From a statistical perspective, however, we find that less flexible families suffer in approximation quality but provide better statistical generalization.


This is joint work with Kush Bhatia, Nikki Kuang, and Yi-an Ma.
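The tradeoff already shows up in the Gaussian case the talk studies. In this sketch (our simplification, not the paper's experiments), the Bayesian linear regression posterior is Gaussian, and the optimal fully factorized (mean-field) approximation keeps the exact mean but replaces marginal variances by inverse precision diagonals, which are cheaper yet overconfident wherever the posterior is correlated.

    import numpy as np

    rng = np.random.default_rng(9)
    n, d, sig, tau = 50, 5, 1.0, 10.0
    X = rng.standard_normal((n, d))
    X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)   # correlated features
    y = X @ rng.standard_normal(d) + sig * rng.standard_normal(n)

    Lam = X.T @ X / sig**2 + np.eye(d) / tau**2        # exact posterior precision
    Sigma = np.linalg.inv(Lam)                         # exact posterior covariance
    mf_var = 1.0 / np.diag(Lam)                        # optimal mean-field variances
    print("exact marginal sd:", np.round(np.sqrt(np.diag(Sigma)), 3))
    print("mean-field sd    :", np.round(np.sqrt(mf_var), 3))
    # The less flexible family is cheaper and lower-variance to fit, but
    # underestimates uncertainty for the correlated coefficients (0 and 1).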

Bio: Yixin Wang is an LSA Collegiate Fellow in Statistics at the University of Michigan. She works in the fields of Bayesian statistics, machine learning, and causal inference. Previously, she was a postdoctoral researcher with Professor Michael Jordan at the University of California, Berkeley. She completed her PhD in statistics at Columbia, advised by Professor David Blei, and her undergraduate studies in mathematics and computer science at the Hong Kong University of Science and Technology. Her research has received several awards, including the INFORMS data mining best paper award, Blackwell-Rosenbluth Award from the junior section of ISBA, student paper awards from ASA Biometrics Section and Bayesian Statistics Section, and the ICSA conference young researcher award.

5:35 - 5:40: Q&A with Yixin Wang [live]


5:40 - 6:15: Discussions in Gathertown [live]


6:15 - 7:30: Poster session II [live]