Practical Bayesian Nonparametrics
workshop at NIPS 2016
Friday, December 9, 2016
Directions to the workshop
The workshop is located in the AC Hotel adjacent to the main conference center (CCIB). Walk out of the CCIB doors and turn left twice to get to the hotel. Our workshop is on the second floor in the "Barcelona Room".
Workshop Day Schedule
Abstracts for the talks appear at the bottom of this page.
Foundations lecture: "Bayesian nonparametrics: Why and when"
Bayesian nonparametric methods make use of infinite-dimensional mathematical structures to allow the practitioner to learn more from their data as the size of their data set grows. What does that mean, and how does it work in practice? In this foundations lecture, we'll focus on examples in clustering and network modeling to understand why machine learning and statistics need more than just parametric Bayesian inference.
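To make "learning more as the data set grows" concrete for clustering, here is a minimal sketch using the Chinese restaurant process, a standard Bayesian nonparametric prior over partitions. It is not taken from the lecture; the function name `sample_crp` and the concentration value `alpha=2.0` are illustrative assumptions. The point is that the number of clusters is not fixed in advance and tends to grow slowly as more points arrive.

```python
import numpy as np

def sample_crp(n_customers, alpha, rng):
    """Draw cluster assignments from a Chinese restaurant process (illustrative helper)."""
    assignments = np.empty(n_customers, dtype=int)
    counts = []  # counts[k] = number of points currently in cluster k
    for i in range(n_customers):
        # Existing cluster k is chosen with probability counts[k] / (i + alpha);
        # a brand-new cluster is opened with probability alpha / (i + alpha).
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments[i] = k
    return assignments, counts

rng = np.random.default_rng(0)
assignments, counts = sample_crp(2000, alpha=2.0, rng=rng)

# The number of occupied clusters among the first n points never decreases with n.
for n in (10, 100, 1000, 2000):
    print(n, "points ->", len(np.unique(assignments[:n])), "clusters")
```

A parametric mixture would fix the number of clusters before seeing any data; under this prior the expected number of clusters grows roughly logarithmically in the number of points.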
A flexible, interpretable framework for assessing sensitivity to unmeasured confounding
When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates a Bayesian nonparametric fitting algorithm into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language.
A Markovian Model for Nonstationary Time Series via Bayesian nonparametrics
Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data that makes no assumption of stationarity, can accommodate complex dynamics, and can capture non-standard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This results in a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, non-stationary Markovian model for real-valued data indexed in discrete time. To obtain a computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest that the model is able to recover challenging transition densities and non-linear dynamic relationships. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions and open questions about accommodating higher-order structure and developing state-space models are also discussed.
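The conditional density implied by a mixture of bivariate normals can be sketched as follows. This is a hedged illustration, not the authors' implementation: it uses a finite mixture in place of the nonparametric one, orders each pair as (x, y) = (previous value, current value), and assumes each component covariance is written as L D L^T with L unit lower triangular. Under that assumed parameterization, `d1` is the variance of x, `beta` is the regression slope of y on x, and `d2` is the conditional variance of y given x. All names and numeric values are hypothetical.

```python
import numpy as np

def normal_pdf(z, mean, var):
    """Univariate normal density N(z | mean, var)."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def transition_density(y, x, weights, mu_x, mu_y, beta, d1, d2):
    """f(y | x) implied by a finite mixture of bivariate normals over (x, y).

    Assumed parameterization: component k has covariance L_k D_k L_k^T with
    L_k = [[1, 0], [beta_k, 1]] and D_k = diag(d1_k, d2_k), so that
    Var(x) = d1_k, Cov(x, y) = beta_k * d1_k and Var(y | x) = d2_k.
    """
    # Mixture weights are reweighted by how well each component explains x.
    q = weights * normal_pdf(x, mu_x, d1)
    q = q / q.sum()
    # Each component contributes a linear autoregression with its own slope.
    cond_mean = mu_y + beta * (x - mu_x)
    return float(np.sum(q * normal_pdf(y, cond_mean, d2)))

# Hypothetical two-component mixture (values chosen for illustration only).
weights = np.array([0.6, 0.4])
mu_x = np.array([-1.0, 2.0])
mu_y = np.array([-0.5, 1.5])
beta = np.array([0.8, -0.3])
d1 = np.array([1.0, 0.5])
d2 = np.array([0.3, 0.2])

# Sanity check: the transition density at x = 0.5 integrates to about 1 over y.
y_grid = np.linspace(-10.0, 10.0, 4001)
dens = np.array([transition_density(y, 0.5, weights, mu_x, mu_y, beta, d1, d2)
                 for y in y_grid])
total_mass = dens.sum() * (y_grid[1] - y_grid[0])
print(total_mass)
```

Because the reweighted mixture weights depend on x, the resulting autoregression is non-linear in the previous value even though each component is linear.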
Categorical Data Fusion using Auxiliary Information
In data fusion, analysts seek to combine information from two databases composed of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When inappropriate, these assumptions can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people’s preferences for authors and for learning about new books. The analysis also serves as a case study on the potential for using online surveys to aid data fusion. This is joint work with Maria DeYoreo and Jerry Reiter.
Application of Gaussian Processes to Mechanics, Physics and Biology
Gaussian processes are flexible probabilistic models that allow us to infer a distribution over functions from training data. Therefore, Gaussian processes can be useful in applications where predictive uncertainty is critical and/or where the flexibility of the model needs to adjust to the size of the data. In this talk, I will discuss a variety of applications of Gaussian processes: Model-based reinforcement learning using probabilistic Gaussian processes achieves an unprecedented learning speed (data efficiency); learning approximate simulators of LHC experiments can speed up simulator experimentation by a factor of 10,000 compared to Monte Carlo; Bayesian optimization is an efficient way to solve a global black-box optimization problem, e.g., for learning robot controllers or hyper-parameters of simulators for biological processes, where the amount of required experimentation is small. All these applications require solving some Gaussian-process-specific challenges, which I will discuss briefly.
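As a small illustration of "a distribution over functions" with predictive uncertainty, the sketch below computes the standard Gaussian process regression posterior with a squared-exponential kernel. It is not code from the talk; the kernel choice, the hyperparameter defaults, the toy data, and the function names `rbf_kernel` and `gp_posterior` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance of the latent function at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorization avoids forming an explicit matrix inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov

# Toy data: noiseless samples of sin(x) at five inputs.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)

# One test point inside the data range and one far outside it.
x_test = np.array([0.0, 6.0])
mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.diag(cov))
print("mean:", mean)
print("std: ", std)
```

The predictive standard deviation is small at the test point surrounded by data and reverts toward the prior standard deviation far from the data, which is the behavior that uncertainty-driven methods such as Bayesian optimization rely on.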