Practical Bayesian Nonparametrics

workshop at NIPS 2016

Friday, December 9, 2016

AC Hotel: Barcelona Room (2nd floor)

Barcelona, Spain


Directions to the workshop
The workshop is located in the AC Hotel adjacent to the main conference center (CCIB). Walk out of the CCIB doors and turn left twice to get to the hotel. Our workshop is on the second floor in the "Barcelona Room".

Workshop Day Schedule
 Abstracts for talks are available at the bottom of this page. Just scroll down.

 8:15-8:30 Welcome and Introductions
 8:30-9:00 Foundations Talk
 Tamara Broderick, MIT
 9:00-9:30 Invited Talk
 A flexible, interpretable framework for assessing sensitivity to unmeasured confounding
 Jennifer Hill, New York University
 9:30-9:45 Contributed Talk
 Scaling up the Automatic Statistician: Scalable Structure Discovery in Regression using Gaussian Processes
 Hyunjik Kim, Ph.D. student, University of Oxford
 9:45-10:00 Contributed Talk
 Sparse Three-parameter Restricted Indian Buffet Process for Understanding International Trade
 Melanie F. Pradier, Ph.D. student, University Carlos III
 10:00-10:30 Coffee Break
 Rob will be on hand during the first coffee break to help anybody install Stan.
 10:30-11:00 Invited Talk
 Categorical Data Fusion using Auxiliary Information
 Bailey Fosdick, Colorado State University
 11:00-11:15 Poster Spotlights 
        Public slidedeck of all spotlight slides
 11:15-12:15 Poster Session
 12:15-12:45 Lunch
 12:45-13:45 Lunch Session Software Tutorial #1
 13:45-14:45 Lunch Session Software Tutorial #2
 14:45-15:45 Coffee Break
 15:45-16:15 Invited Talk
 Marc Deisenroth, Imperial College London
 16:15-16:30 Contributed Talk
 Analyzing Learned Convnet Features with Dirichlet Process Gaussian Mixture Models
 David Malmgren-Hansen, Ph.D. student, Technical University of Denmark
 16:30-17:00 Software Panel
  • Martin Trapp, Austrian Research Institute for Artificial Intelligence

    • Lead developer of BNP.jl (Julia implementation of BNP methods)

  • Dustin Tran, Columbia University

    • Lead developer of Edward, which fuses Bayesian methods (especially Gaussian processes), deep learning, and probabilistic programming

    • Contributor to Stan

  • Aki Vehtari, Aalto University
    • Stan contributor
    • Lead developer of GPstuff
  • Mike Hughes, Harvard University

    • Lead developer of BNPy

 17:00-17:30 Invited Talk
 A Markovian Model for Nonstationary Time Series via Bayesian nonparametrics
 Maria DeYoreo, Duke University
 17:30-18:30 Invited Panel
  • Bailey Fosdick, Colorado State University

  • Maria DeYoreo, Duke University

  • Suchi Saria, Johns Hopkins University

  • Jim Griffin, University of Kent

  • Marc Deisenroth, Imperial College London


Invited Talk Abstracts


Foundations lecture: "Bayesian nonparametrics: Why and when"
Tamara Broderick

Bayesian nonparametric methods make use of infinite-dimensional mathematical structures to allow the practitioner to learn more from their data as the size of their data set grows. What does that mean, and how does it work in practice? In this foundations lecture, we'll focus on examples in clustering and network modeling to understand why machine learning and statistics need more than just parametric Bayesian inference.

A flexible, interpretable framework for assessing sensitivity to unmeasured confounding
Jennifer Hill

When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis.  In particular, our approach incorporates a Bayesian nonparametric fitting algorithm into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language.


A Markovian Model for Nonstationary Time Series via Bayesian nonparametrics
Maria DeYoreo

Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture non-standard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This results in a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, non-stationary Markovian model for real-valued data indexed in discrete time. To obtain a computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest that the model is able to recover challenging transition densities and non-linear dynamic relationships. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions and open questions about accommodating higher order structure and developing state-space models are also discussed.

Categorical Data Fusion using Auxiliary Information
Bailey Fosdick

In data fusion, analysts seek to combine information from two databases comprising disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When inappropriate, these assumptions can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people’s preferences for authors and for learning about new books. The analysis also serves as a case study on the potential for using online surveys to aid data fusion. This is joint work with Maria DeYoreo and Jerry Reiter.

Application of Gaussian Processes to Mechanics, Physics and Biology
Marc Deisenroth

Gaussian processes are flexible probabilistic models that allow us to infer a distribution over functions from training data. Therefore, Gaussian processes can be useful in applications where predictive uncertainty is critical and/or where the flexibility of the model needs to adjust to the size of the data. In this talk, I will discuss a variety of applications of Gaussian processes: Model-based reinforcement learning using probabilistic Gaussian processes achieves an unprecedented learning speed (data efficiency); Learning approximate simulators of LHC experiments can speed up simulator experimentation by a factor of 10,000 compared to Monte Carlo; Bayesian optimization is an efficient way to solve a global black-box optimization problem, e.g., for learning robot controllers or hyper-parameters of simulators for biological processes, where the amount of required experimentation is small. All these applications require solving some Gaussian-process-specific challenges, which I will discuss briefly.