**8:40-9:00 - Talk:** FastFood: Approximating Kernel Expansions in Loglinear Time (Alex Smola, Google Research)

**Talk Abstract:**

The ability to evaluate nonlinear function classes rapidly is crucial for nonparametric estimation. We propose an improvement to random kitchen sinks that offers O(n log d) computation and O(n) storage for n basis functions in d dimensions without sacrificing accuracy. We show how one may adjust the regularization properties of the kernel simply by changing the spectral distribution of the projection matrix. Experiments show that we achieve identical accuracy to full kernel expansions and random kitchen sinks 100x faster with 1000x less memory.
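As a rough illustration of the idea, the dense Gaussian projection used by random kitchen sinks can be replaced by a product of diagonal and Hadamard matrices. The sketch below is an assumption-laden toy, not the authors' implementation: `fastfood_features` and `n_blocks` are invented names, the input dimension must be a power of 2, and dense matrix products stand in for the O(d log d) fast Walsh-Hadamard transform.

```python
import numpy as np

def hadamard(d):
    """Sylvester construction of the d x d +/-1 Hadamard matrix (d a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

def fastfood_features(X, n_blocks, sigma=1.0, rng=None):
    """Fastfood-style random features for the RBF kernel exp(-||x-y||^2 / (2 sigma^2)).
    Dense products with H stand in for the fast Walsh-Hadamard transform."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    H = hadamard(d)
    blocks = []
    for _ in range(n_blocks):
        B = rng.choice([-1.0, 1.0], size=d)      # random sign flips
        P = rng.permutation(d)                   # random permutation
        G = rng.normal(size=d)                   # Gaussian scaling
        s = np.sqrt(rng.chisquare(d, size=d))    # chi-distributed target row norms
        Z = (X * B) @ H                          # H B x
        Z = Z[:, P] * G                          # G Pi H B x
        Z = Z @ H                                # H G Pi H B x
        # rescale so each implicit projection row has norm s_i / sigma,
        # matching the row-norm distribution of a dense Gaussian matrix
        Z = Z * (s / (sigma * np.sqrt(d) * np.linalg.norm(G)))
        blocks.append(Z)
    Z = np.hstack(blocks)
    m = Z.shape[1]
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(m)
```

With m = n_blocks * d projections, the inner products of these features concentrate around the RBF kernel value, while only diagonal entries (not a dense d x d Gaussian matrix) need to be stored.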

**8:40-9:00 - Coffee Break**

**9:00-9:30 - Invited Talk:** Small-Variance Asymptotics, Nonparametric Bayes, and Kernel k-means (Brian Kulis, Ohio State University)

**Talk Abstract:**

It is well known that mixture-of-Gaussians and k-means are related through asymptotics on the variance of the clusters---as the variance tends to zero, the EM algorithm becomes the k-means algorithm, and the complete-data log likelihood becomes the k-means objective function. As shown recently, such asymptotics can also be applied to Bayesian nonparametric models, leading to simple and scalable k-means-like algorithms for a host of problems including clustering, latent feature models, topic models, and others. In this talk, I will overview these results, with a focus on the connections to kernel methods. In particular, I will discuss how an existing equivalence between kernel k-means and graph clustering can be used in conjunction with the asymptotics of Bayesian nonparametric models to obtain a class of novel and scalable kernel-based algorithms for problems such as overlapping graph clustering and graph clustering when the number of clusters is not fixed.
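The small-variance limit of a Dirichlet process mixture yields a k-means-like procedure (often called DP-means) that opens a new cluster whenever a point sits farther than a penalty lambda from every existing centroid. A minimal sketch, assuming squared Euclidean distances and a simple batch update; `dp_means` is an illustrative name, not code from the talk.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """DP-means-style hard clustering: open a new cluster when the nearest
    centroid is farther than lam (in squared Euclidean distance)."""
    centroids = [X.mean(axis=0)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centroids]
            j = int(np.argmin(d2))
            if d2[j] > lam:                  # too far from everything: new cluster
                centroids.append(x.copy())
                j = len(centroids) - 1
            if assign[i] != j:
                assign[i] = j
                changed = True
        # recompute means, dropping clusters that lost all their points
        keep = [k for k in range(len(centroids)) if np.any(assign == k)]
        centroids = [X[assign == k].mean(axis=0) for k in keep]
        assign = np.array([keep.index(a) for a in assign])
        if not changed:
            break
    return assign, np.array(centroids)
```

Unlike k-means, the number of clusters is not fixed in advance: lambda plays the role the concentration parameter plays in the underlying Bayesian nonparametric model.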

**9:30-10:00 - Invited Talk:** Kernel Methods in Nonparametric Bayesian Models (Lawrence Carin, Duke University)

**Talk Abstract:**

For handling large-scale problems, methods like Gaussian processes can be computationally challenging. In this talk, we discuss how alternative kernel methods can be employed to accelerate computations without loss of modeling power. We examine this in the context of general nonparametric Bayesian models, with specific applications within the Beta process. The theoretical and algorithmic issues are discussed, with demonstrations via several examples.

**10:00-10:15 - Contributed Talk:** Kernel Embeddings of Dirichlet Process Mixtures (Krikamol Muandet, Max Planck Institute for Biological Cybernetics)

**10:15-16:00 - Break**

**16:00-16:10 - Contributed Talk:** Kernel Methods for Learning Motion Patterns (Lachlan McCalman, University of Sydney)

**16:10-16:20 - Contributed Talk:** Kernels for Protein Structure Prediction (Narges Razavian, Carnegie Mellon University)

**16:20-16:30 - Short Coffee Break**

**16:30-17:00 - Invited Talk:** Kernel Topic Models (Thore Graepel, Microsoft Research Cambridge)

**Talk Abstract:**

Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas. This is joint work with Philipp Hennig (first author), David Stern, and Ralf Herbrich.
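To illustrate the "squashed Gaussian" idea, one can draw document-topic activations from a Gaussian whose covariance is a kernel over document features, then squash them onto the simplex. A toy sketch, assuming a softmax squashing and an RBF kernel over hypothetical timestamps; the paper's actual link function and kernel may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# hypothetical 1-D document features, e.g. publication times
t = np.array([0.0, 0.1, 5.0])
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 2.0)   # RBF kernel between documents
L = np.linalg.cholesky(K + 1e-9 * np.eye(len(t)))   # jitter for numerical stability
n_topics = 4
eta = L @ rng.normal(size=(len(t), n_topics))       # kernel-correlated activations
theta = np.apply_along_axis(softmax, 1, eta)        # "squash" onto the topic simplex
```

The structural point: the documents at t = 0.0 and t = 0.1 receive strongly correlated activations, so their topic proportions are coupled through the kernel, whereas the document at t = 5.0 is nearly independent of them.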

**17:00-17:30 - Invited Talk:** Nonparametric Variational Inference (Matt Hoffman, Adobe)

**Talk Abstract:**

Variational methods are widely used for approximate posterior inference. However, their use is typically limited to families of distributions that enjoy particular conjugacy properties. To circumvent this limitation, we propose a family of variational approximations inspired by nonparametric kernel density estimation. The locations of these kernels and their bandwidth are treated as variational parameters and optimized to improve an approximate lower bound on the marginal likelihood of the data. Unlike most other variational approximations, using multiple kernels allows the approximation to capture multiple modes of the posterior. We demonstrate the efficacy of the nonparametric approximation with a hierarchical logistic regression model and a nonlinear matrix factorization model. We obtain predictive performance as good as or better than more specialized variational methods and MCMC approximations. The method is easy to apply to graphical models for which standard variational methods are difficult to derive.
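A minimal illustration of why multiple kernels help: with a mixture-of-Gaussians variational family, a bimodal target can be matched closely, while any single Gaussian leaves a large gap in the evidence lower bound (ELBO). The sketch below estimates the ELBO by Monte Carlo; the toy target and helper names are assumptions, not the paper's models or its optimization procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss(x, mu, s):
    return -0.5 * ((x - mu) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))

def log_p(x):
    """Bimodal toy target: 0.5 N(-3, 0.3^2) + 0.5 N(3, 0.3^2)."""
    return np.logaddexp(log_gauss(x, -3.0, 0.3), log_gauss(x, 3.0, 0.3)) + np.log(0.5)

def log_q(x, mus, s):
    """Equal-weight mixture of Gaussian kernels at locations mus, bandwidth s."""
    comps = np.stack([log_gauss(x, m, s) for m in mus])
    return np.logaddexp.reduce(comps, axis=0) - np.log(len(mus))

def elbo(mus, s, n=20000):
    """Monte Carlo ELBO: sample from q, average log p - log q."""
    k = rng.integers(len(mus), size=n)
    x = np.asarray(mus)[k] + s * rng.normal(size=n)
    return np.mean(log_p(x) - log_q(x, mus, s))
```

Here the two-kernel family can represent the target exactly (ELBO of zero), while the best a single wide Gaussian can do is far worse; in the paper, the kernel locations and bandwidth would be optimized rather than set by hand.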

**17:30-18:00 - Coffee Break**

**18:00-18:30 - Invited Talk:** Determinantal Point Processes (Ben Taskar, University of Pennsylvania)

**Talk Abstract:**

Determinantal point processes (DPPs) arise in random matrix theory and quantum physics as models of random variables with negative correlations. Among many remarkable properties, they offer tractable algorithms for exact inference, including computing marginals, computing certain conditional probabilities, and sampling. DPPs are a natural model for subset selection problems where diversity is preferred. For example, they can be used to select diverse sets of sentences to form document summaries, to return relevant but varied text and image search results, or to detect non-overlapping multiple object trajectories in video. In our recent work, we discovered a novel factorization and dual representation of DPPs that enables efficient inference for exponentially-sized structured sets. We developed a new inference algorithm based on Newton identities for DPPs conditioned on subset size. We also derived efficient parameter estimation for DPPs from several types of observations. We demonstrated the advantages of the model on several natural language and vision tasks: extractive document summarization, diversifying image search results and multi-person articulated pose estimation in images.

**18:30-19:00 - Invited Talk:** Bayesian Interpretations and Extensions of Kernel Mean Embedding Methods (David Duvenaud, Cambridge University)

**Talk Abstract:**

We give a simple interpretation of mean embeddings as expectations under a Gaussian process prior. Methods such as kernel two-sample tests, the Hilbert-Schmidt Independence Criterion, and kernel herding are all based on distances between mean embeddings, also known as the Maximum Mean Discrepancy (MMD). This Bayesian interpretation allows a derivation of optimal herding weights and principled methods of kernel learning, and sheds light on the assumptions necessary for MMD-based methods to work in practice. In the other direction, the MMD interpretation gives tight, closed-form bounds on the error of Bayesian estimators.
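The MMD underlying these methods is simply the distance between mean embeddings, and a plug-in estimate needs only kernel averages over the two samples. A small sketch, assuming an RBF kernel and the biased (V-statistic) estimator:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """RBF kernel matrix between row-sample matrices X and Y."""
    d2 = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2 * X @ Y.T)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD: ||mean embedding of X - mean embedding of Y||^2."""
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())
```

Two samples from the same distribution give an estimate near zero, while a mean shift produces a clearly positive value, which is what the two-sample tests mentioned in the abstract threshold against.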

**19:00-19:30 - Open Discussion on Current Challenges and Future Directions**