Anima Anandkumar, UC Irvine
Title: Guaranteed Learning of Overcomplete Latent Representations
Overcomplete representations have been extensively employed
for unsupervised feature learning and are arguably critical in a
number of applications such as speech and computer vision. They provide
flexibility in modeling and robustness to noise. In an overcomplete
model, the latent dimensionality can far exceed the observed
dimensionality. Learning overcomplete models is ill-posed unless
further constraints are imposed. I will provide an overview of our
recent results on identifiability and learning of overcomplete models
under two popular frameworks, viz., topic modeling and sparse coding.
Probabilistic topic models are hierarchical mixture models which
incorporate multiple latent topics in each document with observed
words. We establish identifiability and learning given higher order
observed moments (e.g., fourth order) under structured sparsity of the
topic-word matrix and local persistence of topics in documents. Such
local persistence violates the exchangeability assumption of the
popular bag-of-words model, and our
analysis establishes that incorporating such sequence information in
documents is crucial for identifiability of overcomplete models.
Under topic persistence, we establish that for topic-word matrices
with random sparsity, the number of topics can be polynomially larger
than the vocabulary size, and yet the model remains identifiable from observed
moments. More generally, our results imply uniqueness of tensor Tucker
decomposition under structured sparsity.
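For readers less familiar with the Tucker form, the following is a minimal numpy sketch of a (dense) Tucker decomposition computed via the standard higher-order SVD. It only illustrates the object whose uniqueness is at stake; it is not the authors' sparse identifiability procedure, which works from higher-order observed moments under structured sparsity.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    # Factor matrices: leading left singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    # Core tensor: project T onto each factor subspace in turn.
    G = T
    for n, Un in enumerate(U):
        G = np.moveaxis(np.tensordot(Un.T, np.moveaxis(G, n, 0), axes=1), 0, n)
    return G, U

def reconstruct(G, U):
    # Multiply the core by each factor matrix along its mode.
    T = G
    for n, Un in enumerate(U):
        T = np.moveaxis(np.tensordot(Un, np.moveaxis(T, n, 0), axes=1), 0, n)
    return T
```

When the requested ranks match the tensor's multilinear ranks, the HOSVD reconstruction is exact; the identifiability question is when such a decomposition is the unique one.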
We consider another framework for overcomplete models based on sparse
coding. A sparse code encodes each observed sample using a sparse
combination of dictionary elements and the task is to recover both the
dictionary as well as the mixing coefficients. We propose a
"clustering-style" method for initial estimation of the dictionary and
successive refinement via alternating minimization over the
dictionary and the coefficients. We prove that the method recovers the
underlying overcomplete dictionary and coefficients when the
dictionary elements are mutually incoherent. Our results thus
establish that overcomplete latent variable models can be learnt
efficiently under structured sparsity constraints.
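The alternating scheme described above can be sketched as follows. This is a generic illustration, not the authors' algorithm: the clustering-style initialization is omitted (the demo simply perturbs a known dictionary), and the coefficient step here uses correlation thresholding followed by least squares on the selected support.

```python
import numpy as np

def normalize_cols(A):
    # Scale each dictionary element (column) to unit Euclidean norm;
    # leave all-zero (unused) columns alone to avoid division by zero.
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    return A / np.where(norms > 0, norms, 1.0)

def sparse_code(Y, A, k):
    # Coefficient step: per sample, select the k atoms with largest
    # absolute correlation, then least-squares fit on that support.
    X = np.zeros((A.shape[1], Y.shape[1]))
    for j in range(Y.shape[1]):
        support = np.argsort(-np.abs(A.T @ Y[:, j]))[:k]
        X[support, j] = np.linalg.lstsq(A[:, support], Y[:, j], rcond=None)[0]
    return X

def update_dictionary(Y, X):
    # Dictionary step: least-squares update of A given the coefficients,
    # followed by column renormalization.
    A = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
    return normalize_cols(A)

def alternating_minimization(Y, A_init, k, iters=5):
    A = A_init
    for _ in range(iters):
        X = sparse_code(Y, A, k)
        A = update_dictionary(Y, X)
    return A, sparse_code(Y, A, k)
```

With mutually incoherent atoms and a sufficiently accurate initialization, each alternation refines both factors; the theoretical guarantees in the referenced papers make this precise.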
A. Anandkumar, D. Hsu, M. Janzamin and S. Kakade. “When are
Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker
Decompositions with Structured Sparsity,” NIPS 2013.
A. Agarwal, A. Anandkumar, P. Netrapalli. “Exact Recovery of
Sparsely Used Overcomplete Dictionaries,” Preprint, September 2013.
A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, R. Tandon.
“Learning Sparsely Used Overcomplete Dictionaries via Alternating
Minimization,” Preprint, October 2013.
Alex Smola, Google/CMU
Title: Hierarchy and Structure: Nonparametric models for space, language, and relations
Latent variable models are a powerful tool for analyzing structured data. They are well suited to capturing documents, location, and preference information. That said, a simple hierarchical model is often insufficient, since real data tends to be more nuanced in some aspects than in others; descriptions work best when they allow for variable depth and refinement. Models such as the nested Chinese Restaurant Franchise address these issues. I will present examples of their application to location inference on Twitter and to structured recommender systems.
This is joint work with Amr Ahmed, Liangjie Hong, Yuchen Zhang, and Vanja Josifovski.
David Blei, Princeton
Title: Probabilistic Topic Models: Origins and Challenges
Probabilistic topic models uncover the hidden thematic structure in
large collections of documents. Topic models have been extended in
myriad ways and enable many applications that use text as data.
My goal in this talk is to set the stage for our day-long workshop.
First, I will discuss the origins of topic modeling and characterize
the map of modern topic modeling research. I will outline the thread
of work from latent semantic analysis to latent Dirichlet allocation
(LDA) and to its many cousins. I will try to give intuitions about
why LDA "works" and describe various perspectives on the model.
Second, I will describe some of the recent ideas that my research
group has been interested in: Poisson factorization and its
relationship to LDA, models of text and user-behavior data, and
stochastic variational inference for fitting models to massive data.
Finally, I will situate topic modeling in the bigger picture of modern
probabilistic modeling. I will discuss the main challenges and open
questions that face our field.