We thank all the speakers who agreed to come to this workshop and share their vision of the field with us.
Quick Jump: Pradeep Ravikumar, Po-Ling Loh, Le Song, Taiji Suzuki, Yoshinobu Kawahara, Wittawat Jitkrittum, Joe Suzuki, Shohei Shimizu, Takashi Takenouchi, Kei Hirose, Hiroshi Ishikawa, Muneki Yasuda, Kohei Hayashi, Masaki Saito, Song Liu
Department of Computer Science, University of Texas at Austin
Title: Poisson Graphical Models With Rich Dependence Structures (SLIDES)
Abstract: Undirected graphical models, such as Gaussian, Ising, and discrete/multinomial graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing proposals for multivariate count distributions have a crucial caveat: the dependence structures they can model are highly restrictive, in some cases permitting only negative or only positive dependencies.
Can we devise multivariate distributions that can capture rich dependence structures between count-valued variables? We address this question via a series of multivariate extensions of the univariate Poisson distribution, providing a new class of Poisson graphical models. We also provide tractable schemes with guarantees for learning our class of Poisson graphical models from data, and demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-Sequencing data.
Joint work with Eunho Yang, Genevera Allen, Zhandong Liu, David Inouye, Inderjit Dhillon.
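A common learning scheme for models of this kind is nodewise neighborhood selection: each count variable is regressed on all others with an L1-penalized Poisson GLM, and nonzero coefficients are read off as edges. Below is a minimal sketch of that idea, not the speakers' exact estimator; the data, penalty level, and edge rule are placeholder assumptions.

```python
# Sketch: neighborhood selection for a Poisson graphical model via
# L1-penalized nodewise Poisson regression (one GLM per variable).
# Data, penalty level alpha, and the edge threshold are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.poisson(lam=2.0, size=(n, p)).astype(float)  # placeholder count data

edges = set()
for j in range(p):
    y = X[:, j]
    Z = sm.add_constant(np.delete(X, j, axis=1))
    fit = sm.GLM(y, Z, family=sm.families.Poisson()).fit_regularized(
        method="elastic_net", alpha=0.1, L1_wt=1.0)  # pure L1 penalty
    others = [k for k in range(p) if k != j]
    for k, coef in zip(others, fit.params[1:]):      # skip the intercept
        if abs(coef) > 1e-6:
            edges.add(tuple(sorted((j, k))))         # simple OR rule
print(sorted(edges))
```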
Department of Statistics at the Wharton School,
University of Pennsylvania
Title: "What can be learned with the graphical Lasso: Generalizations of the Gaussian assumption"
(SLIDES)
Abstract: We discuss a variety of results concerning structure estimation for graphical models. Our results are motivated by well-known theory for Gaussian graphical models; however, a key insight is that methods such as the graphical Lasso and nodewise linear regression may be statistically consistent for edge recovery even when data are non-Gaussian. In the first part of the talk, we outline theoretical results relating the support of generalized inverse covariance matrices to the edge structure of undirected graphs for discrete-valued variables, and also discuss the significance of the inverse covariance matrix in the case of linear structural equation models. In the second part of the talk, we turn our attention to settings where data are missing, contaminated, or systematically corrupted. We show how the graphical Lasso may be adjusted to yield statistically consistent estimators even when data possess some form of systematic corruption, and also discuss theoretical guarantees for a robust version of the graphical Lasso when data are subject to cellwise contamination.
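For concreteness, here is a minimal sketch of edge recovery with the graphical Lasso using scikit-learn. The chain graph, sample size, and threshold are illustrative assumptions; the data are Gaussian for simplicity, while the talk's point is that consistency can extend beyond that.

```python
# Sketch: edge recovery with the graphical Lasso via scikit-learn. The
# chain graph, sample size, and threshold are illustrative assumptions.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
Theta = np.eye(5)                       # true sparse precision: a chain graph
for i in range(4):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.4
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(Theta), size=500)

model = GraphicalLasso(alpha=0.05).fit(X)
edges = {(i, j) for i in range(5) for j in range(i + 1, 5)
         if abs(model.precision_[i, j]) > 1e-3}
print(edges)                            # ideally {(0,1), (1,2), (2,3), (3,4)}
```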
Computational Science and Engineering, College of Computing,
Georgia Institute of Technology
Title: Scalable Kernel Embedding of Latent Variable Models (SLIDES)
Abstract: Kernel embedding of distributions maps distributions to the reproducing kernel Hilbert space (RKHS) of a kernel function, such that subsequent manipulations of distributions can be achieved via RKHS distances, linear and multilinear transformations, and spectral analysis. This framework has led to simple and effective nonparametric algorithms for various machine learning problems, such as feature selection, two-sample testing, time-series modeling, and belief propagation. In this talk, I will focus on kernel embedding of latent variable models where the components of the models can have nonparametric form. The presence of latent variables in a model induces a sophisticated low-rank structure in its kernel embedding, which can be exploited to design kernel algorithms for learning the latent parameters. While the method can adapt to the increasing complexity of the data as their volume grows, it is not scalable to large datasets. I will also introduce an approach called doubly stochastic functional gradients to scale up the method, and present some empirical results.
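As a rough illustration of the scalability ingredient, the sketch below approximates kernel mean embeddings with random Fourier features (Rahimi and Recht), the kind of finite-dimensional surrogate that doubly stochastic gradient methods build on; the distributions and feature counts are arbitrary choices.

```python
# Sketch: kernel mean embeddings approximated with random Fourier features
# (Rahimi & Recht). Distributions and sizes are arbitrary assumptions.
import numpy as np

def rff(X, W, b):
    """Random Fourier features approximating a Gaussian RBF kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
d, D, sigma = 2, 500, 1.0          # input dim, feature count, bandwidth
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

X = rng.normal(size=(1000, d))               # samples from distribution P
Y = rng.normal(loc=0.5, size=(1000, d))      # samples from distribution Q

mu_P = rff(X, W, b).mean(axis=0)   # approximate embedding of P
mu_Q = rff(Y, W, b).mean(axis=0)   # approximate embedding of Q
print("approx. MMD(P, Q):", np.linalg.norm(mu_P - mu_Q))
```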
Department of Mathematical and Computing Sciences
Tokyo Institute of Technology
Title: Statistical performance and computational efficiency of low rank tensor estimators
Abstract: In this talk, we consider the problem of estimating a low-rank tensor, and discuss the statistical properties and computational efficiency of several estimators. Estimating low-rank tensors has wide applications, for example, recommendation systems, spatio-temporal data analysis, and multi-task learning. Several methods have been proposed for this problem. From a statistical viewpoint, a Bayesian approach achieves the minimax optimal predictive accuracy. On the other hand, a convex approach and an alternating minimization approach are computationally attractive. We discuss the trade-off between statistical performance and computational efficiency by presenting theoretical and numerical results. We discuss these issues for a finite-dimensional (but high-dimensional) linear model and an infinite-dimensional nonlinear model.
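As a concrete instance of the alternating minimization approach, the sketch below runs alternating least squares for low-rank matrix completion, the matrix special case of the tensor problem; the dimensions, rank, ridge term, and iteration count are assumptions.

```python
# Sketch: alternating least squares for low-rank matrix completion, the
# matrix special case of alternating minimization. Toy data throughout.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 50, 40, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))  # true low-rank matrix
mask = rng.random((n, m)) < 0.5                        # observed entries

U, V = rng.normal(size=(n, r)), rng.normal(size=(m, r))
lam = 0.1                                              # small ridge regularizer
for _ in range(30):
    for i in range(n):                                 # update rows of U
        obs = mask[i]
        A = V[obs].T @ V[obs] + lam * np.eye(r)
        U[i] = np.linalg.solve(A, V[obs].T @ M[i, obs])
    for j in range(m):                                 # update rows of V
        obs = mask[:, j]
        A = U[obs].T @ U[obs] + lam * np.eye(r)
        V[j] = np.linalg.solve(A, U[obs].T @ M[obs, j])

err = np.linalg.norm((U @ V.T - M)[~mask]) / np.linalg.norm(M[~mask])
print("relative error on held-out entries:", err)
```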
The Institute of Scientific and Industrial Research (ISIR),
Osaka University
Title: Parametric Submodular Minimization in Machine Learning (SLIDES)
Abstract: Submodularity is known to be a discrete counterpart of convexity. Recently, it has been used as the mathematical basis for the development of fast algorithms and for the analysis of discrete problems in machine learning. In this talk, I address parametric minimization of submodular set functions in machine learning, which appears in many problems such as structured sparse learning, (minimum average cost) clustering, and the densest subgraph problem. I discuss some common combinatorial structures shared by these problems, and describe efficient algorithms that exploit these structures via parametric submodular minimization.
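A canonical example is a parametric minimum s-t cut: cut functions are submodular, and the minimizers are nested as the parameter grows. The sketch below naively re-solves a toy cut problem for each parameter value with networkx (true parametric max-flow algorithms share work across values); the graph, scores, and coupling strengths are made-up assumptions.

```python
# Sketch: naive parametric submodular minimization on a cut function.
# Cut capacities are submodular, so each min s-t cut minimizes a
# submodular set function; we simply re-solve per parameter value.
import networkx as nx

def min_cut_for(lam):
    G = nx.DiGraph()
    for v, score in [("a", 0.3), ("b", 0.6), ("c", 0.8)]:
        G.add_edge("s", v, capacity=max(0.0, score - lam))  # unary terms
        G.add_edge(v, "t", capacity=max(0.0, lam - score))  # depend on lam
    G.add_edge("a", "b", capacity=0.2)   # pairwise coupling
    G.add_edge("b", "c", capacity=0.2)
    cut_value, (S, _) = nx.minimum_cut(G, "s", "t")
    return cut_value, sorted(S - {"s"})

for lam in [0.2, 0.5, 0.7]:
    print(lam, min_cut_for(lam))         # the selected sets are nested in lam
```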
Gatsby Computational Neuroscience Unit,
University College London
Title: Improving Approximate Bayesian Inference with Kernel Methods (SLIDES)
Abstract: In this talk, we discuss two ways of using kernel methods to improve approximate Bayesian inference algorithms.
First, we consider expectation propagation (EP), a widely used message passing algorithm for inferring marginal distributions. In EP, computing an outgoing message from incoming messages requires evaluating a multivariate integral, which is often carried out by an expensive numerical integration. To reduce the computation, we propose learning a kernel-based regression function mapping from incoming to outgoing messages. The use of our proposed two-layer random feature representation of the inputs leads to a fast algorithm that can be trained online during inference.
Next, we consider the problem of inferring an approximate posterior distribution over the model parameters in the setting where evaluating the likelihood is intractable, whereas generating (pseudo) data given an arbitrary parameter is easy. This is the setting of Approximate Bayesian Computation (ABC), in which the approximate posterior is built from parameters whose corresponding pseudo data are "similar" to the observed data. This similarity measure is typically difficult to define and requires domain expertise. We propose using the Maximum Mean Discrepancy, a kernel-based distance, to define the similarity. Numerical experiments show that our approach performs no worse than similarity measures suggested by human experts.
Joint work with: Arthur Gretton, Dino Sejdinovic, Mijung Park, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Zoltán Szabó
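The second idea above can be sketched as rejection ABC with an MMD acceptance criterion; the simulator, prior, sample sizes, and threshold below are all placeholder assumptions.

```python
# Sketch: rejection ABC with an MMD acceptance criterion. The simulator,
# prior, sample sizes, and threshold are all placeholder assumptions.
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, size=(100, 1))      # data with unknown mean

accepted = []
for _ in range(1000):
    theta = rng.uniform(-5, 5)                     # draw from the prior
    pseudo = rng.normal(loc=theta, size=(100, 1))  # simulate pseudo data
    if mmd2(pseudo, observed) < 0.05:              # MMD-based similarity
        accepted.append(theta)
print("accepted:", len(accepted), "posterior mean:", np.mean(accepted))
```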
Graduate School of Science,
Osaka University
Title: Structure Learning of Bayesian Networks with p Nodes from n Samples when n<<p (SLIDES)
Abstract: We consider learning a Bayesian network with p variables from n tuples of samples. Thus far, it has been believed to be hard to estimate the correct structure when p is large. In this talk, however, we prove that a minimum description length (MDL) structure can be obtained in time polynomial in p when n is a constant. The result is consistent with existing results such as that of D. Chickering, who proved that estimating the correct structure is NP-hard when n is large. The reason is that MDL/Bayes seeks an optimal balance between the simplicity of a model and the likelihood of the samples with respect to the model. The idea goes back to J. Suzuki (1996) and was taken over and refined by authors such as J. Tian (2002) and C. Campos (2011); this work is inspired by and improves on those results. Based on this observation, we propose a branch-and-bound search that prunes complicated models with higher description lengths and lower posterior probabilities. Such Bayesian network structure learning should be useful in sparse settings (n << p) such as genome analysis (n = 100, p = 20000).
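For reference, the decomposable MDL score that such a search minimizes can be written per node as the empirical conditional code length plus a (k/2) log n parameter penalty. A minimal sketch for binary variables follows (toy data; not the speaker's branch-and-bound code).

```python
# Sketch: the decomposable MDL score (in nats) for a discrete Bayesian
# network. Structure search minimizes the sum of these per-node scores.
import numpy as np
from itertools import product

def local_mdl(data, child, parents, arity=2):
    """Code length of `child` given `parents`: -log-likelihood + penalty."""
    n = data.shape[0]
    ll = 0.0
    for pa_vals in product(range(arity), repeat=len(parents)):
        rows = (np.all(data[:, parents] == pa_vals, axis=1)
                if parents else np.ones(n, dtype=bool))
        counts = np.bincount(data[rows, child], minlength=arity)
        nz = counts[counts > 0]
        if counts.sum() > 0:
            ll += (nz * np.log(nz / counts.sum())).sum()
    k = (arity - 1) * arity ** len(parents)   # number of free parameters
    return -ll + 0.5 * k * np.log(n)

rng = np.random.default_rng(0)
X0 = rng.integers(0, 2, 500)
X1 = (X0 ^ (rng.random(500) < 0.1)).astype(int)   # X1 is a noisy copy of X0
data = np.column_stack([X0, X1])
print("MDL(X1 | {})   =", local_mdl(data, 1, []))
print("MDL(X1 | {X0}) =", local_mdl(data, 1, [0]))  # smaller: X0 is a parent
```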
The Institute of Scientific and Industrial Research (ISIR)
Osaka University
Talk Title: Non-Gaussian structural equation models for causal discovery (SLIDES)
Abstract: Structural equation models are widely applied in many empirical fields, including the social sciences, natural sciences, and biomedical sciences. Estimation of structural equation models for continuous variables typically uses the covariance structure of the data alone, which poses serious identifiability problems: many important models with opposite directions of causation are indistinguishable. In this talk, we will first present an overview of linear non-Gaussian structural equation models and then consider the problem of estimating the causal relations of two observed continuous variables in the presence of hidden common causes. We develop a non-Gaussian approach based on a linear non-Gaussian structural equation model known as LiNGAM and a linear mixed model. This approach does not require us to explicitly model hidden common causes.
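To convey the basic (confounder-free) LiNGAM intuition: in the true causal direction the regression residual is independent of the regressor, and non-Gaussianity makes this testable. The sketch below uses a crude nonlinear-correlation proxy for independence; the talk's method, which additionally handles hidden common causes via a linear mixed model, is considerably more elaborate.

```python
# Sketch: the basic two-variable LiNGAM idea without hidden confounders:
# in the true direction, the regression residual is independent of the
# regressor. Data, coefficients, and the dependence proxy are made up.
import numpy as np

def dependence(a, b):
    """Crude independence proxy via correlations of tanh-transformed variables."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.mean(np.tanh(a) * b)) + abs(np.mean(a * np.tanh(b)))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)                   # non-Gaussian cause
y = 0.8 * x + 0.3 * rng.uniform(-1, 1, 5000)   # effect, non-Gaussian noise

b_xy = np.cov(x, y)[0, 1] / x.var()
b_yx = np.cov(x, y)[0, 1] / y.var()
r_xy = y - b_xy * x                            # residual of regressing y on x
r_yx = x - b_yx * y                            # residual of regressing x on y
print("x -> y" if dependence(x, r_xy) < dependence(y, r_yx) else "y -> x")
```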
Department of Complex and Intelligent Systems
Future University Hakodate.
Talk Title: Parameter Estimation with Unnormalized Models and Homogeneous Divergence (SLIDES)
Division of Mathematical Science,
Graduate School of Engineering Science,
Osaka University
Title: Sparse estimation of large covariance matrices
Abstract: This talk consists of two recent works related to the sparse estimation of large covariance matrices. First, we introduce a sparse factor analysis via penalized maximum likelihood estimation. A lasso-type penalty is imposed on the factor loadings to obtain a sparse loading matrix. We also investigate the relationship between factor rotation and penalized maximum likelihood estimation.
Second, we introduce robust sparse Gaussian graphical modeling. Robust estimation is achieved via the gamma-divergence (Fujisawa and Eguchi, 2008, JMVA). The parameter estimation procedure is constructed using the Majorize-Minimization (MM) algorithm, which guarantees that the objective function decreases monotonically at each iteration.
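Loosely in the spirit of the second part, the sketch below alternates gamma-divergence-style reweighting (downweighting outliers) with a weighted covariance estimate, then applies the graphical lasso. This is an MM-flavoured illustration under assumed weights and constants, not the authors' exact algorithm.

```python
# Sketch: robustified sparse Gaussian graphical modeling. Gamma-divergence
# style weights w_i ~ f(x_i)^gamma downweight outliers; the weighted
# covariance is then passed to the graphical lasso. A loose illustration.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(0)
n, p, gamma = 300, 5, 0.2
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
X[:10] += 8.0                        # a few gross outliers

mu, Sigma = X.mean(axis=0), np.cov(X.T)
for _ in range(10):                  # iteratively reweighted estimation
    logpdf = multivariate_normal(mu, Sigma, allow_singular=True).logpdf(X)
    w = np.exp(gamma * (logpdf - logpdf.max()))
    w /= w.sum()
    mu = w @ X
    Xc = X - mu
    Sigma = (Xc * w[:, None]).T @ Xc

_, precision = graphical_lasso(Sigma, alpha=0.1)
print(np.round(precision, 2))        # outliers barely distort the estimate
```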
Department of Computer Science and Engineering
Waseda University
Talk Title: MAP Estimation of Markov Random Fields With Some Applications in Medical Imaging (SLIDES)
Graduate School of Science and Engineering,
Yamagata University
Talk Title: Effective Learning Algorithms for Boltzmann Machines (SLIDES)
Abstract: The Boltzmann machine is a type of graphical model based on a Markov random field, and is a fundamental model in the field of deep learning; the restricted Boltzmann machine and the deep Boltzmann machine are prominent examples. Unfortunately, due to combinatorial explosion, we cannot perform inference and learning on Boltzmann machines without approximation. Various approximate techniques for Boltzmann machines have been proposed: advanced mean-field approximation, contrastive divergence, the maximum pseudo-likelihood method, the maximum composite likelihood method, and so on.
In the first part of this talk, we give an overview of Boltzmann machines and some practical learning algorithms for them. In the second part, we touch on some recent developments in learning algorithms for Boltzmann machines.
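As an example of one of the surveyed techniques, here is a sketch of maximum pseudo-likelihood learning for a fully visible Boltzmann machine with ±1 units; the data, step size, and iteration count are arbitrary assumptions.

```python
# Sketch: maximum pseudo-likelihood learning of a fully visible Boltzmann
# machine (Ising model) with +/-1 units and no biases. Toy data; step size
# and iteration count are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 6
X = rng.choice([-1.0, 1.0], size=(n, p))
X[:, 1] = np.where(rng.random(n) < 0.9, X[:, 0], -X[:, 0])  # couple units 0, 1

W = np.zeros((p, p))        # symmetric couplings, zero diagonal
for _ in range(200):
    T = np.tanh(X @ W)      # E[x_j | rest] for every unit and sample
    G = X.T @ (X - T) / n   # gradient of the mean log pseudo-likelihood
    G = (G + G.T) / 2       # keep W symmetric...
    np.fill_diagonal(G, 0.0)  # ...with no self-coupling
    W += 0.1 * G
print(np.round(W, 2))       # W[0, 1] is large; other couplings stay near 0
```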
National Institute of Informatics, Tokyo
Talk Title: Variational saddlepoint approximation for the stochastic block model (SLIDES)
Abstract: The stochastic block model (SBM) is a useful generative model for the analysis of macroscopic structures in graphs. Bayesian inference is the key tool for (i) cluster assignment inference and (ii) model selection for the number of clusters. In this talk, we introduce the behavior of Bayesian inference on the SBM in the large-sample limit. Based on an asymptotic expansion of the fully-marginalized log-likelihood, we derive a tractable algorithm that solves tasks (i) and (ii) concurrently, eliminating the outer loop that checks every model candidate. Our empirical and theoretical results indicate that our method is scalable while maintaining model selection accuracy comparable to other Bayesian methods such as variational Bayes and information-criterion-based approaches.
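For context, the sketch below generates a graph from the SBM and evaluates the complete-data log-likelihood of a candidate assignment, the quantity whose full marginalization over assignments the talk's asymptotic expansion targets; the sizes and block probabilities are toy choices.

```python
# Sketch: the SBM generative model, plus the complete-data log-likelihood
# of a candidate cluster assignment. Sizes and probabilities are toy.
import numpy as np

rng = np.random.default_rng(0)
K, n = 2, 60
z = rng.integers(0, K, size=n)            # latent cluster assignments
B = np.array([[0.5, 0.05], [0.05, 0.4]])  # block connection probabilities
P = B[z][:, z]
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                               # symmetric, no self-loops

def loglik(A, z, B):
    """Complete-data log-likelihood of the upper triangle under the SBM."""
    iu = np.triu_indices_from(A, 1)
    prob = B[z[iu[0]], z[iu[1]]]
    a = A[iu]
    return np.sum(a * np.log(prob) + (1 - a) * np.log(1 - prob))

print("true z :", loglik(A, z, B))
print("random :", loglik(A, rng.integers(0, K, n), B))  # should be lower
```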
Department of Information Sciences,
Tohoku University
Title: Efficient Learning of Markov Random Fields in Computer Vision (SLIDES)
Abstract: In the field of computer vision, Markov Random Fields (MRFs) are frequently used for inference of unknown parameters in a wide range of problems such as image restoration, denoising, and object recognition. In this talk, we focus on training the parameters of MRFs. While the labeling problem for MRFs can be solved efficiently with several sophisticated algorithms such as graph cuts, training MRFs is time-consuming, and estimating the parameters of large-scale MRFs is intractable.
This talk consists of two parts. First, we explain basic properties of MRFs and the parameter estimation of a Conditional Random Field, a kind of MRF in which all the potential functions are conditioned on given samples. Next, we introduce several practical techniques for efficiently training the parameters of MRFs from a large number of training samples, and evaluate their effectiveness.
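To illustrate the inference side that graph cuts make efficient, the sketch below computes the MAP labeling of a tiny binary pairwise MRF via a minimum s-t cut with networkx; the 1-D signal and weights are made up.

```python
# Sketch: MAP labeling of a binary pairwise MRF via graph cut, the kind
# of efficient inference step the talk contrasts with costly training.
import networkx as nx

noisy = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]   # observed binary signal (toy)
lam = 0.6                                 # smoothness strength (assumed)

G = nx.DiGraph()
for i, y in enumerate(noisy):
    # Unary terms: cost of disagreeing with the observation.
    G.add_edge("s", i, capacity=float(y))        # cost of labeling i as 0
    G.add_edge(i, "t", capacity=float(1 - y))    # cost of labeling i as 1
for i in range(len(noisy) - 1):                  # pairwise smoothness
    G.add_edge(i, i + 1, capacity=lam)
    G.add_edge(i + 1, i, capacity=lam)

_, (S, _) = nx.minimum_cut(G, "s", "t")
denoised = [1 if i in S else 0 for i in range(len(noisy))]
print(denoised)                          # isolated flips get smoothed away
```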
The Institute of Statistical Mathematics, Tokyo.
Talk Title: Learning Changes in Graphical Model Structure using Density Ratio Estimation Methods (SLIDES)
Abstract: Density ratio estimation methods have proven effective in many machine learning applications. In this talk, we explore a few applications of density ratio estimation to graphical model structure learning and their advantages. First, we show that the change in graphical model structure can be estimated directly via such a method, without estimating each individual structure and comparing the difference. This method enjoys good empirical performance as well as statistical guarantees whose required sample size depends only on the number of changed edges in the graphical model. Second, we show that a partitioned graphical model can also be accurately learned via a variation of this method. The estimation procedure can be computed efficiently even when the graphical model is not Gaussian, and requires a sample size that depends only on the number of "links" between the two subgroups. Promising results on real-world applications are also included.
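The first idea can be sketched as follows: model the ratio p(x)/q(x) with pairwise features and fit it KLIEP-style, so that only the changed interactions receive nonzero weights. In the sketch below a smoothed L1 penalty stands in for the sparsity term, and the data and constants are placeholder assumptions.

```python
# Sketch: direct estimation of structural change via a KLIEP-style density
# ratio model r(x) = exp(sum_uv t_uv x_u x_v) / N(t). Toy data throughout.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
p = 4
Sp = np.eye(p)
Sq = np.eye(p)
Sq[0, 1] = Sq[1, 0] = 0.4                            # only edge (0,1) changes
Xp = rng.multivariate_normal(np.zeros(p), Sp, 500)   # samples from P
Xq = rng.multivariate_normal(np.zeros(p), Sq, 500)   # samples from Q

pairs = [(u, v) for u in range(p) for v in range(u + 1, p)]
Fp = np.stack([Xp[:, u] * Xp[:, v] for u, v in pairs], axis=1)
Fq = np.stack([Xq[:, u] * Xq[:, v] for u, v in pairs], axis=1)

def objective(t, lam=0.1, eps=1e-6):
    # Empirical KLIEP loss: log E_Q[exp(t.f)] - E_P[t.f] + smoothed L1.
    logz = np.log(np.mean(np.exp(Fq @ t)))
    return logz - np.mean(Fp @ t) + lam * np.sum(np.sqrt(t ** 2 + eps))

res = minimize(objective, np.zeros(len(pairs)), method="L-BFGS-B")
for (u, v), t in zip(pairs, res.x):
    print((u, v), round(t, 3))   # the (0, 1) weight should dominate
```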