Talks (Slide PDFs!)

Here is a list of confirmed invited speakers:

Click to jump to: Arindam Banerjee, Mladen Kolar, Jinchi Lv, Arthur Gretton, Mathias Drton, Wittawat Jitkrittum, Taiji Suzuki, Makoto Yamada, Hiroaki Sasaki, Mathieu Blondel, Emtiyaz Khan, Benjamin Poignard, Song Liu.

Arindam Banerjee

University of Minnesota, Twin Cities

Title: Beyond Sparsity: Finite Sample Learning of General Structured Models [PDF]

Abstract: Many machine learning problems, especially scientific problems in areas such as climate science, ecology, and brain sciences, operate in the 'small samples, high dimensions' regime, i.e., they have numerous possible predictors or features, but the number of training samples is small. In this talk, we will discuss recent advances in general formulations and estimators for such problems. These formulations generalize prior work such as the Lasso and the Dantzig selector, designed for sparse models, to general structured models where the structure is induced by a suitable norm. We will discuss the geometry underlying such formulations, and how the geometry helps in establishing finite sample properties of the estimators. We will discuss applications of the general approach to detecting changes in dependency structure of graphical models. We will also briefly discuss applications of such results to multivariate time series modeling, superposition models, and a real-world application in climate science.

This is joint work with Sheng Chen, Farideh Fazayeli, Andre Goncalves, Igor Melnyk, Pradeep Ravikumar, and Vidyashankar Sivakumar.
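As a rough illustration of the kind of norm-induced formulations the abstract alludes to (a sketch of the general shape, not necessarily the exact estimators of the talk), the Lasso and Dantzig selector generalize by replacing the l1 norm with a structure-inducing norm R and its dual norm R*:

```latex
% Generic norm-regularized estimators; sparsity corresponds to R = \ell_1,
% and R^* denotes the dual norm of R.
\begin{align*}
\hat{\theta}_{\mathrm{reg}} &= \arg\min_{\theta}\;
  \tfrac{1}{2n}\,\|y - X\theta\|_2^2 + \lambda_n\, R(\theta)
  && \text{(generalized Lasso-type estimator)}\\
\hat{\theta}_{\mathrm{DS}} &= \arg\min_{\theta}\; R(\theta)
  \quad \text{s.t.}\quad R^{*}\!\big(X^{\top}(y - X\theta)\big) \le \lambda_n
  && \text{(generalized Dantzig selector)}
\end{align*}
```

The finite-sample analysis mentioned in the abstract ties the sample-size requirements of such estimators to geometric quantities (e.g., Gaussian widths of sets induced by the norm R).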

Mladen Kolar

The University of Chicago

Booth School of Business

Title: Post-Regularization Inference for Dynamic Nonparanormal Graphical Models [PDF]

Abstract: We propose a novel class of dynamic nonparanormal graphical models, which allows us to model high-dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model we develop statistical tests for the presence of edges, both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes, and do not require the commonly assumed minimum signal strength condition. The testing procedures are based on a high-dimensional, debiasing-free moment estimator, which uses a novel kernel-smoothed Kendall's tau correlation matrix as an input statistic. The estimator consistently estimates the latent inverse Pearson correlation matrix uniformly in both the index variable and the kernel bandwidth. Its rate of convergence is shown to be minimax optimal. Thorough numerical simulations and an application to a neural imaging dataset support the usefulness of our method.

Joint work with Junwei Lu and Han Liu.
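For intuition, one plausible form of the input statistic (a sketch; the authors' exact estimator and weighting may differ) is a kernel-weighted Kendall's tau at index value t_0, mapped to a Pearson-type correlation by the usual nonparanormal sine transform:

```latex
% Kernel-smoothed Kendall's tau at index t_0 (K_h is a kernel with bandwidth h),
% followed by the nonparanormal sine transform.
\[
\hat{\tau}_{jk}(t_0) \;=\;
  \frac{\sum_{i < i'} \omega_i\,\omega_{i'}\,
        \operatorname{sign}\!\big((X_{ij}-X_{i'j})(X_{ik}-X_{i'k})\big)}
       {\sum_{i < i'} \omega_i\,\omega_{i'}},
\qquad \omega_i = K_h(t_i - t_0),
\qquad
\hat{\Sigma}_{jk}(t_0) \;=\; \sin\!\Big(\tfrac{\pi}{2}\,\hat{\tau}_{jk}(t_0)\Big).
\]
```

The latent inverse Pearson correlation matrix Omega(t_0) = Sigma(t_0)^{-1} is then estimated from this smoothed matrix (in high dimensions via a sparse estimation step rather than a direct inverse), and the edge tests are built on functionals of that estimate.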

Jinchi Lv

Data Sciences and Operations Department

Marshall School of Business

University of Southern California

Title: Tuning-Free Heterogeneity Pursuit in Massive Networks [PDF]

Abstract: Heterogeneity is often natural in many contemporary applications involving massive data. While posing new challenges to effective learning, it can play a crucial role in powering meaningful scientific discoveries through the understanding of important differences among subpopulations of interest. In this paper, we exploit multiple networks with Gaussian graphs to encode the connectivity patterns of a large number of features on the subpopulations. To uncover the heterogeneity of these structures across subpopulations, we suggest a new framework of tuning-free heterogeneity pursuit (THP) via large-scale inference, where the number of networks is allowed to diverge. In particular, two new tests, the chi-based test and the linear functional-based test, are introduced and their asymptotic null distributions are established. Under mild regularity conditions, we establish that both tests are optimal in achieving the testable region boundary and the sample size requirement for the latter test is minimal. Both theoretical guarantees and the tuning-free feature stem from efficient multiple-network estimation by our newly suggested approach of heterogeneous group square-root Lasso (HGSL) for high-dimensional multi-response regression with heterogeneous noises. To solve this convex program, we further introduce a tuning-free algorithm that is scalable and enjoys provable convergence to the global optimum. Both computational and theoretical advantages of our procedure are elucidated through simulation and real data examples. This is joint work with Yingying Fan, Yongjian Kang and Zhao Ren.

Link: http://www-bcf.usc.edu/~jinchilv/publications/THP-RKFL16.pdf
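As a loudly hedged sketch of the general shape of such a criterion (my illustration, not the paper's exact HGSL formulation), a group square-root-Lasso-type objective for K responses couples the networks through a group penalty across responses, while the square-root loss makes the regularization level insensitive to the unknown, heterogeneous noise scales:

```latex
% Illustrative group square-root-Lasso-type criterion for K responses
% y^{(1)}, ..., y^{(K)} sharing the design X; the group penalty ties together
% the j-th coefficient across all responses. Details (weights, normalization)
% may differ from the HGSL formulation in the paper.
\[
\hat{B} \;=\; \arg\min_{B=(\beta^{(1)},\dots,\beta^{(K)})}\;
  \sum_{k=1}^{K} \frac{1}{\sqrt{n}}\,\big\| y^{(k)} - X\beta^{(k)} \big\|_2
  \;+\; \lambda \sum_{j=1}^{p} \big\| \big(\beta^{(1)}_j,\dots,\beta^{(K)}_j\big) \big\|_2 .
\]
```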

Arthur Gretton

Gatsby Computational Neuroscience Unit

University College London

Title: Learning Interpretable Features to Compare Distributions [PDF]

Abstract: I will present adaptive two-sample tests with maximum testing power and interpretable features, using two divergence measures: the maximum mean discrepancy (MMD), and differences of learned smooth features (the mean embedding (ME) test, NIPS 2016). In both cases, the key point is that variance matters: it is not enough to have a large empirical divergence; we also need to have high confidence in the value of our divergence. These interpretable tests can be used in benchmarking and troubleshooting generative models. For instance, we may detect subtle differences in the distribution of model outputs and real hand-written digits which humans are unable to find (for instance, small imbalances in the proportions of certain digits, or minor distortions that are implausible in normal handwriting). We use the linear-time ME test to distinguish positive and negative emotions on a facial expression database, showing that a distinguishing feature reveals the facial areas most relevant to emotion.
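For readers who want to try the basic (non-adaptive) ingredient, below is a minimal sketch of a quadratic-time MMD two-sample test with a fixed Gaussian kernel and a permutation null. The talk's tests go further by optimizing kernels/features for test power and, for the ME test, by using a linear-time statistic; all function names below are illustrative.

```python
# Minimal sketch: unbiased quadratic-time MMD^2 with a Gaussian kernel,
# plus a permutation test of H0: P = Q. Not the authors' optimized tests.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X ~ P and Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

def permutation_test(X, Y, sigma=1.0, n_perm=500, seed=0):
    """p-value for H0: P = Q by re-splitting the pooled sample at random."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y, sigma)
    pooled = np.vstack([X, Y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        Xp, Yp = pooled[idx[:len(X)]], pooled[idx[len(X):]]
        count += mmd2_unbiased(Xp, Yp, sigma) >= observed
    return (count + 1) / (n_perm + 1)

# Toy example: two distributions differing only in scale.
rng = np.random.default_rng(0)
X = rng.normal(0, 1.0, size=(200, 2))
Y = rng.normal(0, 1.5, size=(200, 2))
print(permutation_test(X, Y))
```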

Mathias Drton

University of Washington

Department of Statistics

Title: Regularized score matching for graphical models: Non-Gaussianity and missing data [PDF]

Abstract: I will discuss a convenient framework for estimating high-dimensional graphical models by regularizing Hyvärinen's score matching loss. This framework avoids the need to know the partition function, which greatly simplifies estimation in non-Gaussian models. I will present results for specific examples of such non-Gaussian models and show how the framework readily accommodates incomplete data.

This is joint work with Lina Lin and Ali Shojaie.
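For concreteness, a sketch of the two key formulas: Hyvärinen's score matching loss needs no partition function because it depends on the model only through derivatives of the log-density, and in the Gaussian case the regularized criterion reduces to a penalized quadratic form in the precision matrix K (the non-Gaussian and missing-data extensions in the talk modify this form accordingly):

```latex
% Score matching loss (p_0 is the data distribution, Delta the Laplacian),
% and its l1-regularized Gaussian special case, where \hat\Sigma is the
% sample covariance and ||K||_{1,off} penalizes off-diagonal entries.
\[
J(\theta) \;=\; \mathbb{E}_{x \sim p_0}\!\left[
   \tfrac{1}{2}\,\big\| \nabla_x \log p_\theta(x) \big\|_2^2
   \;+\; \Delta_x \log p_\theta(x) \right],
\qquad
\hat{K} \;=\; \arg\min_{K}\;
  \tfrac{1}{2}\,\operatorname{tr}\!\big(K\,\hat{\Sigma}\,K\big)
  \;-\; \operatorname{tr}(K)
  \;+\; \lambda\,\| K \|_{1,\mathrm{off}} .
\]
```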

Wittawat Jitkrittum

Gatsby Computational Neuroscience Unit, University College London

Title: An Adaptive Test of Independence with Analytic Kernel Embeddings [PDF]

Abstract: A new computationally efficient dependence measure, and an adaptive statistical test of independence, are proposed. The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features). These features are chosen so as to maximize a lower bound on the test power, resulting in a test that is data-efficient, and that runs in linear time (with respect to the sample size n). The optimized features can be interpreted as evidence to reject the null hypothesis, indicating regions in the joint domain where the joint distribution and the product of the marginals differ most. Consistency of the independence test is established, for an appropriate choice of features. In real-world benchmarks, independence tests using the optimized features perform comparably to the state-of-the-art quadratic-time HSIC test, and outperform competing O(n) and O(n log n) tests.
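A minimal sketch of the underlying dependence witness, evaluated at a handful of location pairs: the difference between the empirical joint embedding and the product of the empirical marginal embeddings. The actual test statistic additionally normalizes this vector by its estimated covariance and optimizes the locations and kernel parameters for power; the function names below are illustrative.

```python
# Sketch: unnormalized dependence witness u(v, w) at J location pairs.
import numpy as np

def gauss_feat(A, V, sigma=1.0):
    """Analytic features k(a_i, v_j) = exp(-||a_i - v_j||^2 / (2 sigma^2)); shape (n, J)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * A @ V.T
    return np.exp(-sq / (2 * sigma**2))

def dependence_witness(X, Y, V, W, sigma_x=1.0, sigma_y=1.0):
    """u_hat(v_j, w_j) = mean_i[k(x_i,v_j) l(y_i,w_j)] - mean_i[k(x_i,v_j)] mean_i[l(y_i,w_j)]."""
    Kx = gauss_feat(X, V, sigma_x)   # (n, J)
    Ly = gauss_feat(Y, W, sigma_y)   # (n, J)
    return (Kx * Ly).mean(0) - Kx.mean(0) * Ly.mean(0)

# Toy example: Y depends nonlinearly on X; locations here are random,
# whereas the test of the talk would optimize them.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
Y = np.sin(3 * X) + 0.1 * rng.normal(size=(500, 1))
V, W = rng.normal(size=(5, 1)), rng.normal(size=(5, 1))
print(dependence_witness(X, Y, V, W))  # far from 0 at informative locations
```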

Taiji Suzuki

Department of Mathematical and Computing Sciences

Graduate School of Information Science and Engineering

Tokyo Institute of Technology

Title: Generalization error bound of Bayesian deep learning: a kernel perspective [PDF]

Abstract: In this talk, we discuss the generalization error of Bayesian deep learning. To derive the generalization error bound, we consider an integral form as the target to approximate. Based on this, we derive the finite dimensional approximation error by constructing a reproducing kernel Hilbert space on each layer. Moreover, we construct a Bayes estimator to estimate the finite dimensional approximation and derive its estimation error. As a result, a bias-variance trade-off for the determination of the widths of the internal layers is obtained. Finally, we discuss implications that are drawn from the theoretical result.

Makoto Yamada

Institute for Chemical Research, Kyoto University

Title: Nonlinear Feature Selection for High-Dimensional Data [PDF]

Abstract: Feature selection is an important machine learning problem, and it is widely used for various types of applications such as gene selection from microarray data, document categorization, and prosthesis control, to name a few. The feature selection problem is a fundamental and traditional machine learning problem, and thus there exist many methods including the least absolute shrinkage and selection operator (Lasso). However, there are few methods that can select features from large and ultra high-dimensional data (more than a million features) in a nonlinear way. In this talk, we first introduce the Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso), which can efficiently select non-redundant features from small, high-dimensional data in a nonlinear way. A key advantage of HSIC Lasso is that it is a convex method and can find a globally optimal solution. We then further extend the proposed method to handle ultra high-dimensional data by incorporating a distributed computing framework. Moreover, we introduce two newly proposed algorithms, the localized lasso and hsicInf, where the localized lasso is useful for selecting a set of features from each sub-cluster and hsicInf can obtain p-values of selected features from any type of data.
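For reference, the HSIC Lasso objective takes the following form (a sketch of the published formulation; normalization details may vary), where each K-bar is the centered Gram matrix of one candidate feature and L-bar that of the output. Expanding the Frobenius norm shows that it rewards features strongly dependent on the output while penalizing mutually redundant features, and the nonnegativity constraint plus the l1 penalty keep the problem convex:

```latex
% HSIC Lasso: select features k with nonzero weights alpha_k.
\[
\min_{\alpha \ge 0}\;
  \frac{1}{2}\Big\| \bar{L} - \sum_{k=1}^{d} \alpha_k \bar{K}^{(k)} \Big\|_F^2
  \;+\; \lambda \sum_{k=1}^{d} \alpha_k .
\]
```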

Hiroaki Sasaki

Graduate School of Information Science

Nara Institute of Science and Technology

Title: Simultaneous Estimation of Non-Gaussian Components and their Correlation Structure [PDF]

Abstract: The statistical dependencies which independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would be very useful to estimate the dependency structure from data. While such models have been proposed, they usually concentrate only on higher-order correlations such as energy (square) correlations, ignoring linear correlations. Yet, linear correlations are a fundamental and informative form of dependency in many real data sets. Linear correlations are usually completely removed by ICA and related methods, so they can only be analyzed by developing a new method which explicitly allows for linearly correlated components. In this talk, we propose a probabilistic model of linear non-Gaussian components which are allowed to have both linear and higher-order correlations. The precision matrix of the linear components is assumed to be randomly generated by a higher-order process and explicitly parametrized by a parameter matrix. The estimation of the parameter matrix is shown to be particularly simple because, using score matching, the objective function is a quadratic form. First, using simulations with artificial data, we show that the proposed method is able to estimate non-Gaussian components and their correlation structure simultaneously. Then, some demonstrations on real data sets are also provided.

Link: https://arxiv.org/abs/1506.05666

Mathieu Blondel

NTT Communication Science Laboratories

Title: Recent advances on polynomial neural networks and related models [PDF]

Abstract: Over the past few years, neural networks have attracted a lot of attention, due to their success in numerous practical applications. In this talk, I will present a recently-developed form of neural network called the polynomial network. Compared to traditional sigmoidal and ReLU networks, polynomial networks have two main advantages: theoretically guaranteed training and interpretability. This talk will cover recent works by myself and other authors on polynomial networks, and related models such as factorization machines.
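As a point of reference (a standard formula, not specific to this talk), the second-order factorization machine mentioned in the abstract as a related model scores an input x by parameterizing all pairwise feature interactions through a low-rank factorization, with one rank-r vector per feature:

```latex
% Second-order factorization machine: interactions <v_i, v_j> replace a full
% (and otherwise unidentifiable in sparse data) interaction matrix.
\[
\hat{y}(x) \;=\; w_0 \;+\; \langle w, x\rangle
  \;+\; \sum_{i < j} \langle v_i, v_j\rangle\, x_i x_j,
\qquad v_i \in \mathbb{R}^{r}.
\]
```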

Emtiyaz Khan

RIKEN Center for Advanced Intelligence Project

RIKEN

Title: Conjugate-computation variational inference for approximate Bayesian inference in non-conjugate exponential-family models [PDF]

Abstract: Approximate Bayesian inference in large and complex models is computationally challenging. Recently, there has been an explosion of work exploring stochastic gradient (SG) methods for variational inference. These methods are widely applicable and can scale to huge datasets, but they do not always lead to efficient and modular updates. In this talk, I will focus on one such case: non-conjugate exponential-family models. I will present a new method called Conjugate-computation Variational Inference (CVI) for inference in these models. CVI uses ideas from stochastic proximal-gradient methods and converts the problem into a sequence of variational inferences in conjugate models. I will show that CVI is not only as general as SG-based approaches, but is also modular, scalable, and convergent. CVI also enables easy implementation by reusing the existing software available for conjugate models.
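A hedged sketch of the underlying idea (assuming an exponential-family approximation q with natural parameter lambda and mean parameter mu; the actual CVI update further splits the model into conjugate and non-conjugate parts so that each step becomes a conjugate Bayesian computation): a mirror-descent step on the variational objective L in mean-parameter space, using the Bregman divergence induced by the family, is equivalent to a simple additive update of the natural parameter:

```latex
% Mirror descent in mean-parameter space; B_{A^*} is the Bregman divergence
% of the convex conjugate of the log-partition function A, and beta_t a step size.
\[
\mu_{t+1} \;=\; \arg\max_{\mu \in \mathcal{M}}\;
  \big\langle \mu,\, \nabla_\mu \mathcal{L}(\mu_t) \big\rangle
  \;-\; \frac{1}{\beta_t}\, B_{A^*}\!\big(\mu \,\|\, \mu_t\big)
\quad\Longleftrightarrow\quad
\lambda_{t+1} \;=\; \lambda_t \;+\; \beta_t\, \nabla_\mu \mathcal{L}(\mu_t).
\]
```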

Benjamin Poignard

Applied Mathematics, Center for Research in Economics and Statistics (CREST) and Paris Dauphine University (CEREMADE)

Title: Dynamic correlation processes based on graphical vines (Joint work with J.D. Fermanian, CREST) [PDF]

Abstract: We develop a new method for generating dynamics of conditional correlation matrices. These correlation matrices are parameterized by a subset of their partial correlations, whose structure is described by an undirected graph called a vine. Since such partial correlation processes can be specified separately, our approach provides very flexible and potentially parsimonious multivariate processes. By generating univariate dynamics of partial correlations independently, we obtain sequences of correlation matrices without any normalization stage. Parsimony is fostered as one can set constraints at any level of the vine tree without altering other correlations. We introduce the so-called Vine-GARCH class of processes and describe a quasi-maximum likelihood estimation procedure. Compared to other usual techniques, particularly for the DCC family, inference is simpler and can be carried out equation by equation. We provide conditions for the existence and the uniqueness of strictly stationary solutions of the Vine-GARCH process. The proof is based on Tweedie's (1988) criteria, after rewriting the Vine-GARCH process as a nonlinear Markov chain. Moreover, we study the existence of their finite moments and discuss the tightness of our sufficient conditions. Furthermore, the proposed Vine-GARCH dynamics are estimated by the quasi-maximum likelihood method. We prove the weak consistency and asymptotic normality of the quasi-maximum likelihood estimator obtained in a two-step procedure. We compare our models with some DCC-type specifications through simulated experiments.

Song Liu

The Institute of Statistical Mathematics

Title: Recent Developments on Learning Changes between Graphical Models [PDF]

Abstract: Recent years have seen increasing interest in learning sparse changes in Markov networks. Changes in the structure of Markov networks reflect alterations of interactions between random variables under different regimes and provide insights into the underlying system. While each individual network structure can be complicated and difficult to learn, the overall change from one network to another can be simple. This intuition gave birth to an approach that directly learns the sparse changes without modelling and learning the individual (possibly dense) networks. In this paper, we review such a direct learning method together with some of the latest developments along this line of research.

Keywords: Markov Network, Density Ratio Estimation, Change Detection
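A hedged one-formula sketch of the direct-change idea for pairwise Markov networks p and q over x = (x_1, ..., x_d) (notation and the exact loss and penalty, e.g. group penalties over edge-wise parameter blocks, may differ from the papers reviewed): the ratio of the two densities is itself an unnormalized pairwise model whose parameters are the differences delta of the two networks' pairwise parameters, and it can be fitted directly by a KLIEP-type criterion with sparsity on delta:

```latex
% Density-ratio model for the change, normalized against samples from q,
% and a sparsity-penalized fit using samples x^{(p)} from p and x^{(q)} from q.
\[
\frac{p(x)}{q(x)} \;\approx\; \frac{1}{N(\delta)}
  \exp\!\Big( \sum_{u \le v} \delta_{uv}\, \psi(x_u, x_v) \Big),
\qquad
N(\delta) \;=\; \frac{1}{n_q}\sum_{i=1}^{n_q}
  \exp\!\Big( \sum_{u \le v} \delta_{uv}\, \psi\big(x^{(q)}_{iu}, x^{(q)}_{iv}\big) \Big),
\]
\[
\hat{\delta} \;=\; \arg\max_{\delta}\;
  \frac{1}{n_p}\sum_{i=1}^{n_p} \sum_{u \le v} \delta_{uv}\,
    \psi\big(x^{(p)}_{iu}, x^{(p)}_{iv}\big)
  \;-\; \log N(\delta)
  \;-\; \lambda \sum_{u \le v} \big|\delta_{uv}\big| .
\]
```

Nonzero entries of the estimated delta indicate edges whose interaction strength changes between the two regimes, without ever estimating the two (possibly dense) networks separately.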