Times are given in IST (UTC+5:30) / TST (UTC+8) / JST (UTC+9).
Yoshiyuki Ninomiya, ISM
Some information criteria for semiparametric propensity score analysis
The semiparametric estimation approach, which includes inverse-probability-weighted and doubly robust estimation using propensity scores, is a standard tool for the marginal structural models commonly used in causal inference, and it is rapidly being extended and generalized in various directions. On the other hand, although model selection is indispensable in statistical analysis, information criteria for selecting an appropriate marginal structure have only recently begun to be developed. In this presentation, following the original derivation of the information criterion, we derive an AIC-type criterion. We define a risk function based on the predictive mean squared error or the Kullback-Leibler divergence as the cornerstone of the criterion, and we treat a general causal inference model that is not necessarily representable as a linear model. The causal effects to be estimated are not restricted to the general population; they include, for example, the average treatment effect on the treated and the average treatment effect on the untreated. In numerical experiments, we compare the derived criterion with an existing criterion obtained from a formal argument and confirm that the former outperforms the latter: across all simulation settings, the difference between the estimated structure and the true structure is clearly smaller, and the probability of selecting the true or a nearly true model is clearly higher.
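As a rough, hedged illustration of the ingredients (this is not the derived criterion: the 2k penalty below is a placeholder for the bias correction obtained in the talk, and the variable names and data-generating process are invented for the sketch), one can score candidate marginal structures by an inverse-probability-weighted loss plus a penalty, with weights from an estimated propensity score:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=(n, 3))
    t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # treatment indicator
    y = 1.0 * t + X[:, 0] + rng.normal(size=n)        # outcome

    # Inverse-probability weights from an estimated propensity score.
    e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    w = t / e + (1 - t) / (1 - e)

    def aic_type(cols):
        # Weighted LS fit of a candidate marginal structure on (1, t, X[cols]).
        Z = np.column_stack([np.ones(n), t] + [X[:, j] for j in cols])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * y, rcond=None)
        wrss = np.sum(w * (y - Z @ beta) ** 2)
        # Placeholder penalty: the talk's point is precisely that the
        # correct penalty under IPW estimation differs from this formal 2k.
        return n * np.log(wrss / n) + 2 * Z.shape[1]

    candidates = [(), (0,), (1,), (0, 1), (0, 1, 2)]
    best = min(candidates, key=aic_type)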
Soumendu Sundar Mukherjee, ISI
Variable selection under latent group sparsity
In this talk, we will focus on the problem of variable selection in high-dimensional regression under latent group sparsity. We will present a new dynamic penalty that automatically selects variables in groups without being explicitly told what those groups are. This will be done by incorporating into the penalty a suitable Laplacian matrix (containing group information) in the form of a heat flow. At equilibrium the proposed penalty coincides with the traditional group Lasso penalty. We will present some numerical and theoretical results on estimation and variable selection properties of the proposed penalty.
This is an ongoing joint work with Subhroshekhar Ghosh.
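A minimal numerical sketch of the heat-flow interpolation, assuming (for illustration only) a penalty of the form sum_i sqrt((e^{-tL}(beta∘beta))_i); the talk's exact construction may differ. At t = 0 this is the lasso penalty, and as t grows the heat flow averages the squared coefficients within connected components of the graph, yielding a weighted group-Lasso-type penalty:

    import numpy as np
    from scipy.linalg import expm

    def heat_flow_penalty(beta, L, t):
        # Diffuse the squared coefficients along the graph for time t,
        # then sum the square roots (illustrative form of the penalty).
        diffused = expm(-t * L) @ (beta ** 2)
        return np.sum(np.sqrt(np.maximum(diffused, 0.0)))

    # Two latent groups, {0, 1} and {2, 3}, encoded in a graph Laplacian.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    beta = np.array([1.0, 0.0, 0.0, 0.0])

    print(heat_flow_penalty(beta, L, t=0.0))   # lasso: sum |beta_i| = 1.0
    print(heat_flow_penalty(beta, L, t=50.0))  # ~ sqrt(2) * ||beta_group||_2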
Keisuke Yano, ISM
Posterior covariance information criterion
In this presentation, we introduce the Posterior Covariance Information Criterion (PCIC), an information criterion for predictive evaluation based on quasi-Bayesian posterior distributions. PCIC is a natural extension of the Widely Applicable Information Criterion (WAIC) and can be computed in a single MCMC. We demonstrate its application to covariate shift, causal inference, and differentially private learning.
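A schematic sketch of the kind of single-run computation involved, under the assumption (suggested by the criterion's name, not a statement of the paper's exact formula) that it combines a posterior mean of the evaluation loss with a pointwise posterior covariance between learning and evaluation losses; waic is included for comparison, and the loglik/loss arrays are hypothetical inputs:

    import numpy as np

    # loglik[s, i] = log p(y_i | theta_s) over draws theta_s of one MCMC run.
    def waic(loglik):
        lppd = np.sum(np.log(np.mean(np.exp(loglik), axis=0)))
        p_eff = np.sum(np.var(loglik, axis=0))
        return -2 * (lppd - p_eff)

    def pcic_sketch(loss_learn, loss_eval):
        # Schematic form only: posterior mean of the evaluation loss plus
        # a pointwise posterior covariance between the learning and
        # evaluation losses as the bias correction (see the paper for the
        # exact definition). With loss_learn == loss_eval the covariance
        # reduces to a posterior variance, i.e., a WAIC-type correction.
        mean_eval = loss_eval.mean(axis=0)
        cov = ((loss_learn - loss_learn.mean(0)) *
               (loss_eval - loss_eval.mean(0))).mean(axis=0)
        return np.sum(mean_eval + cov)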
Ching-pei Lee, ISSAS
An Inexact Proximal Semismooth Newton Method with Superlinear Convergence to Degenerate Solutions Under the Hölderian Error Bound
We describe inexact proximal semismooth Newton methods for monotone inclusion problems. In the degenerate scenario in which the generalized Jacobian is singular, our method still achieves superlinear convergence to a solution under a Hölderian error bound condition. When specialized to regularized optimization, our local convergence rates improve in several respects over recently proposed methods. First, we show that superlinear convergence can be achieved on a broader class of degenerate problems, including some nonconvex ones and those whose smooth term is not twice differentiable. Further, we show that such superlinear rates hold not only for the distance between the iterates and the solution set but also for a certain measure of first-order optimality. We then sharpen the analysis for a class of problems previously known to achieve faster-than-quadratic convergence, showing that they in fact attain an optimal solution in a finite number of iterations. Finally, for problems that fall within our superlinear convergence scheme, we propose a new algorithm that yields a decrease in the objective at each iteration, convergence to an optimal solution, and local Q-superlinear convergence not only of the distance to the solution set and the first-order optimality measure but also of the objective value, without prior knowledge of problem parameters that may be difficult to estimate.
This is a joint work with Stephen J. Wright.
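For reference, a standard form of the Hölderian error bound assumed in analyses of this kind (our paraphrase; the paper's precise assumption is stated on the residual of the inclusion) reads, in LaTeX notation:

    \mathrm{dist}(x, X^*) \le \kappa \, \| R(x) \|^{\gamma}, \qquad \gamma \in (0, 1],

for all x in a neighborhood of the solution set X^*, where R is a residual mapping whose zeros are exactly the solutions of the monotone inclusion and \kappa > 0. The case \gamma = 1 is the usual (Lipschitzian) error bound, while \gamma < 1 accommodates the degenerate problems described above.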
Mirai Tanaka, ISM
A Gradient Method for Multilevel Optimization
Although application examples of multilevel optimization have been discussed since the 1990s, the development of solution methods was long limited almost entirely to bilevel cases owing to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. proposed a method for solving bilevel optimization problems by replacing the lower-level problem with T steepest-descent updates for some prechosen iteration number T. In this work, we develop a gradient-based algorithm for multilevel optimization with n levels based on their idea and prove that our reformulation asymptotically converges to the original multilevel problem. As far as we know, this is one of the first algorithms with a theoretical guarantee for multilevel optimization. Numerical experiments show that a trilevel hyperparameter learning model that accounts for data poisoning produces more stable prediction results than an existing bilevel hyperparameter learning model in noisy data settings.
This is joint work with Ryo Sato (UTokyo) and Akiko Takeda (UTokyo/RIKEN).
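A minimal sketch of the unrolling idea attributed above to Franceschi et al., for the bilevel case only (the talk's contribution is the extension to n levels and its analysis); the toy objectives, step size eta, and iteration count T are our own illustrative choices, and jax is used to differentiate through the updates:

    import jax
    import jax.numpy as jnp

    # Toy bilevel problem: min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y).
    def g(x, y):
        # Lower-level objective (illustrative quadratic).
        return 0.5 * jnp.sum((y - x) ** 2) + 0.1 * jnp.sum(y ** 2)

    def y_T(x, T=50, eta=0.1):
        # Replace the lower-level argmin by T steepest-descent updates.
        y = jnp.zeros_like(x)
        for _ in range(T):
            y = y - eta * jax.grad(g, argnums=1)(x, y)
        return y

    def f(x):
        # Upper-level objective evaluated at the unrolled y_T(x).
        return jnp.sum((y_T(x) - 1.0) ** 2)

    x = 0.5 * jnp.ones(3)
    hypergrad = jax.grad(f)(x)   # differentiates through the T updates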
Ming-Chung Chang, ISSAS
Predictive Subdata Selection for Computer Models
Tremendous amounts of data are becoming ubiquitous owing to advances in technology. Such data richness, however, can make statistical analysis prohibitively time-consuming. Recently, increased attention has been paid to this data reduction problem, with approaches such as information-based optimal subdata selection (IBOSS) and Latin hypercube subagging. In this talk, I will introduce a new subdata selection method for computer models. In addition to the geometry of the input space, the proposed method exploits the information in the output values and adaptively updates the current subdata at affordable computational cost. Several numerical examples and real data analyses are provided.
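For context, a minimal sketch of the IBOSS rule mentioned above, which uses input-space geometry only: for each covariate, keep the r most extreme remaining points at each end until k points are selected (r, k, and the synthetic X below are illustrative). The method proposed in the talk additionally exploits the outputs and updates the subdata adaptively:

    import numpy as np

    def iboss(X, k):
        # Information-based optimal subdata selection (geometry only):
        # per covariate, keep the r smallest and r largest remaining points.
        n, p = X.shape
        r = k // (2 * p)
        chosen = np.zeros(n, dtype=bool)
        for j in range(p):
            remaining = np.flatnonzero(~chosen)
            order = np.argsort(X[remaining, j])
            idx = remaining[np.r_[order[:r], order[-r:]]]
            chosen[idx] = True
        return np.flatnonzero(chosen)

    X = np.random.default_rng(1).normal(size=(100_000, 5))
    sub = iboss(X, k=1000)   # indices of the selected subdata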
Shyamal Krishna De, ISI
Exact Tests for Offline Change Detection in Multichannel Binary and Count Data
In this work, we first consider offline detection of a single changepoint in binary and Poisson time series. We discuss various approaches, such as cumulative sum (CUSUM) and likelihood ratio (LR) based exact tests, as well as a new proposal that combines exact conditional two-sample tests with a multiplicity correction. We demonstrate empirically that the exact tests are much more powerful than the standard asymptotic tests based on the Brownian bridge approximation of the CUSUM statistic, especially when (1) the sample size is small, (2) the changepoint occurs near the boundary, or (3) the data are generated from a distribution with a sparse parametric setting. Further, we consider a “multi-channel” changepoint problem in which each channel is a binary or Poisson time series. Adopting a false discovery rate (FDR) controlling mechanism, we detect possible changes in multiple channels and show that this “local” approach is not only more informative but also more powerful than multivariate global testing approaches when the number of channels with changepoints is much smaller than the total number of channels. As an application of the proposed methodologies, we finally consider network-valued time series with (1) edges as binary channels and (2) node degrees or other local subgraph statistics as count (Poisson) channels. We again observe the advantages of the proposed local approaches over the global testing approaches.
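For concreteness, a minimal sketch of the standardized CUSUM baseline whose null calibration relies on the Brownian bridge approximation; the exact tests of the talk replace this asymptotic calibration. The data-generating choices below are illustrative:

    import numpy as np

    def cusum_binary(x):
        # Standardized CUSUM for a single change in the success probability
        # of a binary sequence; under the null, the maximum is approximately
        # the supremum of a standardized Brownian bridge.
        n = len(x)
        s = np.cumsum(x)
        p = s[-1] / n                      # pooled estimate (assume 0 < p < 1)
        k = np.arange(1, n)                # candidate changepoints
        z = np.abs(s[:-1] - k * s[-1] / n) / np.sqrt(p * (1 - p) * k * (n - k) / n)
        return z.max(), int(k[np.argmax(z)])

    rng = np.random.default_rng(0)
    x = np.r_[rng.binomial(1, 0.2, 100), rng.binomial(1, 0.5, 100)]
    stat, khat = cusum_binary(x)           # khat estimates the change location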
Kengo Kamatani, ISM
Bayesian computation and the curse of dimensionality
Many Markov chain Monte Carlo methods struggle with high-dimensional target distributions, and it is important to quantify this effect. Over the past 20 years, scaling limit analysis has been used to do so. In this talk, we explain some recent results in the analysis of Markov chain Monte Carlo methods.
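A classic example of the effect being quantified is the optimal-scaling result for random-walk Metropolis: with a fixed proposal step size the acceptance rate collapses as the dimension d grows, while a step size shrinking like d^{-1/2} keeps it near the well-known 0.234. The quick simulation below, on a standard Gaussian target, is an illustration rather than part of the talk:

    import numpy as np

    def rwm_acceptance(d, sigma, n_iter=5000, seed=0):
        # Random-walk Metropolis targeting N(0, I_d); return acceptance rate.
        rng = np.random.default_rng(seed)
        x = np.zeros(d)
        acc = 0
        for _ in range(n_iter):
            prop = x + sigma * rng.normal(size=d)
            if np.log(rng.uniform()) < 0.5 * (x @ x - prop @ prop):
                x, acc = prop, acc + 1
        return acc / n_iter

    for d in (1, 10, 100, 1000):
        print(d,
              rwm_acceptance(d, sigma=0.5),                  # fixed step: rate decays
              rwm_acceptance(d, sigma=2.38 / np.sqrt(d)))    # scaled step: ~0.23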
Sanghamitra Bandyopadhyay, Director of ISI
Chun-houh Chen, Director of ISSAS
Anil Ghosh, Coordinator from ISI
Frederick Kin Hing Phoa, Coordinator from ISSAS
Hiroe Tsubaki, Director of ISM, closing declaration.