Day 2/Jan 14

Times are given in IST (UTC+5:30) / TST (UTC+8) / JST (UTC+9).

Session 5. Dimension reduction (10:00-11:00/12:30-13:30/13:30-14:30)

Arijit Chakrabarti, ISI

On a new model selection criterion for high dimensional PCA

In this talk, we consider the problem of estimating the number of significant components in high dimensional PCA. It is well known in the literature that a consistent estimator can be obtained in this problem using the Akaike Information Criterion (AIC). We propose some novel modifications of AIC that yield estimators which are consistent under much weaker conditions than those required for the consistency of the AIC-based estimator.

This talk is based on joint work with Abhinav Chakraborty and Soumendu Sundar Mukherjee.

Ming-Yueh Huang, ISSAS

Model Selection Among Dimension-Reduced Generalized Cox Models

Conventional semiparametric hazards regression models rely on the specification of particular model formulations, such as the proportional-hazards feature and single-index structures. Instead of checking these modeling assumptions one by one, I will introduce a class of dimension-reduced generalized Cox models, followed by a consistent model selection procedure within this class that selects the covariates with the proportional-hazards feature and a proper model formulation for the non-proportional-hazards covariates. In this class, the non-proportional-hazards covariates are treated in a nonparametric manner, and a partial sufficient dimension reduction is introduced to mitigate the curse of dimensionality. A semiparametric efficient estimation procedure is proposed for these models. Based on the proposed estimation, we further construct a cross-validation type criterion to consistently select the correct model within this class. Most importantly, this class of hazards regression models contains the fully nonparametric hazards regression model as the most saturated submodel, and hence no further model diagnosis is required. Overall, this model selection approach is more effective than performing a sequence of conventional model checks. The proposed method is illustrated by simulation studies and a data example.

This is joint work with Kwun Chuen Gary Chan.

Ci-Ren Jiang, ISSAS

Eigen-Adjusted Functional Principal Component Analysis

Functional Principal Component Analysis (FPCA) has become a widely used dimension reduction tool for functional data analysis. When additional covariates are available, existing FPCA models integrate them either in the mean function or in both the mean function and the covariance function. However, methods of the first kind are not suitable for data that display second-order variation, while those of the second kind are time-consuming and make it difficult to perform subsequent statistical analyses on the dimension-reduced representations. To tackle these issues, we introduce an eigen-adjusted FPCA model that integrates covariates in the covariance function only through its eigenvalues. In particular, different structures on the covariate-specific eigenvalues -- corresponding to different practical problems -- are discussed to illustrate the model's flexibility as well as utility. To handle functional observations under different sampling schemes, we employ local linear smoothers to estimate the mean function and the pooled covariance function, and a weighted least squares approach to estimate the covariate-specific eigenvalues. The convergence rates of the proposed estimators are further investigated under the different sampling schemes. In addition to simulation studies, the proposed model is applied to functional Magnetic Resonance Imaging scans, collected within the Human Connectome Project, for functional connectivity investigation.

This is joint work with Eardi Lila (UW, Seattle), John A.D. Aston (U of Cambridge) and Jane-Ling Wang (UC, Davis).

Session 6. Human data (11:20-12:00/13:50-14:30/14:50-15:30)

Yen-Tsung Huang, ISSAS

Surrogate Marker Assessment Using Mediation and Instrumental Variable Analyses in a Case-Cohort Design

The identification of surrogate markers for gold standard outcomes in clinical trials enables future cost-effective trials that target the identified markers. Due to resource limitations, these surrogate markers may be collected only for cases and for a subset of the trial cohort, giving rise to what is termed the case-cohort design. Motivated by a COVID-19 vaccine trial, we propose methods of assessing the surrogate markers for a time-to-event outcome in a case-cohort design by using mediation and instrumental variable (IV) analyses. In the mediation analysis, we decomposed the vaccine effect on disease risk into an indirect effect (the effect mediated through the surrogate marker), and a direct effect (the effect not mediated by the marker), and we propose that the mediation proportions are surrogacy indices. In the IV analysis, we aimed to quantify the causal effect of the surrogate marker on disease risk in the presence of surrogate--disease confounding, which is unavoidable even in randomized trials. We employed weighted estimating equations derived from nonparametric maximum likelihood estimators (NPMLEs) under semiparametric probit models for the time-to-disease outcome. We plugged in the weighted NPMLEs to construct estimators for the aforementioned causal effects and surrogacy indices, and we determined the asymptotic properties of the proposed estimators. Finite sample performance was evaluated in numerical simulations. We illustrated the utility of the proposed mediation and IV analyses using two data sets from an influenza vaccine trial and from a mock COVID-19 vaccine trial.

This is joint work with Jih-Chang Yu, Jui-Hsiang Lin and Yi-Ting Huang (all in ISSAS).

Chen-Hsiang Yeang, ISSAS

Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations

Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci possess locational concentration in the genome and functional enrichment. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data predict well the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.

This is joint work with Sridevi Padakanti, Khong-Loon Tiong, Yan-Bin Chen, and Chen-Hsiang Yeang.
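The idea of projecting genotypes onto a principal component and then asking which loci drive that component can be sketched on toy data. The following is only an illustrative sketch, not the algorithm from the talk: the simulated populations, the allele frequencies, and the helper names (`simulate_genotypes`, `first_principal_component`, `top_loci`) are all assumptions, and ranking loci by absolute loading is a simple stand-in for the removal-based definition of necessary informative loci.

```python
import random


def simulate_genotypes(n_per_pop=30, n_loci=120, n_informative=40, seed=1):
    """Simulate minor-allele counts (0, 1 or 2) for two hypothetical populations.

    Assumption for illustration: only the first `n_informative` loci differ
    in allele frequency between the two populations.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            row = []
            for locus in range(n_loci):
                if locus < n_informative:
                    freq = 0.2 if pop == 0 else 0.8
                else:
                    freq = 0.5
                # Two independent allele draws per locus (diploid genotype).
                row.append(int(rng.random() < freq) + int(rng.random() < freq))
            rows.append(row)
            labels.append(pop)
    return rows, labels


def first_principal_component(rows, n_iter=100, seed=0):
    """Leading principal direction and sample scores via power iteration."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    columns = list(zip(*centered))
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in columns]
    for _ in range(n_iter):
        # One step of power iteration on X^T X: v <- X^T (X v), then normalize.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * x for s, x in zip(scores, col)) for col in columns]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores


def top_loci(loadings, k):
    """Indices of the k loci with the largest absolute loading."""
    order = sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)
    return order[:k]


rows, labels = simulate_genotypes()
loadings, scores = first_principal_component(rows)
print(top_loci(loadings, 10))
```

In this toy setting each sample's score is a weighted sum of its centered allele counts, with the weights concentrated on the loci whose frequencies differ between the populations.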

Poster Session 2. (12:00-12:20/14:30-14:50/15:30-15:50)

Mahsa Ashouri, ISSAS

Interactive visualization for time series clusters with domain-relevant attributes

We propose a web-based interactive tool for visualizing the results of clustering large collections of time series with cross-sectional domain-relevant attributes. Such data often arise in Internet-of-Things (IoT) and sensor-based applications, where each time series is coupled with cross-sectional information. While the clustering algorithm in the background is automated, our visualization tool allows users to modify various parameters that lead to different cluster definitions and numbers of clusters. We illustrate the tool by applying it to an air quality dataset (PM2.5 index) collected at different monitoring stations in Taiwan. Our web-based tool, built on R's Shiny framework, helps to visualize various characteristics of time series, such as temporal patterns and missing values, as well as clustering attribute groupings.

This is joint work with professors Galit Shmueli (NTHU), Chun-houh Chen (ISSAS), and Frederick Kin Hing Phoa (ISSAS).

Soumya Chakraborty, ISI

Robust Clustering with Normal Mixture Models: A Pseudo β Likelihood Approach

As in other estimation scenarios, likelihood based estimation in the normal mixture set-up is highly non-robust against model misspecification and the presence of outliers (apart from being an ill-posed optimization problem). We propose a robust alternative to the ordinary likelihood approach for this estimation problem which performs simultaneous estimation and data clustering and leads to subsequent anomaly detection. To invoke robustness, we follow, in spirit, the methodology based on the minimization of the density power divergence (or alternatively, the maximization of the beta-likelihood) under suitable constraints. An iteratively reweighted least squares approach is used to compute our estimators for the component means (or equivalently cluster centers) and component dispersion matrices, which leads to simultaneous data clustering. Some exploratory techniques are also suggested for anomaly detection, a problem of great importance in the domain of statistics and machine learning. Existence and consistency of the estimators are established under the aforesaid constraints. We validate our method with simulation studies under different set-ups; it is seen to perform competitively with, or better than, popular existing methods such as K-means and TCLUST, especially when the mixture components (i.e., the clusters) share regions with significant overlap or outlying clusters exist with small but non-negligible weights. Two real datasets are also used to illustrate the performance of our method in comparison with others, along with an application in image processing. It is observed that our method detects the clusters with lower misclassification rates and successfully points out the outlying (anomalous) observations from these datasets.
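The reweighting idea behind density power divergence estimation can be seen in a much simpler setting than the one in the abstract. The sketch below assumes a single univariate normal component with known standard deviation, so it is not the constrained mixture procedure of the talk; the function name `dpd_location`, the tuning value `beta=0.5`, and the toy data are illustrative assumptions. In this special case the estimating equation for the mean reduces to a weighted average with weights proportional to the model density raised to the power beta.

```python
import math
import statistics


def dpd_location(data, beta=0.5, sigma=1.0, n_iter=100):
    """Minimum density power divergence estimate of a normal mean (sigma known).

    Iterates mu <- sum(w_i * x_i) / sum(w_i) with
    w_i = exp(-beta * (x_i - mu)^2 / (2 * sigma^2)).
    The median is used as a starting value because the iteration can
    settle near an outlier if started far from the bulk of the data.
    """
    def weights_at(mu):
        return [math.exp(-beta * (x - mu) ** 2 / (2.0 * sigma ** 2)) for x in data]

    mu = statistics.median(data)
    for _ in range(n_iter):
        w = weights_at(mu)
        mu = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return mu, weights_at(mu)


# Toy data: five points symmetric around zero plus one gross outlier.
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 20.0]
mu_hat, final_weights = dpd_location(data)
print(mu_hat, statistics.mean(data))
print(final_weights)
```

Observations that end up with a weight close to zero are natural candidates to flag as anomalous, which is the sense in which estimation and anomaly detection go together here. Setting `beta` to zero gives every point weight one and recovers the ordinary sample mean.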

Minoru Kusaba, ISM

Prediction of stable structures using crystal structure similarity

Prediction of the stable structure of a given chemical composition is a basic and prerequisite task for the discovery of new materials. The dominant approach to this problem is to optimize the free energy, which requires a significant computational cost. In this paper, we propose a method that predicts the crystal structure by selecting, from the existing crystal structures in the database, those that are predicted to be similar to the stable structure of a given chemical composition. The prediction of crystal structure similarity is performed by a machine learning model built using prior information about crystal structure similarities in the database. Our method does not require the computationally expensive density functional theory framework, except for the validation of the suggested structures. The effectiveness and characteristics of our method were demonstrated on a benchmark set.

This is joint work with Chang Liu (ISM) & Ryo Yoshida (ISM).

Jui-Hsiang Lin, ISSAS

Mendelian Randomization for Survival Mediation Analyses

This work aims to investigate how triglyceride level mediates the effect of waist circumference on the incidence of hypertension. Causal mediation analysis has been a popular approach for investigating mechanisms. However, its identifiability depends on the assumptions of no unmeasured confounding for the exposure-outcome, the mediator-outcome, and the exposure-mediator relations. Burgess et al. (2015) proposed Mendelian randomization (MR) for mediation analyses, in which genetic markers are used as instrumental variables to account for unmeasured confounding. However, the existing work is not readily applicable to our motivating example, where the outcome is a time-to-event variable. The lack of MR methodology for mediation analysis (MR-Med) in a survival context motivates our methodology development. We develop an MR-Med approach using a semiparametric linear transformation model for the survival time. This model enjoys linearity and a proper causal interpretation, under which we derive estimands for the causal effects on mean transformed survival time and survival probabilities. We then propose estimators for these effects and establish their asymptotic properties. Finite sample performance was evaluated via extensive numerical simulations. The utility of the proposed method was illustrated by analyzing the motivating hypertension study.

This is joint work with Yen-Tsung Huang (ISSAS).

Kei Noba, ISM

On the optimality of the refraction--reflection strategy for Lévy processes

In this talk, we consider de Finetti's optimal dividend problem with capital injection under the assumption that the dividend strategies are absolutely continuous. In many previous studies, the process before being controlled was assumed to be a spectrally one-sided Lévy process. In this paper, however, we use a Lévy process that may have both positive and negative jumps. In the main theorem, we show that a refraction--reflection strategy is an optimal strategy. We also discuss the existence and uniqueness of solutions of the stochastic differential equations that define refracted Lévy processes.

Session 7. Functional data (13:00-14:00/15:30-16:30/16:30-17:30)

Hsin-wen Chang, ISSAS

Empirical likelihood based inference for functional means with application to wearable device data

This paper develops a nonparametric inference framework that is applicable to occupation time curves derived from wearable device data. These curves consider all activity levels within the range of device readings, which is preferable to the practice of classifying activity into discrete categories. A simulation study shows that the proposed procedures outperform competing functional data procedures. We illustrate the proposed methods using wearable device data from an NHANES study.

This is joint work with Ian W. McKeague (Columbia University).

Akifumi Okuno, ISM

Nonparametric Invertible Regression Between Closed Hypercubes

We study a nonparametric invertible regression, which estimates invertible continuous functions between [-1,1]^d. Invertible function estimation is one of the fundamental forms of shape-restricted estimation problems used in various domains, especially for generative models. Whereas the consistency and universality of some estimators have been well developed in this problem, their efficiency has not been fully clarified. In this study, we evaluate a minimax rate of L2 risks for the regression with Lipschitz invertible functions.

This is joint work with Masaaki Imaizumi (U. Tokyo).

Rituparna Sen, ISI

Bayesian Testing of Granger Causality in Functional Time Series

We develop a multivariate functional autoregressive model (MFAR), which captures the cross-correlation among multiple functional time series and thus improves forecast accuracy. We estimate the parameters under the Bayesian dynamic linear model (DLM) framework. In order to capture Granger causality from one FAR series to another, we employ the Bayesian Information Criterion (BIC) and compare the results with the Deviance Information Criterion (DIC). Motivated by the broad application of functional data in finance, we investigate the causality between the yield curves of two countries. Furthermore, we illustrate a climatology example, examining whether the weather conditions Granger-cause pollutant levels in a city.
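The logic of using an information criterion to detect Granger causality can be illustrated with two scalar series. This sketch is a deliberate simplification and not the Bayesian functional model of the talk: it assumes lag-1 autoregressions fitted by ordinary least squares without intercepts, and the function names (`simulate`, `bic`, `granger_bic`) and the coefficients are hypothetical. The source series is said to Granger-cause the target when adding its lag lowers the BIC.

```python
import math
import random


def simulate(n=500, coupling=0.8, seed=0):
    """Simulate x (autonomous AR(1)) and y (driven by lagged x when coupling != 0)."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n):
        x_prev, y_prev = x[-1], y[-1]
        x.append(0.5 * x_prev + rng.gauss(0.0, 1.0))
        y.append(0.3 * y_prev + coupling * x_prev + rng.gauss(0.0, 1.0))
    return x, y


def bic(rss, n_obs, n_params):
    """Gaussian BIC up to an additive constant: n * log(RSS / n) + k * log(n)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def granger_bic(target, source):
    """Return (bic_restricted, bic_full) for lag-1 models of `target`.

    Restricted: y_t = a * y_{t-1} + e_t
    Full:       y_t = a * y_{t-1} + b * x_{t-1} + e_t
    """
    y, y_lag, x_lag = target[1:], target[:-1], source[:-1]
    n = len(y)
    syy = sum(p * p for p in y_lag)
    sxx = sum(p * p for p in x_lag)
    sxy = sum(p * q for p, q in zip(x_lag, y_lag))
    ty = sum(p * q for p, q in zip(y_lag, y))
    tx = sum(p * q for p, q in zip(x_lag, y))

    a_r = ty / syy
    rss_r = sum((yt - a_r * yl) ** 2 for yt, yl in zip(y, y_lag))

    # Solve the 2x2 normal equations for the full model by Cramer's rule.
    det = syy * sxx - sxy * sxy
    a_f = (ty * sxx - tx * sxy) / det
    b_f = (tx * syy - ty * sxy) / det
    rss_f = sum((yt - a_f * yl - b_f * xl) ** 2 for yt, yl, xl in zip(y, y_lag, x_lag))
    return bic(rss_r, n, 1), bic(rss_f, n, 2)


x, y = simulate()
bic_restricted, bic_full = granger_bic(target=y, source=x)
print(bic_restricted, bic_full)
```

Calling `granger_bic(target=x, source=y)` checks the reverse direction; with the simulated dynamics above, x does not depend on past y, so the extra parameter is usually not worth its BIC penalty.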
