Times are given in IST (UTC+5:30) / TST (UTC+8) / JST (UTC+9).
Arijit Chakrabarti, ISI
On a new model selection criterion for high dimensional PCA
In this talk, we consider the problem of estimating the number of significant components in high-dimensional PCA. It is well known in the literature that the Akaike Information Criterion (AIC) yields a consistent estimator in this problem. We propose some novel modifications of AIC that yield estimators which are shown to be consistent under much weaker conditions than those required for the consistency of the AIC-based estimator.
This talk is based on joint work with Abhinav Chakraborty and Soumendu Sundar Mukherjee.
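As a rough illustration of AIC-based selection of the number of components (not the modified criteria proposed in the talk), the sketch below scores each candidate k with a classical Wax-Kailath-style criterion computed from covariance eigenvalues. The function name, the parameter count used in the penalty, and the toy eigenvalues are assumptions made for this example only.

```python
import numpy as np

def aic_num_components(eigvals, n):
    """Return the k in {0, ..., p-1} minimising a Wax-Kailath-style AIC.

    eigvals: covariance eigenvalues (any order); n: sample size.
    Hypothetical helper for illustration, not the speakers' estimator.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    scores = []
    for k in range(p):
        tail = lam[k:]  # the p - k smallest eigenvalues
        # log(arithmetic mean / geometric mean): zero iff the tail is flat
        log_ratio = np.log(tail.mean()) - np.log(tail).mean()
        # Assumed free-parameter count for real-valued data
        n_params = k * (2 * p - k + 1) / 2
        scores.append(2 * n * (p - k) * log_ratio + 2 * n_params)
    return int(np.argmin(scores))

# Toy spectrum: two large eigenvalues followed by a flat noise floor
eigvals = [50.0, 30.0] + [1.0] * 18
k_hat = aic_num_components(eigvals, n=500)
print(k_hat)
```

With an exactly flat noise floor the fit term vanishes for every k at or beyond the true number of spikes, so only the penalty separates those candidates and the smallest such k is chosen.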
Ming-Yueh Huang, ISSAS
Model Selection Among Dimension-Reduced Generalized Cox Models
Conventional semiparametric hazards regression models rely on the specification of particular model formulations, such as the proportional-hazards feature and single-index structures. Instead of checking these modeling assumptions one by one, I will introduce a class of dimension-reduced generalized Cox models, and then a consistent model selection procedure among this class to select covariates with the proportional-hazards feature and a proper model formulation for non-proportional-hazards covariates. In this class, the non-proportional-hazards covariates are treated in a nonparametric manner, and a partial sufficient dimension reduction is introduced to mitigate the curse of dimensionality. A semiparametric efficient estimation procedure is proposed for these models. Based on the proposed estimation, we further construct a cross-validation type criterion to consistently select the correct model among this class. Most importantly, this class of hazards regression models contains the fully nonparametric hazards regression model as the most saturated submodel, and hence no further model diagnosis is required. Overall, this model selection approach is more effective than performing a sequence of conventional model checks. The proposed method is illustrated by simulation studies and a data example.
This is joint work with Kwun Chuen Gary Chan.
Ci-Ren Jiang, ISSAS
Eigen-Adjusted Functional Principal Component Analysis
Functional Principal Component Analysis (FPCA) has become a widely used dimension reduction tool for functional data analysis. When additional covariates are available, existing FPCA models integrate them either in the mean function or in both the mean function and the covariance function. However, methods of the first kind are not suitable for data that display second-order variation, while those of the second kind are time-consuming and make it difficult to perform subsequent statistical analyses on the dimension-reduced representations. To tackle these issues, we introduce an eigen-adjusted FPCA model that integrates covariates in the covariance function only through its eigenvalues. In particular, different structures on the covariate-specific eigenvalues -- corresponding to different practical problems -- are discussed to illustrate the model's flexibility as well as utility. To handle functional observations under different sampling schemes, we employ local linear smoothers to estimate the mean function and the pooled covariance function, and a weighted least squares approach to estimate the covariate-specific eigenvalues. The convergence rates of the proposed estimators are further investigated under the different sampling schemes. In addition to simulation studies, the proposed model is applied to functional Magnetic Resonance Imaging scans, collected within the Human Connectome Project, for functional connectivity investigation.
This is joint work with Eardi Lila (UW, Seattle), John A.D. Aston (U of Cambridge) and Jane-Ling Wang (UC, Davis).
Yen-Tsung Huang, ISSAS
Surrogate Marker Assessment Using Mediation and Instrumental Variable Analyses in a Case-Cohort Design
The identification of surrogate markers for gold standard outcomes in clinical trials enables future cost-effective trials that target the identified markers. Due to resource limitations, these surrogate markers may be collected only for cases and for a subset of the trial cohort, giving rise to what is termed the case-cohort design. Motivated by a COVID-19 vaccine trial, we propose methods of assessing the surrogate markers for a time-to-event outcome in a case-cohort design by using mediation and instrumental variable (IV) analyses. In the mediation analysis, we decomposed the vaccine effect on disease risk into an indirect effect (the effect mediated through the surrogate marker), and a direct effect (the effect not mediated by the marker), and we propose that the mediation proportions are surrogacy indices. In the IV analysis, we aimed to quantify the causal effect of the surrogate marker on disease risk in the presence of surrogate--disease confounding, which is unavoidable even in randomized trials. We employed weighted estimating equations derived from nonparametric maximum likelihood estimators (NPMLEs) under semiparametric probit models for the time-to-disease outcome. We plugged in the weighted NPMLEs to construct estimators for the aforementioned causal effects and surrogacy indices, and we determined the asymptotic properties of the proposed estimators. Finite sample performance was evaluated in numerical simulations. We illustrated the utility of the proposed mediation and IV analyses using two data sets from an influenza vaccine trial and from a mock COVID-19 vaccine trial.
This is joint work with Jih-Chang Yu, Jui-Hsiang Lin and Yi-Ting Huang (all at ISSAS).
Chen-Hsiang Yeang, ISSAS
Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations
Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome, and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci exhibit locational concentration in the genome and functional enrichment. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data accurately predict the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.
This is joint work with Sridevi Padakanti, Khong-Loon Tiong, Yan-Bin Chen, and Chen-Hsiang Yeang.
Hsin-wen Chang, ISSAS
Empirical likelihood based inference for functional means with application to wearable device data
This paper develops a nonparametric inference framework that is applicable to occupation time curves derived from wearable device data. These curves consider all activity levels within the range of device readings, which is preferable to the practice of classifying activity into discrete categories. A simulation study shows that the proposed procedures outperform competing functional data procedures. We illustrate the proposed methods using wearable device data from an NHANES study.
This is joint work with Ian W. McKeague (Columbia University).
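To make the notion of an occupation time curve in the abstract above concrete, here is a minimal sketch that assumes the curve at activity level a is the fraction of monitoring time with readings strictly above a. The function name, the sample readings, and the levels are made up for illustration; the talk's exact definition and inference procedure may differ.

```python
import numpy as np

def occupation_time_curve(readings, levels):
    """Fraction of monitoring time with readings strictly above each level.

    Hypothetical helper: assumes equally spaced epochs, so a plain
    average over readings stands in for an integral over time.
    """
    readings = np.asarray(readings, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Compare every reading with every level, then average over time
    return (readings[:, None] > levels[None, :]).mean(axis=0)

readings = [0, 0, 10, 50, 120, 300, 5, 0]  # made-up activity counts per epoch
levels = [0, 10, 100, 400]
curve = occupation_time_curve(readings, levels)
print(curve)
```

The resulting curve is non-increasing in the activity level and uses the whole range of readings, rather than collapsing them into a few activity categories.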
Akifumi Okuno, ISM
Nonparametric Invertible Regression Between Closed Hypercubes
We study a nonparametric invertible regression, which estimates invertible continuous functions between [-1,1]^d. Invertible function estimation is one of the fundamental forms of shape-restricted estimation problems used in various domains, especially for generative models. Whereas the consistency and universality of some estimators have been well developed in this problem, their efficiency has not been fully clarified. In this study, we evaluate a minimax rate of L2 risks for the regression with Lipschitz invertible functions.
This is joint work with Masaaki Imaizumi (U. Tokyo).
Rituparna Sen, ISI
Bayesian Testing of Granger Causality in Functional Time Series
We develop a multivariate functional autoregressive model (MFAR), which captures the cross-correlation among multiple functional time series and thus improves forecast accuracy. We estimate the parameters under the Bayesian dynamic linear models (DLM) framework. In order to capture Granger causality from one FAR series to another, we employ the Bayesian Information Criterion (BIC) and compare the results with those from the Deviance Information Criterion (DIC). Motivated by the broad application of functional data in finance, we investigate the causality between the yield curves of two countries. Furthermore, we illustrate a climatology example, examining whether the weather conditions Granger-cause pollutant levels in a city.
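The idea of detecting Granger causality with an information criterion can be sketched in a much simpler setting than the talk's: two scalar series, lag-1 autoregressions fitted by ordinary least squares, and a Gaussian BIC. This is not the Bayesian DLM approach for functional series described above. All names and the simulated data are assumptions for illustration.

```python
import numpy as np

def bic_ols(y, X):
    """BIC (up to an additive constant) of a Gaussian linear model fitted by least squares."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def granger_by_bic(cause, effect):
    """True if adding the lag-1 'cause' series lowers the BIC for 'effect'."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[1:]
    ones = np.ones(target.size)
    restricted = np.column_stack([ones, effect[:-1]])          # own lag only
    full = np.column_stack([ones, effect[:-1], cause[:-1]])    # plus lagged cause
    return bool(bic_ols(target, full) < bic_ols(target, restricted))

# Simulated example in which x drives y with a one-step delay
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.normal()

print(granger_by_bic(x, y))
```

The comparison is between a restricted model (own past only) and a full model (own past plus the other series' past); the lagged series is said to Granger-cause the target when the full model wins despite the extra penalty.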