Speaker: Guanhua Chen, PhD, Assistant Professor, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
Abstract: Health care costs in the United States have continued to rise despite little meaningful improvement in health outcomes. A significant portion of these costs comes from hospital spending. To help improve patient outcomes while reducing the cost of care, health care systems have invested substantial effort in developing innovative interventions/treatments. Determining the target patients deemed likely to benefit from these innovative treatments is therefore of key interest; in other words, we want to identify individualized treatment rules (ITRs) for targeted enrollment of patients. In this paper, we use health care payments (a significant component of health care utilization) as our primary outcome. The problem is challenging because health care payments often follow a mixture distribution, with many patients incurring little to no payment over a given period of time and some patients incurring large costs: i.e., the outcome is semicontinuous. We develop a general framework for estimation of ITRs based on a two-part model, where the zero part and the positive part of the outcome are modeled separately. To improve estimation performance in high dimensions, we leverage a scientifically plausible penalty that encourages the signs of the coefficients for each variable to agree between the two models. We also develop a highly efficient algorithm for computation. We demonstrate the effectiveness of our approach in simulated examples and in a study of two treatment plans of complex case management in a major health system.
Ref: A Two-Part Framework for Estimating Individualized Treatment Rules From Semicontinuous Outcomes
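As a rough, self-contained illustration of the two-part idea (not the paper's penalized estimator, and using made-up simulated data), one can model the zero part with a logistic regression and the positive part with a log-linear model, then combine the two fits into a predicted mean payment:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))

# Semicontinuous outcome: many exact zeros, positive payments otherwise
logit = 0.5 * X[:, 0] - 0.3 * X[:, 1]
nonzero = rng.random(n) < 1 / (1 + np.exp(-logit))
y = np.where(nonzero, np.exp(1.0 + 0.8 * X[:, 0] + rng.normal(scale=0.5, size=n)), 0.0)

def fit_logistic(X, z, steps=2000, lr=0.1):
    """Plain gradient-ascent logistic regression for the zero part."""
    Xb = np.column_stack([np.ones(len(z)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p_hat = 1 / (1 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (z - p_hat) / len(z)
    return w

def fit_lognormal(X, y):
    """OLS on log(y) for the strictly positive part."""
    pos = y > 0
    Xb = np.column_stack([np.ones(pos.sum()), X[pos]])
    beta, *_ = np.linalg.lstsq(Xb, np.log(y[pos]), rcond=None)
    return beta

w = fit_logistic(X, (y > 0).astype(float))
beta = fit_lognormal(X, y)

def predicted_mean(Xnew, w, beta):
    """Combine the two parts: P(Y > 0 | x) * E[Y | Y > 0, x]."""
    Xb = np.column_stack([np.ones(len(Xnew)), Xnew])
    p_pos = 1 / (1 + np.exp(-Xb @ w))
    mu_pos = np.exp(Xb @ beta)   # ignores the lognormal variance correction
    return p_pos * mu_pos

m = predicted_mean(X, w, beta)
```

The paper additionally penalizes sign disagreement between the two coefficient vectors and supplies an efficient algorithm; both are omitted in this sketch.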
Speaker: Peisong Han, PhD, Associate Professor, Department of Biostatistics, University of Michigan
Abstract: It is common to have access to summary information from external studies. Such information can be useful in model building for an internal study of interest and, when incorporated, can improve parameter estimation efficiency. However, external studies may target populations different from the internal study, in which case incorporating the corresponding summary information may introduce estimation bias. We develop a method that selects the external studies whose target population is the same as that of the internal study and simultaneously incorporates their available information into estimation. The resulting estimator achieves the same efficiency as if we knew which external studies target the same population and made use of information from those studies alone. The method is applied to a prostate cancer study to incorporate external summary information to improve parameter estimation.
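The paper develops a principled selection-and-integration estimator; the following is only a crude thresholding caricature of the idea, with invented one-dimensional numbers: keep the external summaries compatible with the internal estimate, then pool by inverse-variance weighting.

```python
import numpy as np

# Hypothetical numbers: an internal estimate plus external summary estimates,
# the last of which targets a different population and is therefore biased.
internal_est, internal_se = 1.02, 0.10
external = [(1.00, 0.05), (0.98, 0.06), (1.60, 0.05)]

def combine_with_selection(est, se, summaries, z_crit=2.0):
    """Keep only external studies whose estimate is statistically compatible
    with the internal one, then pool by inverse-variance weighting."""
    kept = [(e, s) for e, s in summaries
            if abs(e - est) / np.hypot(s, se) < z_crit]
    ests = np.array([est] + [e for e, _ in kept])
    ses = np.array([se] + [s for _, s in kept])
    w = 1.0 / ses**2
    pooled = np.sum(w * ests) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se, len(kept)

pooled, pooled_se, n_kept = combine_with_selection(internal_est, internal_se, external)
```

Note how the biased third study is screened out while the two compatible ones shrink the pooled standard error below the internal-only one.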
Speaker: Jianing Chu, Department of Statistics, North Carolina State University
Abstract: Personalized decision-making, an artificial intelligence paradigm that tailors decisions to an individual's characteristics, has recently attracted a great deal of attention. Given data with individual covariates, treatment assignments, and outcomes, policy makers seek the best individualized treatment rule (ITR) that maximizes the expected outcome, known as the value function. Many existing methods assume that the training and testing distributions are the same. However, a covariate shift often exists in real-world applications, and the estimated optimal ITR may generalize poorly when the training and testing distributions are not identical. In this paper, we propose novel doubly robust estimators for the value function of an artificial distribution over a class of ITRs, by leveraging external aggregate information together with individual-level testing data. We establish large-sample properties of the proposed estimators and conduct numerical experiments to show their effectiveness.
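A key ingredient here is the doubly robust (AIPW) estimator of the value of an ITR. Below is a minimal sketch under a simple randomized design with both working models correctly specified; the covariate-shift and aggregate-information aspects of the talk are not represented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)           # randomized treatment
Y = X * (2 * A - 1) + rng.normal(size=n)   # treatment helps when X > 0

def aipw_value(X, A, Y, rule, prop, q0, q1):
    """Doubly robust (AIPW) estimate of the value of an ITR `rule`,
    given a propensity model and outcome models for each arm."""
    d = rule(X)
    q_d = np.where(d == 1, q1(X), q0(X))           # outcome model at rule's action
    q_a = np.where(A == 1, q1(X), q0(X))           # outcome model at observed action
    p_a = np.where(A == 1, prop(X), 1 - prop(X))   # probability of observed action
    return np.mean((A == d) / p_a * (Y - q_a) + q_d)

rule = lambda x: (x > 0).astype(int)               # treat iff X > 0
value = aipw_value(X, A, Y, rule,
                   prop=lambda x: np.full_like(x, 0.5),
                   q0=lambda x: -x, q1=lambda x: x)
```

For this data-generating process the true value of the rule is E|X| ≈ 0.8; the estimator is consistent if either the propensity model or the outcome models are correct, which is the "doubly robust" property the abstract refers to.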
Speaker: Ming-Yueh Huang, PhD, Institute of Statistical Science, Academia Sinica
Abstract: In exploratory data analysis, dimension reduction has been widely used to characterize potentially low-dimensional structures in the relationship between a response and a set of covariates. In contrast to popular machine learning methods, we regard sufficient dimension reduction as model selection among a series of nested semiparametric multi-index distribution regression models. Within this approach, we develop a consistent estimation procedure that estimates the index coefficients and the structural dimension simultaneously. Statistical inference for the parameters of interest is also established.
Paper (optional): Huang and Chiang (2017). An effective semiparametric estimation approach for the sufficient dimension reduction model. Journal of the American Statistical Association, 112(519), 1296–1310.
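For readers new to sufficient dimension reduction, the classical sliced inverse regression (SIR) estimator conveys the flavor of estimating index coefficients. It is only a baseline on simulated data, not the semiparametric model-selection approach of the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta_true) ** 3 + 0.5 * rng.normal(size=n)   # monotone single-index model

def sir(X, y, n_slices=10):
    """Classical sliced inverse regression: slice on y, average the
    standardized covariates within slices, and eigen-decompose."""
    Xc = X - X.mean(0)
    L = np.linalg.cholesky(np.cov(X.T))
    A = np.linalg.inv(L).T       # whitening: Z = Xc @ A has identity covariance
    Z = Xc @ A
    M = np.zeros((X.shape[1], X.shape[1]))
    for s in np.array_split(np.argsort(y), n_slices):
        m = Z[s].mean(0)
        M += len(s) * np.outer(m, m)
    M /= len(y)
    vals, vecs = np.linalg.eigh(M)
    return vals[::-1], A @ vecs[:, ::-1]   # directions mapped back to X scale

vals, dirs = sir(X, y)
```

The leading eigenvector recovers the index direction; estimating the structural dimension (how many directions matter) is exactly the model-selection question the paper addresses.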
Speaker: Yunshu Zhang, Department of Statistics, North Carolina State University
Abstract: Unlike in randomized clinical trials (RCTs), confounding control is critical for estimating causal effects from observational studies due to the lack of treatment randomization. Under the unconfoundedness assumption, matching methods are popular because they can be used to emulate an RCT that is hidden in the observational study. To ensure that the key assumption holds, effort is often made to collect a large number of possible confounders, rendering dimension reduction imperative in matching. Three matching schemes, based on the propensity score (PSM), the prognostic score (PGM), and the double score (DSM, i.e., the collection of the first two scores), have been proposed in the literature. However, a comprehensive comparison among the three schemes is lacking, and best practices, including variable selection and the choice of caliper and replacement, remain unsettled. In this article, we explore the statistical and numerical properties of PSM, PGM, and DSM via extensive simulations. Our study supports that DSM performs comparably with, if not better than, the two single-score matching schemes in terms of bias and variance. In particular, DSM is doubly robust in the sense that the matching estimator is consistent if either the propensity score model or the prognostic score model is correctly specified. Variable selection on the propensity score model and matching with replacement are suggested for DSM, and we illustrate these recommendations with comprehensive simulation studies. An R package is available at https://github.com/Yunshu7/dsmatch.
Speaker: Lili Wu, Department of Statistics, North Carolina State University
Abstract: Individualized treatment effects lie at the heart of precision medicine. The gold-standard approach to estimating the conditional treatment effect or individualized treatment rules (ITRs) is a randomized experiment, in which subjects are randomized to different treatment groups and bias is minimized to the extent possible. However, experimental data are limited in external validity because of their selection restrictions and therefore are not representative of the target real-world population; moreover, experimental data usually have small sample sizes owing to the high cost of collection. On the other hand, real-world data (RWD) are becoming popular and provide a representative sample of the population, but they usually suffer from unmeasured confounding. In this talk, I will discuss how to learn optimal interpretable ITRs that generalize to the target population, and how to obtain more efficient conditional treatment effect estimators by integrating information from both experimental and observational data.
Speaker: Wang Miao, PhD, Assistant Professor of Statistics, Department of Probability and Statistics and Center for Statistical Sciences, Peking University
Abstract: Nonignorable missingness arises in real-world data and jeopardizes statistical inference. It has been recognized that the data collection process itself may be useful for adjusting for missingness. Paradata, the records tracking the collection process of survey data, are often available in modern surveys but, until recently, have not been widely used in statistical analysis. Paradata typically contain information about callbacks, the length of interviews, the reluctance of the interviewee, etc. In this paper, we establish a framework for nonignorable missing data analysis with callback information. In particular, we establish identification, doubly robust estimation, and semiparametric efficiency theory under a stableness-of-resistance assumption, which states that the resistance to respond caused by the missing variables remains the same across contact attempts. Our results generalize previous parametric approaches by allowing for nonparametric models, and we propose several semiparametric estimators, including doubly robust ones that are consistent and asymptotically normal even if the working models are partially misspecified. The proposed approach is illustrated with simulations and with applications to the Consumer Expenditure Surveys and the National Longitudinal Survey of Young Men.
Speaker: Xiaojun Mao, PhD, Assistant Professor in School of Data Science, Fudan University
Abstract: In recent years, recovering a low-rank data matrix from relatively few observed entries has drawn a significant amount of attention. The low-rank assumption is often used to reflect the belief that rows or columns are generated from a relatively small number of hidden factors. In this talk, I will first discuss the problem of matrix completion from corrupted data when additional covariates are available. We consider a column-space-decomposition model together with a missing-at-random setting. Second, a special low-rank missing mechanism is also studied for matrix completion. More generally, we investigate matrix completion with model-free weighting.
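A standard baseline for the low-rank completion problem described here is SoftImpute-style singular-value thresholding. The sketch below covers only the plain low-rank setting on synthetic data, not the covariate-assisted or weighted extensions of the talk:

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, r = 60, 50, 3
M = rng.normal(size=(n1, r)) @ rng.normal(size=(r, n2))   # true low-rank matrix
mask = rng.random((n1, n2)) < 0.5                          # observed entries
Xobs = np.where(mask, M + 0.01 * rng.normal(size=M.shape), np.nan)

def soft_impute(Xobs, lam=1.0, n_iter=200):
    """SoftImpute-style completion: alternate between filling the missing
    entries with the current estimate and soft-thresholding singular values."""
    obs = ~np.isnan(Xobs)
    Z = np.where(obs, Xobs, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(obs, Xobs, Z), full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt            # nuclear-norm shrinkage
    return Z

Z = soft_impute(Xobs)
rel_err = np.linalg.norm(Z - M) / np.linalg.norm(M)
```

With half the entries observed and rank 3, the relative recovery error is small; the soft threshold lam is the nuclear-norm regularization parameter.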
Speaker: Jiayi Wang, Department of Statistics, Texas A&M University
Abstract: Multidimensional functional data arise in many fields nowadays. The covariance function plays an important role in the analysis of such increasingly common data. In this article, we propose a novel nonparametric covariance function estimation approach under the framework of reproducing kernel Hilbert spaces (RKHS) that can handle both sparse and dense functional data. We extend multilinear rank structures for (finite-dimensional) tensors to functions, which allows for flexible modeling of both covariance operators and marginal structures. The proposed framework guarantees that the resulting estimator is automatically positive semidefinite and can incorporate various spectral regularizations. The trace-norm regularization in particular can promote low ranks for both the covariance operator and the marginal structures. Despite the lack of a closed form, under mild assumptions the proposed estimator achieves unified theoretical results that hold for any relative magnitudes between the sample size and the number of observations per sample field, and the rate of convergence reveals the phase-transition phenomenon from sparse to dense functional data. Based on a new representer theorem, an ADMM algorithm is developed for the trace-norm regularization. The appealing numerical performance of the proposed estimator is demonstrated by a simulation study and the analysis of a dataset from the Argo project. Supplementary materials for this article are available online.
Paper (optional): https://www.tandfonline.com/doi/abs/10.1080/01621459.2020.1820344
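To convey the sparse-design setting, here is a naive pooled-cross-product covariance estimator on a one-dimensional grid, followed by a projection onto the positive semidefinite cone. It is a crude stand-in for the paper's RKHS trace-norm estimator, on a simulated rank-2 process:

```python
import numpy as np

rng = np.random.default_rng(5)
grid = np.linspace(0, 1, 21)
n_subj, n_obs = 500, 6

# Rank-2 true covariance: C(s,t) = phi1(s)phi1(t) + 0.5*phi2(s)phi2(t)
phi1 = np.sqrt(2) * np.sin(np.pi * grid)
phi2 = np.sqrt(2) * np.sin(2 * np.pi * grid)
C_true = np.outer(phi1, phi1) + 0.5 * np.outer(phi2, phi2)

# Sparse design: each subject is observed at only a few random grid points
sum_, cnt = np.zeros((21, 21)), np.zeros((21, 21))
for _ in range(n_subj):
    idx = rng.choice(21, size=n_obs, replace=False)
    xi = rng.normal(size=2)
    y = xi[0] * phi1[idx] + np.sqrt(0.5) * xi[1] * phi2[idx]
    for a in range(n_obs):      # pool raw cross-products into grid cells
        for b in range(n_obs):  # (with measurement noise one would drop a == b)
            sum_[idx[a], idx[b]] += y[a] * y[b]
            cnt[idx[a], idx[b]] += 1

C_hat = np.where(cnt > 0, sum_ / np.maximum(cnt, 1), 0.0)
C_hat = (C_hat + C_hat.T) / 2                       # symmetrize
vals, vecs = np.linalg.eigh(C_hat)
C_psd = (vecs * np.clip(vals, 0, None)) @ vecs.T    # project onto the PSD cone
rel_err = np.linalg.norm(C_psd - C_true) / np.linalg.norm(C_true)
```

Unlike this ad hoc eigenvalue clipping, the RKHS formulation in the paper yields an estimator that is positive semidefinite by construction and adapts between the sparse and dense regimes.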
Speaker: Joe Zhao, Department of Statistics, North Carolina State University
Abstract: We propose the outcome-adjusted balance measure to perform model selection for the generalized propensity score (GPS), which serves as an essential component in estimation of the pairwise average treatment effects (ATEs) in observational studies with more than two treatment levels. The primary goal of the balance measure is to identify the GPS model specification such that the resulting ATE estimator is consistent and efficient. Following recent empirical and theoretical evidence, we establish that the optimal GPS model should only include covariates related to the outcomes. Given a collection of candidate GPS models, the outcome-adjusted balance measure imputes all baseline covariates by matching on each candidate model, and selects the model that minimizes a weighted sum of absolute mean differences between the imputed and original values of the covariates. The weights are defined to leverage the covariate-outcome relationship, so that GPS models without optimal variable selection are penalized. Under appropriate assumptions, we show that the outcome-adjusted balance measure consistently selects the optimal GPS model, so that the resulting GPS matching estimator is asymptotically normal and efficient. We compare its finite sample performance with existing measures in a simulation study. We illustrate an application of the proposed methodology in the analysis of the Tutoring data.
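A toy rendition of the balance measure for a binary treatment (the paper handles a generalized propensity score with multiple treatment levels): impute covariates by matching under each candidate score and compare outcome-weighted mean differences. The within-arm simplification and all names below are illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-1.5 * X[:, 0])))   # treatment driven by X1
Y = X[:, 0] + 2 * X[:, 1] + rng.normal(size=n)          # outcome driven by X1, X2

def impute_by_matching(score, A, X):
    """Impute each unit's covariates from its nearest neighbor (on the score)
    in the opposite treatment arm."""
    Ximp = X.copy()
    for arm in (0, 1):
        donors = np.where(A != arm)[0]
        for i in np.where(A == arm)[0]:
            j = donors[np.argmin(np.abs(score[donors] - score[i]))]
            Ximp[i] = X[j]
    return Ximp

def outcome_adjusted_balance(score, A, X, Y):
    """Simplified balance measure: within-arm absolute mean differences between
    imputed and original covariates, weighted by covariate-outcome correlation."""
    Ximp = impute_by_matching(score, A, X)
    w = np.abs([np.corrcoef(X[:, j], Y)[0, 1] for j in range(X.shape[1])])
    diffs = sum(np.abs((Ximp[A == a] - X[A == a]).mean(0)) for a in (0, 1))
    return float(np.sum(w * diffs))

b_good = outcome_adjusted_balance(X[:, 0], A, X, Y)  # score built on the confounder
b_bad = outcome_adjusted_balance(X[:, 2], A, X, Y)   # score built on a noise covariate
```

The outcome-based weights penalize the candidate score that ignores the confounder, so minimizing the measure points to the better model, which is the selection principle described in the abstract.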
Speaker: Chenyin Gao, Department of Statistics, North Carolina State University
Abstract: Deep neural networks have been widely used for image denoising over the years. However, most deep networks need to be trained on large amounts of clean data, which poses a burden on the data acquisition process; sometimes clean images are simply inaccessible. Motivated by this problem, we propose a new unsupervised denoising algorithm that combines an auto-encoder structure (e.g., U-Net) with a low-rank tensor approximation technique. By further incorporating a traditional Gaussian denoiser (e.g., BM3D), we are able to train our network on a single image, which significantly reduces the burden of data collection. Extensive experiments on both benchmark and real-world datasets have been carried out for comparison. The results show that our proposed method generally outperforms existing single-image unsupervised denoisers and achieves performance comparable to some supervised networks such as DnCNN.
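The low-rank ingredient can be illustrated in a few lines: a smooth image is approximately low-rank, so truncating the SVD of a noisy copy suppresses noise. This is only a caricature on a synthetic "image", not the auto-encoder/BM3D pipeline of the talk:

```python
import numpy as np

rng = np.random.default_rng(7)
u = np.linspace(0, 1, 64)
# A smooth (hence approximately low-rank) toy intensity surface
clean = np.outer(np.sin(2 * np.pi * u), np.cos(2 * np.pi * u)) + np.outer(u, u)
noisy = clean + 0.3 * rng.normal(size=clean.shape)

def lowrank_denoise(img, rank=4):
    """Denoise by truncating the SVD of the image itself: a crude stand-in
    for the low-rank tensor approximation used in the proposed method."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

denoised = lowrank_denoise(noisy)
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

Keeping only a few singular components discards most of the (full-rank) noise while retaining the low-rank signal, which is the intuition behind combining a learned denoiser with a low-rank approximation.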
Speaker: Siyi Liu, Department of Statistics, North Carolina State University
Abstract: Missing data is inevitable in longitudinal clinical trials, and outcomes are not always normally distributed. In the presence of outliers or other deviations from normality, the conventional mixed model with repeated measures (MMRM) analysis of the treatment effect based on the multivariate normal assumption may produce bias and power loss. Control-based imputation (CBI) is an approach for estimating the treatment effect under the hypothetical condition that patients who drop out have the same profile as those in the control group.
We developed several multiple imputation (MI) and/or analytic-based robust CBI approaches, all of which utilize robust regression based on Huber's M-estimation or rank-based regression to handle non-normality. The finite-sample performance of the robust inference, including Type-I error control, power, and confidence interval coverage, is assessed by comprehensive simulation studies. Based on the statistical and computational performance observed in the simulations, the analytic-based robust approach, followed by a delta-method approximation or bootstrap variance estimation, is recommended.
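A sketch of the robust-regression building block, Huber M-estimation via iteratively reweighted least squares, on hypothetical heavy-tailed data; the MI machinery, delta-method approximation, and bootstrap are omitted:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)                             # baseline covariate
y = 1.0 + 0.5 * x + rng.standard_t(df=2, size=n)   # heavy-tailed outcome

def huber_irls(x, y, delta=1.345, n_iter=50):
    """Huber M-estimation via iteratively reweighted least squares,
    re-estimating the scale by the MAD at each step."""
    Xb = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(Xb, y, rcond=None)[0]   # OLS start
    for _ in range(n_iter):
        r = y - Xb @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745     # robust scale (MAD)
        w = np.minimum(1.0, delta * s / np.maximum(np.abs(r), 1e-12))
        beta = np.linalg.solve(Xb.T @ (w[:, None] * Xb), Xb.T @ (w * y))
    return beta

beta = huber_irls(x, y)
# Control-based imputation would then predict dropouts' outcomes from the
# control-arm fit regardless of randomized arm (illustrative inputs below)
y_imputed = beta[0] + beta[1] * np.array([0.0, 1.0, -1.0])
```

Because large residuals are downweighted, the fit stays near the true line even under t(2) noise, which is the robustness property the abstract relies on.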