Speaker: Yichong Zhang, Department of Economics, Singapore Management University
Abstract: We study how to improve efficiency via regression adjustments with additional covariates under covariate-adaptive randomizations (CARs) when subject compliance is imperfect. We first establish the semiparametric efficiency bound for the local average treatment effect (LATE) under CARs. Second, we develop a general regression-adjusted LATE estimator that allows for parametric, nonparametric, and regularized adjustments. Even when the adjustments are misspecified, our proposed estimator is still consistent and asymptotically normal, and the associated inference method still achieves the exact asymptotic size under the null. When the adjustments are correctly specified, our estimator achieves the semiparametric efficiency bound. Third, we derive the optimal linear adjustment that leads to the smallest asymptotic variance among all linear adjustments. We then show that the commonly used two-stage least squares estimator is not optimal in the class of LATE estimators with linear adjustments, while Ansel, Hong, and Li's (2018) estimator is. Fourth, we show how to construct a LATE estimator with nonlinear adjustments that is more efficient than those with the optimal linear adjustment. Fifth, we give conditions under which LATE estimators with nonparametric and regularized adjustments achieve the semiparametric efficiency bound. Last, simulation evidence and an empirical application confirm the efficiency gains achieved by regression adjustments relative to both the unadjusted estimator and the standard two-stage least squares estimator.
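To make the role of covariate adjustment concrete, here is a minimal sketch (not the talk's estimator) of a Wald-type LATE estimate in which the arm-specific means of the outcome and of treatment take-up are linearly adjusted for covariates. The function name `adjusted_late` and the simple OLS adjustment are illustrative assumptions; the talk derives an optimal adjustment that this sketch does not implement.

```python
import numpy as np

def adjusted_late(y, d, z, x):
    """Sketch of a regression-adjusted Wald/LATE estimator: within each
    instrument arm z in {0, 1}, fit OLS of the outcome y and of take-up d
    on covariates x, average the fitted values over the full sample, and
    take the ratio of the adjusted intent-to-treat effects."""
    def adj_mean(v, arm):
        idx = (z == arm)
        X_arm = np.column_stack([np.ones(idx.sum()), x[idx]])
        beta = np.linalg.lstsq(X_arm, v[idx], rcond=None)[0]
        X_all = np.column_stack([np.ones(len(x)), x])
        return (X_all @ beta).mean()  # adjusted arm mean at full-sample covariates

    itt_y = adj_mean(y, 1) - adj_mean(y, 0)  # adjusted ITT on the outcome
    itt_d = adj_mean(d, 1) - adj_mean(d, 0)  # adjusted ITT on take-up
    return itt_y / itt_d
```

With one-sided noncompliance and a constant treatment effect, the ratio recovers that effect; the covariate adjustment removes outcome variation explained by x and so tightens the estimate.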
Speaker: Muxuan Liang, Department of Biostatistics, University of Florida
Abstract: Penalized empirical risk minimization with a surrogate loss function is often used to derive a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is a lack of valid inference procedures to identify the driving factors of the estimated decision rule, especially when the surrogate loss is non-differentiable. In this work, we propose a kernel-smoothed decorrelated score to construct hypothesis tests and interval estimates for the linear decision rule estimated using a piecewise-linear surrogate loss, which has a discontinuous gradient and non-regular Hessian. Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points and to approximate the non-regular Hessian of the surrogate loss. In applications where additional nuisance parameters are involved, we propose a novel cross-fitted version to accommodate flexible nuisance estimates and kernel approximations. We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup. Simulation studies and a real-data analysis are conducted to demonstrate the validity and superiority of the proposed method.
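To illustrate the smoothing step (a sketch only, with the hinge loss standing in for a generic piecewise-linear surrogate and a Gaussian kernel as an assumed choice, not necessarily the speaker's), the discontinuous indicator in the loss gradient is replaced by a smooth kernel CDF, which also yields a usable curvature matrix:

```python
import numpy as np
from scipy.stats import norm

def smoothed_hinge_score(beta, X, y, h):
    """Kernel-smoothed gradient and curvature for the hinge loss
    mean_i max(0, 1 - y_i * x_i'beta): the discontinuous indicator
    1{1 - y_i * x_i'beta > 0} in the subgradient is replaced by the
    smooth surrogate Phi(margin / h), and the kernel density phi(.)/h
    supplies a Hessian approximation where the true Hessian is degenerate."""
    margin = 1.0 - y * (X @ beta)
    w = norm.cdf(margin / h)                       # smoothed indicator, in (0, 1)
    grad = (-(y * w)[:, None] * X).mean(axis=0)    # smoothed score vector
    k = norm.pdf(margin / h) / h                   # kernel weight near the kink
    hess = (X * k[:, None]).T @ X / len(y)         # smoothed Hessian approximation
    return grad, hess
```

As the bandwidth h shrinks, the smoothed score matches the hinge subgradient away from the kink at margin zero, while the curvature term concentrates mass near the kink.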
Speaker: Brian Gilbert, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Abstract: Research in the past few decades has discussed the concept of "spatial confounding" but has provided conflicting definitions and proposed solutions, some of which do not address the issue of confounding as it is understood in the field of causal inference. We give a clear account of spatial confounding as the existence of an unmeasured confounding variable with a spatial structure. Under certain conditions, including the measurability of the confounder as a function of space, we show that spatial covariates (e.g., latitude and longitude) can be handled as typical covariates by algorithms popular in causal inference to address spatial confounding. We focus on "double machine learning" (DML), in which flexible models are fit for both the exposure and outcome variables to arrive at a causal estimator with favorable convergence properties. These models avoid restrictive assumptions, such as linearity and effect homogeneity, which are present in the linear models often employed in spatial statistics and which can lead to strong bias when violated. We demonstrate the advantages of the DML approach analytically and via simulation studies. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
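A minimal sketch of the DML idea with coordinates treated as ordinary covariates (illustrative only: it assumes a partially linear model and uses random-forest nuisance fits, which need not match the paper's exact estimator):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_effect_partially_linear(coords, a, y, n_splits=2, seed=0):
    """Sketch of cross-fitted double machine learning: residualize both
    the exposure a and the outcome y on the spatial coordinates with a
    flexible learner (fit on held-out folds), then regress residual on
    residual to estimate the exposure effect."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    a_res = np.zeros(len(a))
    y_res = np.zeros(len(y))
    for train, test in kf.split(coords):
        m_a = RandomForestRegressor(random_state=seed).fit(coords[train], a[train])
        m_y = RandomForestRegressor(random_state=seed).fit(coords[train], y[train])
        a_res[test] = a[test] - m_a.predict(coords[test])  # exposure residual
        y_res[test] = y[test] - m_y.predict(coords[test])  # outcome residual
    return float(a_res @ y_res / (a_res @ a_res))  # Neyman-orthogonal slope
```

Because the spatially structured confounder is (by assumption) a function of the coordinates, both nuisance regressions absorb it, and the residual-on-residual slope is far less biased than a naive regression of y on a.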
Speaker: Zhanrui Cai, Department of Statistics, Iowa State University
Abstract: Testing independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data are high-dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper, we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed in the modern machine learning community, making it applicable to high-dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that is independent of the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a genetic dataset, where the high dimensionality makes existing methods hard to apply.
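As a rough illustration of the classifier-based framework (a sketch under simplifying assumptions, not the authors' implementation), one can label joint pairs versus fixed-permutation pairs, fit a classifier on one half of the sample, and standardize the classifier's score difference on the other half. The interaction feature and logistic classifier below are illustrative choices; any classifier could be plugged in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classifier_independence_test(x, y, seed=0):
    """Sketch: build joint pairs (x_i, y_i) with label 1 and permuted
    pairs (x_i, y_{pi(i)}) with label 0 via one fixed permutation, fit
    a classifier on one half (sample split), and return a standardized
    score difference on the other half (~ N(0,1) under independence)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    perm = rng.permutation(n)  # fixed permutation -> product-distribution sample

    def feats(a, b):
        # Include an interaction so even a linear classifier can detect correlation.
        return np.hstack([a, b, a * b])

    data = np.vstack([feats(x, y), feats(x, y[perm])])
    labels = np.r_[np.ones(n), np.zeros(n)]
    half = n // 2  # train on the first half of each class, test on the rest
    train = np.r_[np.arange(half), n + np.arange(half)]
    test = np.setdiff1d(np.arange(2 * n), train)
    clf = LogisticRegression(max_iter=1000).fit(data[train], labels[train])
    p = clf.predict_proba(data[test])[:, 1]          # fitted class-1 probabilities
    d = p[labels[test] == 1] - p[labels[test] == 0]  # joint-minus-product scores
    return np.sqrt(len(d)) * d.mean() / d.std(ddof=1)
```

Under independence, the classifier cannot separate the two samples and the statistic is asymptotically standard normal; under dependence, it diverges to plus infinity.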
Speaker: Jianing Chu, Taekwon Hong, Joe Zhao, Department of Statistics, North Carolina State University