# Research

## Publications

1. "Inference under Covariate-Adaptive Randomization with Imperfect Compliance," with Federico Bugni (2023), Journal of Econometrics, vol. 237 (1). [Paper][Arxiv]

Abstract: This paper studies inference in a randomized controlled trial (RCT) with covariate-adaptive randomization (CAR) and imperfect compliance with a binary treatment. In this context, we study inference on the local average treatment effect (LATE), i.e., the average treatment effect for individuals who comply with whichever treatment they are assigned. As in Bugni et al. (2018, 2019), CAR refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. In contrast to these papers, however, we allow participants of the RCT to decide endogenously whether to comply with their assigned treatment status.

We study the properties of an estimator of the LATE derived from a “fully saturated” instrumental variable (IV) linear regression, i.e., a linear regression of the outcome on indicators for all strata and their interactions with the treatment decision, with the latter instrumented with the treatment assignment. We show that the proposed LATE estimator is asymptotically normal, and we characterize its asymptotic variance in terms of primitives of the problem. We provide consistent estimators of the standard errors and asymptotically exact hypothesis tests. In the special case when the target proportion of units assigned to each treatment does not vary across strata, we can also consider two other estimators of the LATE, including the one based on the “strata fixed effects” IV linear regression, i.e., a linear regression of the outcome on indicators for all strata and the treatment decision, with the latter instrumented with the treatment assignment.

Our characterization of the asymptotic variance of the LATE estimators in terms of the primitives of the problem shows how the parameters of the RCT affect this variance. We use this characterization to propose strategies that minimize the asymptotic variance in a hypothetical RCT based on data from a pilot study. We illustrate the practical relevance of these results using a simulation study and an empirical application based on Dupas et al. (2018).
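Because the fully saturated specification interacts every stratum indicator with the treatment decision, it preserves stratum-by-stratum comparisons between assigned and non-assigned units. The following is a minimal sketch of a plug-in LATE estimator in that spirit. It assumes that the stratum-level intent-to-treat and first-stage differences are aggregated using empirical stratum shares before their ratio is taken. The function name `saturated_late` is hypothetical, and the sketch is not the paper's exact estimator and omits its variance formula.

```python
from collections import defaultdict


def _mean(rows, column):
    """Average of one column (0 = outcome, 1 = treatment taken) over a list of rows."""
    return sum(row[column] for row in rows) / len(rows)


def saturated_late(y, d, a, s):
    """Stratum-share-weighted ITT effect divided by the stratum-share-weighted first stage.

    y: outcomes; d: treatment actually taken (0/1); a: treatment assigned (0/1);
    s: stratum labels. Every stratum must contain both assigned and non-assigned units.
    """
    n = len(y)
    # cells[stratum][assignment] holds (outcome, treatment taken) pairs
    cells = defaultdict(lambda: {0: [], 1: []})
    for yi, di, ai, si in zip(y, d, a, s):
        cells[si][ai].append((yi, di))
    itt, first_stage = 0.0, 0.0
    for stratum, groups in cells.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"stratum {stratum!r} needs both assignment arms")
        share = (len(groups[0]) + len(groups[1])) / n  # empirical stratum share
        itt += share * (_mean(groups[1], 0) - _mean(groups[0], 0))
        first_stage += share * (_mean(groups[1], 1) - _mean(groups[0], 1))
    return itt / first_stage


# Two strata: "a" (four units) and "b" (two units)
print(saturated_late(y=[5, 3, 2, 4, 6, 1], d=[1, 1, 0, 1, 1, 0],
                     a=[1, 1, 0, 0, 1, 0], s=["a", "a", "a", "a", "b", "b"]))
```

With a single stratum, the ratio reduces to the familiar Wald estimator: the difference in mean outcomes divided by the difference in take-up rates.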

2. "Uniform Nonparametric Inference for Time Series using Stata," with Jia Li and Zhipeng Liao (2020), The Stata Journal, vol. 20, pp. 706-720. [Paper]

## Working Papers

Abstract: This paper investigates the identification and inference of treatment effects in randomized controlled trials with social interactions. Two key network features characterize the setting and introduce endogeneity: (1) latent variables may affect both network formation and outcomes, and (2) the intervention may alter network structure, mediating treatment effects. I make three contributions. First, I define parameters within a post-treatment network framework, distinguishing direct effects of treatment from indirect effects mediated through changes in network structure. I provide a causal interpretation of the coefficients in a linear outcome model. For estimation and inference, I focus on a specific form of peer effects, represented by the fraction of treated friends. Second, in the absence of endogeneity, I establish the consistency and asymptotic normality of ordinary least squares estimators. Third, if endogeneity is present, I propose addressing it through shift-share instrumental variables, demonstrating the consistency and asymptotic normality of instrumental variable estimators in relatively sparse networks. For denser networks, I propose a denoised estimator based on eigendecomposition to restore consistency. Finally, I revisit Prina (2015) as an empirical illustration, demonstrating that treatment can influence outcomes both directly and through network structure changes.
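The peer-effect regressor described above, the fraction of treated friends, is computed on the network observed after treatment. The following minimal sketch represents that network as an adjacency dictionary. The function name and input layout are hypothetical. Units with no friends are assigned a share of 0.0; this is an assumption, and other conventions exist.

```python
def fraction_treated_friends(adjacency, treated):
    """Share of each unit's friends who are treated.

    adjacency: dict mapping unit -> iterable of friends (post-treatment network).
    treated: dict mapping unit -> 0/1 treatment indicator.
    Isolated units get 0.0 (an assumed convention).
    """
    shares = {}
    for unit, friends in adjacency.items():
        friends = list(friends)
        shares[unit] = sum(treated[f] for f in friends) / len(friends) if friends else 0.0
    return shares


print(fraction_treated_friends({1: [2, 3], 2: [1], 3: [1], 4: []},
                               {1: 1, 2: 0, 3: 1, 4: 0}))
```

Because treatment can change who is friends with whom, this regressor depends on the post-treatment adjacency. That dependence is the source of the endogeneity the abstract discusses.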

Abstract: Investigating interference or spillover effects among units is a central task in many social science problems. Network experiments are powerful tools for this task because they avoid endogeneity by randomly assigning treatments to units over networks. However, it is non-trivial to analyze network experiments properly without imposing strong modeling assumptions. Many researchers have proposed sophisticated point estimators and standard errors for causal effects under network experiments. We further show that regression-based point estimators and standard errors can have strong theoretical guarantees if the regression functions and robust standard errors are carefully specified to accommodate the interference patterns under network experiments. We first recall the well-known result that the Hájek estimator is numerically identical to the coefficient from a weighted-least-squares fit with weights given by the inverse probability of the exposure mapping. Moreover, we demonstrate that the regression-based approach offers three notable advantages: ease of implementation, the ability to derive standard errors through the same weighted-least-squares fit, and the capacity to incorporate covariates into the analysis, thereby improving estimation efficiency. Furthermore, we analyze the asymptotic bias of the regression-based network-robust standard errors. Recognizing that the covariance estimator can be anti-conservative, we propose an adjusted covariance estimator to improve empirical coverage rates. Although we focus on regression-based point estimators and standard errors, our theory holds under the design-based framework, which assumes that the randomness comes solely from the design of network experiments and allows for arbitrary misspecification of the regression models.
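You can check the numerical identity between the Hájek estimator and a weighted-least-squares coefficient directly. The following is a minimal NumPy sketch that assumes each unit's exposure level is known. It also assumes `prob[i]` is the design probability that unit i receives the exposure it actually received; both inputs and the function names are hypothetical. Regressing the outcome on a full set of exposure indicators, without an intercept and with weights `1/prob`, returns the Hájek mean for each exposure level.

```python
import numpy as np


def hajek_means(y, exposure, prob, levels):
    """Hajek (ratio) estimator of the mean outcome under each exposure level."""
    y, exposure = np.asarray(y, dtype=float), np.asarray(exposure)
    w = 1.0 / np.asarray(prob, dtype=float)  # inverse exposure probabilities
    return np.array([np.sum(w[exposure == k] * y[exposure == k]) / np.sum(w[exposure == k])
                     for k in levels])


def wls_coefficients(y, exposure, prob, levels):
    """WLS of y on exposure-level indicators (no intercept) with weights 1/prob."""
    y, exposure = np.asarray(y, dtype=float), np.asarray(exposure)
    X = np.column_stack([(exposure == k).astype(float) for k in levels])
    sw = np.sqrt(1.0 / np.asarray(prob, dtype=float))  # WLS = OLS on sqrt-weighted data
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


y, exposure = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
prob = [0.5, 0.25, 0.5, 0.2, 0.4, 0.4]
print(hajek_means(y, exposure, prob, [0, 1]), wls_coefficients(y, exposure, prob, [0, 1]))
```

The difference between two coefficients then gives the Hájek contrast between exposure levels. The same fit is the starting point for the regression-based standard errors the abstract describes.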

3. "Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance," with Federico Bugni, Filip Obradovic, and Amilcar Velez. Submitted. [Arxiv]

Abstract: Randomized controlled trials (RCTs) frequently utilize covariate-adaptive randomization (CAR) (e.g., stratified block randomization) and commonly suffer from imperfect compliance. This paper studies the identification and inference for the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in such RCTs with a binary treatment.

We first develop characterizations of the identified sets for both estimands. Since data are generally not i.i.d. under CAR, these characterizations do not follow from existing results. We then provide consistent estimators of the identified sets and asymptotically valid confidence intervals for the parameters. Our asymptotic analysis leads to concrete practical recommendations on how to estimate the treatment assignment probabilities that enter the estimated bounds. For the ATE, using sample analog assignment frequencies is more efficient than using the true assignment probabilities. By contrast, using the true assignment probabilities is preferable for the ATT.

4. "On the Power Properties of Inference for Parameters with Interval Identified Sets," with Federico Bugni, Filip Obradovic, and Amilcar Velez. Submitted. [Arxiv]

Abstract: This paper studies the power properties of confidence intervals (CIs) for a partially identified parameter of interest with an interval identified set. We assume the researcher has bounds estimators to construct the CIs proposed by Stoye (2009), referred to as CI1, CI2, and CI3. We also assume that these estimators are "ordered": the lower bound estimator is less than or equal to the upper bound estimator.

Under these conditions, we establish two results. First, we show that CI1 and CI2 are equally powerful, and both dominate CI3. Second, we consider a favorable situation in which there are two possible bounds estimators to construct these CIs, and one is more efficient than the other. One would expect that the more efficient bounds estimator yields more powerful inference. We prove that this desirable result holds for CI1 and CI2, but not necessarily for CI3.
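Stoye (2009) builds on the Imbens and Manski (2004) interval, whose critical value adapts to the estimated width of the identified set. The following minimal sketch of that construction assumes normally distributed bounds estimators with given standard errors `se_lower` and `se_upper` (hypothetical inputs). It does not claim to match any particular one of CI1, CI2, or CI3. The critical value `c` solves Φ(c + Δ̂/max(se_lower, se_upper)) − Φ(−c) = 1 − α, where Δ̂ is the estimated width.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def im_critical_value(gap_over_se, alpha=0.05, tol=1e-10):
    """Solve Phi(c + gap_over_se) - Phi(-c) = 1 - alpha for c by bisection.

    The left-hand side increases in c, so bisection on [0, 10] brackets the root.
    """
    target, lo, hi = 1.0 - alpha, 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid + gap_over_se) - std_normal_cdf(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interval_ci(lower, upper, se_lower, se_upper, alpha=0.05):
    """Imbens-Manski-style CI for a parameter in [lower, upper]; bounds must be ordered."""
    if lower > upper:
        raise ValueError("bounds estimators must be ordered: lower <= upper")
    c = im_critical_value((upper - lower) / max(se_lower, se_upper), alpha)
    return lower - c * se_lower, upper + c * se_upper


print(interval_ci(0.0, 1.0, 0.1, 0.1))
```

When the estimated width is zero, `c` equals the two-sided normal quantile (about 1.96 at α = 0.05). As the width grows relative to the standard errors, `c` shrinks toward the one-sided quantile (about 1.645). The ordering assumption keeps the width nonnegative.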

## Work in Progress

Abstract: For analytic convenience, existing statistical theories assume either random or fixed regressors. Consequently, they do not cover the practical case of estimating the average treatment effect in experiments with randomized treatments and non-randomized, fixed pretreatment covariates. To fill the gap, we develop the theory for regressions with mixed regressors that contain both random and non-random, fixed components. Importantly, our theory allows for misspecification of the regression functions. We start with the canonical least-squares regression, discussing the interpretation of the regression coefficients and the estimation of the standard errors. We then develop the theory for estimating equations, which covers the canonical example of two-stage least-squares estimation. We first develop the theory for independent data and then extend it to clustered data.