Manuscripts are available upon request.
<Randomization Tests for Distributions of Individual Treatment Effects using Multiple Rank Statistics>, David Kim, Yongchang Su, Jake Bowers, Xinran Li
Understanding treatment effect heterogeneity has become increasingly important in many fields. In this paper, we study distributions and quantiles of individual treatment effects using randomization inference, to provide a more comprehensive understanding of treatment effects beyond the usual averages. Recent randomization-based approaches provide finite-sample valid inference for treatment effect quantiles, but their performance depends critically on the choice of rank-based test statistic, specifically the rank transformation used in the rank sum statistic; the optimal transformation depends on the unknown underlying distributions of the potential outcomes. To overcome this issue, we develop inference procedures that adaptively combine multiple rank statistics in a data-driven manner while preserving finite-sample validity. We propose a combined test statistic that maximizes over several rank statistics after appropriate standardization, in both completely randomized and stratified randomized experiments. Specifically, for stratified randomized experiments, we consider various weighting and aggregation methods for multiple statistics and formulate the problem as a linear (integer) programming problem, which can be solved efficiently in practice. We further examine the performance of the proposed test in Monte Carlo simulations and in an application, demonstrating substantial gains from the improved method.
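As a rough illustration of the idea of maximizing over standardized rank statistics, the sketch below runs a Monte Carlo randomization test in a completely randomized experiment. It is not the paper's procedure: it tests only the sharp null of no effect for any unit (not hypotheses about effect quantiles), it assumes no ties in the outcomes, and the function names, the Stephenson-type scores C(rank - 1, s - 1), and the values of s are assumptions chosen for illustration.

```python
import numpy as np
from math import comb


def rank_scores(y, s):
    """Stephenson-type scores C(rank - 1, s - 1); s = 2 gives rank - 1 (Wilcoxon-like)."""
    ranks = np.argsort(np.argsort(y)) + 1  # ranks 1..n, assuming no ties
    return np.array([comb(int(r) - 1, s - 1) for r in ranks], dtype=float)


def max_combined_test(y, z, s_values=(2, 4, 6), draws=2000, seed=0):
    """Randomization test of the sharp null of no effect using the maximum
    of several standardized rank-sum statistics (illustrative sketch).

    y: observed outcomes; z: 0/1 treatment indicators from a completely
    randomized experiment. The sample size must exceed max(s_values).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    n, n1 = len(y), int(z.sum())

    # One row of scores per rank transformation; fixed under the sharp null.
    scores = np.stack([rank_scores(y, s) for s in s_values])

    # Exact randomization mean and standard deviation of each rank-sum
    # statistic under complete randomization with n1 treated units.
    mean = n1 * scores.mean(axis=1)
    sd = np.sqrt(n1 * (n - n1) / n * scores.var(axis=1, ddof=1))

    def combined(assign):
        # Standardize each rank-sum statistic, then take the maximum.
        return np.max((scores @ assign - mean) / sd)

    observed = combined(z)
    # Re-randomize the assignment to approximate the null distribution
    # of the combined statistic itself, which keeps the test valid.
    null = np.array([combined(rng.permutation(z)) for _ in range(draws)])
    p_value = (1 + np.sum(null >= observed)) / (1 + draws)
    return observed, p_value
```

Because the p-value is computed from the randomization distribution of the maximum itself, rather than from each statistic separately, no further multiplicity correction is needed in this simplified setting.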
<Asymptotically Valid Two-sample Permutation Tests under Dependence>, David Kim, EunYi Chung, JiHyung Lee
Under i.i.d. sampling, two-sample permutation tests are exactly level 𝛼 under the null hypothesis of equal distributions. However, when comparing means or quantiles of two independent samples, permutation tests are no longer level 𝛼 in general. Moreover, when serial dependence is present in the two time series, permutation tests again need not be asymptotically level 𝛼. They may also suffer sizable Type 3 errors, in which a two-sided test rejects the null but concludes the wrong direction of the effect. Under mild moment and mixing conditions, we propose a testing procedure for which the permutation test is asymptotically valid, while retaining the exact rejection probability 𝛼 in finite samples when the distributions are identical and the observations are i.i.d. A Monte Carlo simulation study compares the proposed permutation tests with the unstudentized permutation test, the standard normal approximation, and the studentized bootstrap. We further provide an empirical study comparing series from inflation expectations surveys.
<Powerful Structured Testing for Block-specific Effect Detection in Block-randomized Experiments>, Jake Bowers, David Kim, Nuole Chen
In policy evaluations done via multi-site or block-randomized experiments with many sites or blocks, the decision maker may ask to know the 'location' or site within which the new policy showed detectable effects. When an experiment has many experimental blocks, this substantive question about detecting effects in particular places leads to a statistical question about testing many hypotheses. In this paper, we present a method that tests nested hypotheses in a structured order. The method responds to the policy maker's desire to do many hypothesis tests, controls the family-wise error rate (FWER) in both the weak and strong senses, and provides more statistical power than common Bonferroni- or FDR-style adjustments. When we compare this approach to methods involving a test in each experimental block followed by Bonferroni-style or even FDR-style p-value adjustment, we find that the structured approach detects more true effects, showing more statistical power. This approach can be thought of as addressing a question that is associated with the literature on heterogeneous treatment effects but not identical to it: rather than estimating a relationship between covariates and individual treatment effects and then testing hypotheses about the parameters defining the magnitude and/or functional form of that relationship, we focus only on testing the hypothesis of no effects in pre-specified groups of units, namely the blocks or sites that arise from the design of the experiment. We present a proof of weak control of the FWER and sketch a proof of strong control, along with some conjectures. We use simulation studies to learn how our proposal operates. We apply the method to 25 block-randomized controlled trials of an education policy aiming to help community college students receive a college degree.