See our homepage for future talks.
Tuesday, December 10, 2024: Panos Toulis and Wenxuan Guo (University of Chicago)
- Title: ML-assisted Randomization Tests for Complex Treatment Effects in A/B Experiments
- Discussant: Xinran Li (University of Chicago)
- Abstract: Experimentation is widely used for causal inference and data-driven decision making across disciplines. In an A/B experiment, for example, a business randomly assigns two different treatments (e.g., website designs) to its customers and then aims to infer which treatment is better. In this paper, we construct randomization tests for complex treatment effects, including heterogeneity and interference. A key feature of our approach is the use of flexible machine learning (ML) models, where the ANOVA-like test statistic is defined as the difference between the cross-validation errors from two ML models, one that includes the treatment variable and one that excludes it. This approach combines the predictive power of modern ML tools with the finite-sample validity of the randomization framework, enabling a robust and efficient way to perform causal inference in experimental settings. We demonstrate this combined benefit both theoretically and empirically through applied examples.
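A rough, self-contained sketch of this kind of test (the model choice, function names, and the sharp null of no treatment effect are assumptions for illustration, not the authors' implementation):

```python
# Randomization test whose statistic is the gap between cross-validated errors of
# two ML models: one fit with the treatment indicator, one fit without it.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def cv_error(features, y):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    scores = cross_val_score(model, features, y, cv=5, scoring="neg_mean_squared_error")
    return -scores.mean()                       # cross-validated mean squared error

def test_statistic(X, z, y):
    # Error without the treatment variable minus error with it: large values
    # suggest the treatment carries predictive (hence causal) signal.
    return cv_error(X, y) - cv_error(np.column_stack([X, z]), y)

def randomization_p_value(X, z, y, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = test_statistic(X, z, y)
    # Re-randomize the treatment under the sharp null and recompute the statistic.
    t_null = [test_statistic(X, rng.permutation(z), y) for _ in range(n_perm)]
    return (1 + sum(t >= t_obs for t in t_null)) / (n_perm + 1)
```

The finite-sample validity comes from the re-randomization step rather than from the ML model, so the model can be swapped freely.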
[Slides][Video][Discussant slides]
Tuesday, December 03, 2024: Yiqing Xu (Stanford University)
- Title: Factorial Difference-in-Differences
- Discussant: Erin Hartman (University of California Berkeley)
- Abstract: In many social science applications, researchers use the difference-in-differences (DID) estimator to establish causal relationships, exploiting cross-sectional variation in a baseline factor and temporal variation in exposure to an event that may affect all units. This approach, which we term factorial DID (FDID), differs from canonical DID in that it lacks a clean control group unexposed to the event after the event occurs. In this paper, we clarify FDID as a research design in terms of its data structure, feasible estimands, and identifying assumptions that allow the DID estimator to recover these estimands. We frame FDID as a factorial design with two factors: the baseline factor, denoted by G, and the exposure level to the event, denoted by Z, and define the effect modification and causal interaction as the associative and causal effects of G on the effect of Z, respectively. We show that under the canonical no anticipation and parallel trends assumptions, the DID estimator identifies only the effect modification of G in FDID, and propose an additional factorial parallel trends assumption to identify the causal interaction. Moreover, we show that the canonical DID research design can be reframed as a special case of the FDID research design with an additional exclusion restriction assumption, thereby reconciling the two approaches. We extend this framework to allow conditionally valid parallel trends assumptions and multiple time periods, and clarify the assumptions required to justify regression analysis under FDID. We illustrate these findings with empirical examples from economics and political science, and provide recommendations for improving practice and interpretation under FDID.
This is joint work with Anqi Zhao and Peng Ding.
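In the simplest two-group, two-period case, the estimator in question is the familiar difference-in-differences contrast; a schematic rendering of the point made in the abstract (notation assumed for illustration, not necessarily the paper's):

```latex
% Baseline factor G in {0,1}; every unit is exposed to the event (Z) in period t = 2.
\[
\widehat{\tau}_{\mathrm{DID}}
  = \big(\bar{Y}_{G=1,\,t=2} - \bar{Y}_{G=1,\,t=1}\big)
  - \big(\bar{Y}_{G=0,\,t=2} - \bar{Y}_{G=0,\,t=1}\big)
\]
% Under no anticipation and parallel trends this identifies only the effect modification
% of G; identifying the causal interaction between G and Z additionally requires the
% factorial parallel trends assumption.
```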
[Slides][Paper][Video][Discussant slides]
Tuesday, November 19, 2024: Jared S. Murray (University of Texas at Austin)
- Title: A Unifying Weighting Perspective on Causal Machine Learning: Kernel Methods, Gaussian Processes, and Bayesian Tree Models
- Discussant: Rahul Singh (Harvard University)
- Abstract: Causal machine learning methods based on kernel methods are powerful tools for estimating heterogeneous treatment effects; examples include kernel ridge regression, (causal) random forests, and many neural networks. A known but underappreciated result is that many of these methods have an equivalent representation as weighting estimators, with weights that correspond to an estimate of the Riesz representer of the estimand. This paper catalogs results about the weighting representation of heterogeneous effect estimates under kernel ridge regression estimates of outcome models, and provides new results about kernel ridge estimates that incorporate propensity scores in the spirit of the Robinson "regression-on-residuals" transformation.
We show that under mild conditions, these R-parameterized outcome models produce implied weights that approximately balance broad classes of functions between treated and control groups when estimating the average treatment effect in any target population, even under some forms of outcome model misspecification. This result connects the desirable properties of the Robinson transform and its corresponding Neyman orthogonal score/risk functions to the balancing properties of the implied weights.
We also show that this balancing property is generally insufficient to completely debias estimates even if the outcome model is correctly specified. We characterize the remaining bias via "target imbalance": the difference between means in the model-implied target population and the actual target population. We propose broadly applicable debiasing strategies that remain inside the outcome modeling framework. Finally, we show that many of these results translate to the R-learner with linear smoothers, which is also a weighting estimator.
We then extend our results to a large class of Bayesian nonparametric models used in causal machine learning via their representation as conditional Gaussian process (GP) regression models. Examples include BART, Bayesian causal forests, high-dimensional Bayesian regression with shrinkage or selection priors, many Bayesian neural networks, and other generic GP regression models. We use the connection between GP and kernel ridge regression to compute and interpret the model-implied weights. Their form sheds new light on why Bayesian tree models are especially effective for estimating heterogeneous effects. Finally, we leverage the implied weighting representation to introduce new tools for diagnosing violations of causal assumptions, model criticism, and method comparisons.
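A minimal sketch of the implied-weights view for kernel ridge regression outcome models (the RBF kernel, the per-arm fits, and the plug-in ATE below are illustrative assumptions, not the paper's estimators):

```python
# Kernel ridge regression (KRR) is a linear smoother, so a plug-in ATE estimate
# built from KRR outcome models puts an explicit weight on each observed outcome.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def krr_smoother(X_train, X_eval, lam=1.0, gamma=1.0):
    """Smoother matrix S with predictions at X_eval equal to S @ y_train."""
    K = rbf_kernel(X_train, X_train, gamma=gamma)
    k_eval = rbf_kernel(X_eval, X_train, gamma=gamma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), k_eval.T).T

def implied_ate_weights(X, z, X_target, lam=1.0, gamma=1.0):
    """Weights w such that mean(mu1_hat - mu0_hat) over X_target equals w @ Y."""
    w = np.zeros(len(z))
    for arm, sign in [(1, 1.0), (0, -1.0)]:
        idx = np.flatnonzero(z == arm)
        S = krr_smoother(X[idx], X_target, lam=lam, gamma=gamma)  # (n_target, n_arm)
        w[idx] += sign * S.mean(axis=0)     # average the smoother rows over the target sample
    return w
```

Inspecting such weights (their balance across function classes, their behavior in the target population) is the kind of diagnostic the talk builds on.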
[Video]
Tuesday, November 12, 2024: Tianchen Qian (University of California Irvine)
- Title: Causal inference and machine learning in mobile health – modeling time-varying effects using longitudinal functional data
- Discussant: Walter Dempsey (University of Michigan)
- Abstract: To optimize mobile health interventions and advance domain knowledge on intervention design, it is critical to understand how the intervention effect varies over time and with contextual information. This study aims to assess how a push notification suggesting physical activity influences individuals’ step counts using data from the HeartSteps micro-randomized trial (MRT). The statistical challenges include the time-varying treatments and longitudinal functional step count measurements. We propose the first semiparametric causal excursion effect model with varying coefficients to model the time-varying effects within a decision point and across decision points in an MRT. The proposed model incorporates double time indices to accommodate the longitudinal functional outcome, enabling the assessment of time-varying effect moderation by contextual variables. We propose a two-stage causal effect estimator that is robust against a misspecified high-dimensional outcome regression nuisance model. We establish asymptotic theory and conduct simulation studies to validate the proposed estimator. Our analysis provides new insights into individuals’ change in response profiles (such as how soon a response occurs) due to the activity suggestions, how such changes differ by the type of suggestions received, and how such changes depend on other contextual information such as being recently sedentary and the day being a weekday.
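To make the double time indices concrete, one schematic way to write such a time-varying excursion-effect estimand (the notation here is assumed for illustration and need not match the paper's):

```latex
% Decision points t = 1, ..., T; within-decision-point time s (e.g., minutes after a
% suggestion); A_t the randomized treatment; X_t contextual moderators (e.g., recently
% sedentary, weekday).
\[
\beta(t, s;\, x) = \mathbb{E}\!\left[\, Y_{t,s}\big(\bar{A}_{t-1}, 1\big)
  - Y_{t,s}\big(\bar{A}_{t-1}, 0\big) \,\middle|\, X_t = x \right]
\]
% so the effect of a suggestion can vary within a decision point (over s), across
% decision points (over t), and with the contextual moderators x.
```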
[Slides][Video]
Tuesday, November 5, 2024 (Young researcher seminar)
Speaker 1: Jinzhou Li (Stanford University)
- Title: Root cause discovery via permutations and Cholesky decomposition
- Abstract: This work is motivated by the following problem: Can we identify the disease-causing gene in a patient affected by a monogenic disorder? This problem is an instance of root cause discovery. In particular, we aim to identify the intervened variable in one interventional sample using a set of observational samples as reference. We consider a linear structural equation model where the causal ordering is unknown. We begin by examining a simple method that uses squared z-scores and characterize the conditions under which this method succeeds and fails, showing that it generally cannot identify the root cause. We then prove, without additional assumptions, that the root cause is identifiable even if the causal ordering is not. Two key ingredients of this identifiability result are the use of permutations and the Cholesky decomposition, which allow us to exploit an invariant property across different permutations to discover the root cause. Furthermore, we characterize permutations that yield the correct root cause and, based on this, propose a valid method for root cause discovery. We also adapt this approach to high-dimensional settings. Finally, we evaluate the performance of our methods through simulations and apply the high-dimensional method to discover disease-causing genes in the gene expression dataset that motivates this work.
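A small, heavily simplified sketch of the two ingredients named in the abstract (permutations and the Cholesky decomposition); how scores are actually combined across permutations is in the paper:

```python
# For a candidate ordering of the variables, the Cholesky factor of the observational
# covariance whitens the interventional sample; an intervened (root-cause) variable
# tends to show up as an outlying whitened score.
import numpy as np

def whitened_scores(X_obs, x_int, order):
    """X_obs: (n, p) observational samples; x_int: (p,) interventional sample;
    order: a candidate permutation of range(p)."""
    mu = X_obs.mean(axis=0)
    Sigma = np.cov(X_obs, rowvar=False)
    L = np.linalg.cholesky(Sigma[np.ix_(order, order)])   # lower-triangular factor
    z = np.linalg.solve(L, (x_int - mu)[order])            # whitened interventional sample
    scores = np.empty(len(order))
    scores[list(order)] = np.abs(z)                        # report in the original variable order
    return scores
```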
[Paper][Slides][Video]
Speaker 2: Yuyao Wang (University of California San Diego)
- Title: Learning treatment effects under covariate dependent left truncation and right censoring
- Abstract: In observational studies with delayed entry, causal inference for time-to-event outcomes can be challenging. The challenges arise because, in addition to potential confounding bias from observational data, the collected data often also suffer from selection bias due to left truncation, where only subjects whose time to event (such as death) exceeds the enrollment time are included, as well as bias from informative right censoring. To estimate the treatment effects on time-to-event outcomes in such settings, inverse probability weighting (IPW) is often employed. However, IPW is sensitive to model misspecification, which makes it vulnerable, especially when faced with three sources of bias. Moreover, IPW is inefficient. To address these challenges, we propose a doubly robust framework for handling covariate-dependent left truncation and right censoring that can be applied to a wide range of estimation problems, including estimating the average treatment effect (ATE) and the conditional average treatment effect (CATE). For the ATE, we develop estimators that enjoy model double robustness and rate double robustness. For the CATE, we develop orthogonal and doubly robust learners that can achieve the oracle rate of convergence. Our framework represents the first attempt to construct doubly robust estimators and orthogonal learners for treatment effects that account for all three sources of bias: confounding, selection from covariate-induced dependent left truncation, and informative right censoring.
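For orientation, the IPW construction referred to above weights each fully observed subject by the inverse of three estimated probabilities, one per selection mechanism (a schematic with assumed notation, not the paper's estimator):

```latex
% A: treatment, X: covariates, T: event time, Q: left-truncation (entry) time, C: censoring time;
% a subject enters the data only if T > Q and contributes an observed event only if uncensored.
\[
w \;=\; \frac{1}{\;\pi(A \mid X)\; F_Q(T \mid X)\; S_C(T \mid A, X)\;},
\qquad
\pi(a \mid x) = \Pr(A = a \mid X = x),\quad
F_Q(t \mid x) = \Pr(Q \le t \mid X = x),\quad
S_C(t \mid a, x) = \Pr(C > t \mid A = a, X = x).
\]
% Misspecifying any one of these nuisance models biases plain IPW; the proposed framework
% combines them with outcome models to obtain double robustness.
```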
[Slides][Video]
Tuesday, October 29, 2024: Toru Kitagawa (Brown University)
- Title: Policy Choice in Time-Series by Empirical Welfare Maximization
- Discussant: Mikkel Plagborg-Møller (Princeton University)
- Abstract: This paper develops a novel method for policy choice in a dynamic setting where the available data is a multivariate time series. Building on the statistical treatment choice framework, we propose Time-series Empirical Welfare Maximization (T-EWM) methods to estimate an optimal policy rule for the current period or over multiple periods by maximizing an empirical welfare criterion constructed using nonparametric potential outcome time-series. We characterize conditions under which T-EWM consistently learns a policy choice that is optimal in terms of conditional welfare given the time-series history. We then derive a nonasymptotic upper bound for conditional welfare regret and its minimax lower bound. To illustrate the implementation and uses of T-EWM, we perform simulation studies and apply the method to estimate optimal monetary policy rules from macroeconomic time-series data.
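A schematic of the empirical welfare maximization step (notation assumed for illustration; the paper's construction of the welfare criterion from nonparametric potential-outcome time series, and its multi-period extension, are more involved):

```latex
% \mathcal{D}: a class of candidate policy rules mapping the observed history H_{t-1}
% (lagged outcomes, policies, and covariates) to a binary policy choice.
\[
\widehat{\delta} \in \arg\max_{\delta \in \mathcal{D}} \;
  \frac{1}{T} \sum_{t=1}^{T}
  \Big[ \delta(H_{t-1})\, \widehat{Y}_t(1) + \big(1 - \delta(H_{t-1})\big)\, \widehat{Y}_t(0) \Big]
\]
% where \widehat{Y}_t(1) and \widehat{Y}_t(0) are nonparametric estimates of the
% potential outcomes at time t.
```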
[Paper][Video][Slides][Discussant slides]
Tuesday, October 22, 2024: Alexis Bellot (Google DeepMind, London)
- Title: Partial Transportability for Domain Generalization
- Discussant: Adam Li (Columbia University)
- Abstract: A fundamental task in AI is providing performance guarantees for predictions made in unseen domains. In practice, there can be substantial uncertainty about the distribution of new data, and corresponding variability in the performance of existing predictors. For example, a risk prediction tool fine-tuned on one patient population (e.g., a particular hospital or geographic location) may not be equally optimal if deployed on a different patient population that may differ in several aspects. This talk studies this problem through the lens of partial transportability, which combines data from source domains with assumptions about the data generating mechanisms, encoded in causal diagrams, to provide a guarantee on the out-of-distribution performance of classification models. We will show that one may consistently predict the worst-case performance of existing classification models, and that, further, one may train classification models to explicitly optimize for worst-case performance in a target domain, under our assumptions. Both methods may be parameterized with expressive neural networks and implemented with gradient-based optimization schemes. With these results, we hope to provide a fresh perspective on the problem of transfer learning and domain generalization in machine learning.
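In schematic form, the two tasks described in the abstract amount to evaluating and then optimizing a worst-case risk over the target distributions compatible with the causal diagram and the source data (notation assumed for illustration):

```latex
% \Pi: target-domain distributions consistent with the causal diagram and the observed
% source-domain distributions; \ell: a classification loss.
\[
\text{evaluation:}\quad \overline{R}(f) = \max_{P^{*} \in \Pi} \; \mathbb{E}_{P^{*}}\!\big[\ell(f(X), Y)\big],
\qquad
\text{learning:}\quad f^{\star} \in \arg\min_{f} \; \overline{R}(f).
\]
```

The talk shows that both quantities can be handled with expressive neural-network parameterizations and gradient-based optimization.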
[Paper][Video][Slides][Discussant slides]
Tuesday, October 15, 2024: Oliver Dukes (Ghent University)
- Title: Nonparametric tests of treatment effect homogeneity for policy-makers
- Discussant: Edward Kennedy (Carnegie Mellon University)
- Abstract: Recent work has focused on nonparametric estimation of conditional treatment effects, but inference has remained relatively unexplored. We propose a class of nonparametric tests for both quantitative and qualitative treatment effect heterogeneity. The tests can incorporate a variety of structured assumptions on the conditional average treatment effect, allow for both continuous and discrete covariates, and do not require sample splitting. Furthermore, we show how the tests are tailored to detect alternatives where the population impact of adopting a personalized decision rule differs from that of a rule that discards covariates. The proposal is thus relevant for guiding treatment policies, and its utility is borne out in simulation studies and a re-analysis of an AIDS clinical trial. This is joint work with Mats Stensrud, Riccardo Brioschi and Aaron Hudson.
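One way to see the policy relevance: with conditional effect τ(x) = E[Y(1) − Y(0) | X = x], the gain from personalizing over the best covariate-free rule can be written as follows (a standard rendering used here for illustration, not necessarily the paper's exact estimand); the tests are tailored to alternatives under which such a contrast is nonzero:

```latex
\[
\theta \;=\; \mathbb{E}\big[\max\{\tau(X),\, 0\}\big] \;-\; \max\{\mathbb{E}[\tau(X)],\, 0\}
\]
% \theta \ge 0 always, and \theta = 0 exactly when a personalized rule offers no
% improvement over the better of "treat everyone" and "treat no one".
```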
[Paper][Video][Slides][Discussion slides]
Tuesday, October 8, 2024 (Young researcher seminar)
Speaker 1: Philipp Faller (Karlsruhe Institute of Technology)
- Title: Self-compatibility: Evaluating causal discovery without ground truth
- Abstract: As causal ground truth is incredibly rare, causal discovery algorithms are commonly only evaluated on simulated data. This is concerning, given that simulations reflect preconceptions about generating processes regarding noise distributions, model classes, and more. In this talk, I present a method for falsifying the output of a causal discovery algorithm in the absence of ground truth. The key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables. Motivated by this insight, our method relies on a notion of compatibility between causal graphs learned on different subsets of variables. Detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects. Although passing such compatibility tests is only a necessary criterion for good performance, I will argue that it provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution.
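A deliberately crude sketch of the idea (the paper's notion of compatibility between learned graphs is more refined; here `discover` is any user-supplied causal discovery routine returning directed edges, and only direct edge reversals across subsets are counted):

```python
# Run causal discovery on random subsets of variables and flag contradictions
# between the graphs learned on overlapping subsets.
import itertools
import numpy as np

def incompatibility_rate(data, discover, n_subsets=20, subset_size=None, seed=0):
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    subset_size = subset_size or max(3, p // 2)
    results = []
    for _ in range(n_subsets):
        cols = sorted(map(int, rng.choice(p, size=subset_size, replace=False)))
        edges = discover(data[:, cols])                   # edges indexed within the subset
        edges = {(cols[i], cols[j]) for i, j in edges}    # map back to original variable indices
        results.append((set(cols), edges))
    conflicts, comparisons = 0, 0
    for (vars1, edges1), (vars2, edges2) in itertools.combinations(results, 2):
        for i, j in edges1:
            if i in vars2 and j in vars2:                 # both endpoints observed in the other subset
                comparisons += 1
                conflicts += (j, i) in edges2             # the reversed edge was inferred there
    return conflicts / comparisons if comparisons else 0.0
```

A high rate of such contradictions is evidence against the algorithm's output on this dataset, even without ground truth.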
[Paper][Video][Slides]
Speaker 2: Bijan Mazaheri (Broad Institute of MIT and Harvard)
- Title: Synthetic Potential Outcomes and the Hierarchy of Causal Identifiability
- Abstract: A mixture model consists of a latent class that exerts a discrete signal on the observed data. Uncovering these latent classes is fundamental to unsupervised learning and forms the backbone of scientific thought. In this talk, we consider the problem of recovering latent classes of causal responses to an intervention. We allow overlapping support in the distributions of these classes, meaning individuals cannot be clustered into groups with a similar response. Instead, we develop a method of moments approach to synthetically sample potential outcome distributions using the higher-order multi-linear moments of the observable data. This approach is the first known identifiability result for what we call Mixtures of Treatment Effects (MTEs). More broadly, we show how MTEs fit into a hierarchy of mixture identifiability that unifies a number of previous approaches to latent class confounding.
[Paper][Video][Slides]
Tuesday, October 1, 2024: Anish Agarwal (Columbia University)
- Title: Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions
- Discussant: Christina Lee Yu (Cornell University)
- Abstract: We consider a setting where there are N heterogeneous units and p interventions. Our goal is to learn unit-specific potential outcomes for any combination of these p interventions, i.e., N×2^p causal parameters. Choosing a combination of interventions is a problem that naturally arises in a variety of applications such as factorial design experiments and recommendation engines (e.g., showing a set of movies that maximizes engagement for a given user). Running N×2^p experiments to estimate the various parameters is likely expensive and/or infeasible as N and p grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. We study this problem under a novel model that imposes latent structure across both units and combinations of interventions. Specifically, we assume latent similarity in potential outcomes across units (i.e., the matrix of potential outcomes is approximately rank r) and regularity in how combinations of interventions interact (i.e., the coefficients in the Fourier expansion of the potential outcomes are approximately s-sparse). We establish identification for all N×2^p parameters despite unobserved confounding. We propose an estimation procedure, Synthetic Combinations, and establish finite-sample consistency under precise conditions on the observation pattern. We show that Synthetic Combinations is able to consistently estimate unit-specific potential outcomes given a total of poly(r)×(N + s^2 p) observations. In comparison, previous methods that do not exploit structure across both units and combinations have poorer sample complexity, scaling as min(N×s^2 p, r×(N + 2^p)).
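To illustrate the second structural assumption, a small sketch of the Fourier (Walsh) expansion over combinations of p binary interventions, with an off-the-shelf sparse regression standing in for the paper's estimation procedure (illustrative assumptions, not the authors' code):

```python
# Each combination a in {0,1}^p has Fourier features chi_S(a) = prod_{i in S} (-1)^{a_i}
# over subsets S; "approximately s-sparse" means only about s coefficients matter.
import itertools
import numpy as np

def fourier_features(A, max_order=2):
    """A: (n, p) binary matrix of intervention combinations; returns features for
    all subsets S with |S| <= max_order."""
    n, p = A.shape
    signs = 1 - 2 * A                      # maps {0,1} -> {+1,-1}
    cols = [np.ones(n)]                    # the empty subset S = {}
    for order in range(1, max_order + 1):
        for S in itertools.combinations(range(p), order):
            cols.append(np.prod(signs[:, list(S)], axis=1))
    return np.column_stack(cols)

# Usage sketch for one unit's observed combinations A_obs and outcomes y_obs:
# from sklearn.linear_model import Lasso
# Phi = fourier_features(A_obs, max_order=2)
# fit = Lasso(alpha=0.1).fit(Phi, y_obs)                    # sparse coefficients
# y_pred = fit.predict(fourier_features(A_new, max_order=2))
```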
[Paper][Video][Slides][Discussion slides]