Fall 2022 complete list with abstracts
Tuesday, December 6, 2022: Jose Zubizarreta (Harvard University)
- Title: Bridging Matching, Regression, and Weighting as Mathematical Programs for Causal Inference
- Discussant: Mike Baiocchi (Stanford University)
- Abstract: A fundamental principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Across the health and social sciences, statistical methods for covariate adjustment are used in pursuit of this principle. Basic methods are matching, regression, and weighting. In this talk, we will examine the connections between these methods through their underlying mathematical programs. We will study their strengths and weaknesses in terms of study design, computational tractability, and statistical efficiency. Overall, we will discuss the role of mathematical optimization for the design and analysis of studies of causal effects.
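As a rough illustration of the optimization view of weighting discussed in this abstract, here is a minimal sketch (not the speaker's formulation) of balancing weights as a mathematical program: minimize the dispersion of nonnegative control-unit weights subject to exact mean balance on the covariates. The simulated data and the quadratic objective are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_c, p = 200, 3
X_c = rng.normal(size=(n_c, p))               # control covariates (simulated)
X_t = rng.normal(loc=0.3, size=(50, p))       # treated covariates (simulated)
target = X_t.mean(axis=0)                     # treated moments to match

# minimize weight dispersion subject to exact mean balance on the covariates
constraints = [
    {"type": "eq", "fun": lambda w: X_c.T @ w - target},  # covariate balance
    {"type": "eq", "fun": lambda w: w.sum() - 1.0},       # weights sum to one
]
res = minimize(
    lambda w: (w ** 2).sum(),                 # quadratic dispersion objective
    x0=np.full(n_c, 1.0 / n_c),
    bounds=[(0.0, None)] * n_c,               # nonnegative weights
    constraints=constraints,
    method="SLSQP",
)
print("max balance error:", np.abs(X_c.T @ res.x - target).max())
```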
[Video]
Tuesday, November 29, 2022: Wayne Lam (CMU)
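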
- Title: Greedy Relaxations of the Sparsest Permutation Algorithm
- Discussant: Alex Markham (KTH Royal Institute of Technology)
- Abstract: There has been increasing interest in methods that exploit permutation reasoning to search for directed acyclic causal models, including the "Ordering Search" of Teyssier and Koller (2005) and GSP of Solus, Wang and Uhler (2021). We extend the latter with a permutation-based operation, tuck, and develop a class of algorithms, GRaSP, that are efficient and pointwise consistent under increasingly weaker assumptions than faithfulness. The most relaxed form of GRaSP outperforms many state-of-the-art causal search algorithms in simulation, allowing efficient and accurate search even for dense graphs and graphs with more than 100 variables.
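To make the permutation idea concrete, the toy sketch below scores every ordering of three variables by the number of edges its implied DAG needs (judged by a partial-correlation threshold) and returns the sparsest. This is the brute-force sparsest-permutation idea that GRaSP relaxes greedily, not GRaSP itself; the threshold and simulated chain are illustrative assumptions.

```python
import itertools
import numpy as np

def residual(y, Z):
    """Residual of y after least-squares projection onto the columns of Z."""
    if Z.shape[1] == 0:
        return y - y.mean()
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta

def n_edges(X, order, thresh=0.1):
    """Edges needed by the DAG consistent with `order`, via partial correlations."""
    edges = 0
    for k, i in enumerate(order):
        preds = list(order[:k])
        for j in preds:
            rest = [m for m in preds if m != j]
            r_i = residual(X[:, i], X[:, rest])
            r_j = residual(X[:, j], X[:, rest])
            edges += abs(np.corrcoef(r_i, r_j)[0, 1]) > thresh
    return edges

rng = np.random.default_rng(1)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)        # true chain: 0 -> 1 -> 2
X = np.column_stack([x0, x1, x2])

best = min(itertools.permutations(range(3)), key=lambda p: n_edges(X, p))
print("sparsest order:", best)
```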
[Video] [Slides] [Discussant slides] [Project]
Tuesday, November 22, 2022: Julie Josse (Inria)
- Title: Causal inference for brain trauma: leveraging incomplete observational data and RCT
- Discussant: Elizabeth Stuart (Johns Hopkins University)
- Abstract: The simultaneous availability of observational and experimental data for the same medical question about the effect of a treatment is an opportunity to combine their strengths and address their weaknesses. In this presentation, I will illustrate the methodological challenges we faced in answering a medical question about the effect of tranexamic acid administration on mortality in patients with traumatic brain injury in the context of critical care management. First, we had access to a large French observational registry on severely traumatized patients, but almost all variables were incomplete. We considered different sets of hypotheses under which causal inference is possible despite the missing attributes and discussed corresponding approaches to estimating the average treatment effect, including generalized propensity score methods and multiple imputation. Second, results from an international RCT were published that did not necessarily agree with those obtained from the observational study. This led us to investigate generalization problems where the trial data are considered a biased sample of a target population and we want to predict the treatment effect on the target population represented by the observational data. We focus on the Inverse Propensity of Sampling Weighting (IPSW) estimator and establish finite-sample and asymptotic results on different versions of this estimator. In addition, we have studied how including covariates that are unnecessary for identifiability can have an impact on the asymptotic variance. Finally, I will quickly mention solutions for dealing with sporadic missing values in both data sources in this generalization framework and systematic missing values when a variable is not available in one or both data sources.
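A minimal sketch of the IPSW idea described above, assuming a logistic model for trial participation and a known randomization probability; the function and argument names are illustrative, and this is a simplified stand-in rather than the paper's exact estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipsw_ate(X_trial, A, Y, X_obs, p_treat=0.5):
    """Reweight trial units toward the observational (target) population."""
    X = np.vstack([X_trial, X_obs])
    S = np.concatenate([np.ones(len(X_trial)), np.zeros(len(X_obs))])
    model = LogisticRegression().fit(X, S)
    p = model.predict_proba(X_trial)[:, 1]   # estimated P(in trial | X)
    w = (1 - p) / p                          # density ratio toward the target
    w = w / w.sum()                          # normalize the weights
    # weighted Horvitz-Thompson contrast under known randomization probability
    return np.sum(w * (A * Y / p_treat - (1 - A) * Y / (1 - p_treat)))
```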
[Video] [Slides] [Paper #1, #2, #3]
Tuesday, November 15, 2022: Karthik Rajkumar (LinkedIn)
- Title: A causal test of the strength of weak ties
- Discussant: Dean Eckles (MIT)
- Abstract: We analyzed data from multiple large-scale randomized experiments on LinkedIn’s People You May Know algorithm, which recommends new connections to LinkedIn members, to test the extent to which weak ties increased job mobility in the world’s largest professional social network. The experiments randomly varied the prevalence of weak ties in the networks of over 20 million people over a 5-year period, during which 2 billion new ties and 600,000 new jobs were created. The results provided experimental causal evidence supporting the strength of weak ties and suggested three revisions to the theory. First, the strength of weak ties was nonlinear. Statistical analysis found an inverted U-shaped relationship between tie strength and job transmission such that weaker ties increased job transmission but only to a point, after which there were diminishing marginal returns to tie weakness. Second, weak ties measured by interaction intensity and the number of mutual connections displayed varying effects. Moderately weak ties (measured by mutual connections) and the weakest ties (measured by interaction intensity) created the most job mobility. Third, the strength of weak ties varied by industry. Whereas weak ties increased job mobility in more digital industries, strong ties increased job mobility in less digital industries.
[Video] [Paper] [Slides] [Discussant slides]
Tuesday, November 8, 2022: Luke Miratrix (Harvard University)
- Title: A devil’s bargain? Repairing a Difference in Differences parallel trends assumption with an initial matching step
- Discussant: Laura Hatfield (Harvard University)
- Abstract: The Difference in Differences (DiD) estimator is a popular estimator built on the "parallel trends" assumption that the treatment group, absent treatment, would change "similarly" to the control group over time. To increase the plausibility of this assumption, a natural idea is to match treated and control units prior to a DiD analysis. In this paper, we characterize the bias of matching under a class of linear structural models with both observed and unobserved confounders that have time-varying effects. Given this framework, we find that matching on baseline covariates generally reduces the bias associated with these covariates, when compared to the original DiD estimator. We further find that additionally matching on pre-treatment outcomes has both costs and benefits. First, matching on pre-treatment outcomes partially balances unobserved confounders, which mitigates some bias. This reduction is proportional to the outcome's reliability, a measure of how coupled the outcomes are with the latent covariates. On the other hand, matching on pre-treatment outcomes also undermines the second "difference" in a DiD estimate by forcing the treated and control groups' pre-treatment outcomes to be equal. This injects bias into the final estimate, creating a bias-bias tradeoff. We extend our bias results to multivariate confounders with multiple pre-treatment periods and find similar results. We summarize our findings with heuristic guidelines on whether to match prior to a DiD analysis, along with a method for roughly estimating the reduction in bias. We illustrate our guidelines by reanalyzing a recent empirical study that used matching prior to a DiD analysis to explore the impact of principal turnover on student achievement.
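A minimal sketch of the matching-then-DiD pipeline the abstract analyzes, assuming 1-nearest-neighbor matching with replacement on baseline covariates; all names are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def matched_did(X_t, y_pre_t, y_post_t, X_c, y_pre_c, y_post_c):
    """1-NN match controls to treated units on baseline covariates, then DiD."""
    nn = NearestNeighbors(n_neighbors=1).fit(X_c)
    idx = nn.kneighbors(X_t, return_distance=False).ravel()
    treated_change = y_post_t - y_pre_t             # first difference, treated
    control_change = y_post_c[idx] - y_pre_c[idx]   # first difference, matched controls
    return np.mean(treated_change - control_change) # second difference
```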
[Video] [Slides] [Discussant slides]
Tuesday, November 1, 2022: Kun Zhang (CMU)
- Title: Methodological advances in causal representation learning
- Discussant: Victor Veitch (University of Chicago)
- Abstract: Causal representation learning aims to reveal the underlying high-level hidden causal variables and their relations. It can be seen as a special case of causal discovery, whose goal is to recover the underlying causal structure or causal model from observational data. The modularity property of a causal system implies properties of minimal changes and independent changes of causal representations, and this talk will show how such properties make it possible to recover the underlying causal representations from observational data with identifiability guarantees: under appropriate assumptions, the learned representations are consistent with the underlying causal process. The talk will consider various problem settings involving independent and identically distributed (i.i.d.) data, temporal data, or data with distribution shift as input, and demonstrate when identifiable causal representation learning can benefit from the flexibility of deep learning and when it has to impose suitable parametric assumptions on the causal process.
[Video] [Slides]
Tuesday, October 25, 2022: Rahul Singh (MIT) & Jiaqi Zhang (MIT)
- Talk 1 Title: Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy
- Talk 1 Abstract: The 2020 US Census will be published with differential privacy, implemented by injecting synthetic noise into the data. Controversy has ensued, with debates that center on the painful trade-off between the privacy of respondents and the precision of economic analysis. Is this trade-off inevitable? To answer this question, we formulate a semiparametric model of causal inference with high dimensional data that may be noisy, missing, discretized, or privatized. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of Gaussian approximation is n^{-1/2} for semiparametric estimands such as average treatment effect, and it degrades gracefully for nonparametric estimands such as heterogeneous treatment effect. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and validate in the Census. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations. Finally, we conduct a semi-synthetic exercise calibrated to privacy levels mandated for the 2020 US Census.
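A minimal sketch of the data-cleaning step under the approximate low-rank assumption, using truncated SVD as a simple stand-in for the paper's matrix-completion machinery:

```python
import numpy as np

def clean_low_rank(X_noisy, rank):
    """Denoise a covariate matrix assumed to be approximately low rank
    by keeping only its top singular directions (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X_noisy, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```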
[Video] [Slides]
- Talk 2 Title: Active Learning for Optimal Intervention Design in Causal Models
- Talk 2 Abstract: Sequential experimental design to discover interventions that achieve a desired outcome is a key problem across disciplines. We formulate a theoretically grounded strategy that uses the samples obtained so far from different interventions to update the belief about the underlying causal model, as well as to identify samples that are most informative about optimal interventions and thus should be acquired in the next batch. The inclusion of causality allows for the identification of optimal interventions with significantly fewer but carefully selected samples. This is particularly critical when the ability to acquire interventional data is limited due to cost or ethical considerations. To demonstrate the computation and sample efficiency, we apply our approach to a perturbational single-cell transcriptomic dataset, where significant improvements over baselines are observed. The complexity of the single-cell dataset showcases the applicability of our method to real-world problems where data could be sparse and highly noisy.
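As a drastically simplified stand-in for the acquisition loop described above, the toy sketch below maintains a Gaussian posterior over the mean outcome of each candidate intervention and picks the next experiment by Thompson sampling; it illustrates the update-belief-then-acquire cycle but ignores the causal-model structure that is central to the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
true_effects = np.array([0.2, 1.0, 0.5])       # unknown mean outcome per intervention
mu, prec = np.zeros(3), np.ones(3)             # Gaussian posterior per intervention

for t in range(100):
    sample = rng.normal(mu, 1.0 / np.sqrt(prec))  # Thompson draw from the posterior
    a = int(np.argmax(sample))                    # next intervention to run
    y = rng.normal(true_effects[a], 1.0)          # run the experiment, observe outcome
    # conjugate Gaussian update with known unit noise variance
    mu[a] = (prec[a] * mu[a] + y) / (prec[a] + 1.0)
    prec[a] += 1.0

print("posterior means:", mu.round(2))
```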
[Video] [Slides]
Tuesday, October 18, 2022: Biwei Huang (UC San Diego)
- Title: Latent Hierarchical Causal Structure Discovery with Rank Constraints
- Discussant: Erich Kummerfeld (University of Minnesota)
- Abstract: Most causal discovery procedures assume that there are no latent confounders in the system, an assumption that is often violated in real-world problems. In this talk, we consider a challenging scenario for causal structure identification, where some variables are latent and form a hierarchical graph structure that generates the measured variables; the children of latent variables may still be latent, and only leaf nodes are measured; moreover, there can be multiple paths between every pair of variables (i.e., the structure is not restricted to a tree). We propose an estimation procedure that can efficiently locate latent variables, determine their cardinalities, and identify the latent hierarchical structure by leveraging rank-deficiency constraints over the measured variables. We show that the proposed algorithm can find the correct Markov equivalence class of the whole graph asymptotically under proper restrictions on the graph structure.
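A minimal numerical illustration of the rank-deficiency constraints the method exploits: when two sets of measured variables share a single latent parent, their cross-covariance matrix has rank one. The simulated coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
L = rng.normal(size=n)   # a single latent parent
# four measured children of L, each with independent noise
X = np.column_stack([a * L + rng.normal(size=n) for a in (0.9, 0.8, 0.7, 0.6)])

# cross-covariance between {x1, x2} and {x3, x4}; one latent parent => rank 1
C = np.cov(X.T)[:2, 2:]
print("singular values:", np.linalg.svd(C, compute_uv=False).round(3))
# the second singular value is near zero, revealing the rank deficiency
```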
[Video] [Slides] [Discussion slides] [Paper]
Tuesday, October 11, 2022: Fan Li (Duke University)
- Title: A tutorial on Bayesian causal inference
- Abstract: This paper provides a critical review of the Bayesian perspective of causal inference based on the potential outcomes framework. We review the causal estimands, identification assumptions, and general structure of Bayesian inference of causal effects. We highlight issues that are unique to Bayesian causal inference, including the role of the propensity score, definition of identifiability, and choice of priors in both low and high dimensional regimes. We point out the central role of covariate overlap and more generally the design stage in Bayesian causal inference. We extend the discussion to two complex assignment mechanisms: instrumental variable and time-varying treatments. Throughout, we illustrate the key concepts via examples.
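A minimal sketch of one idea the tutorial covers: Bayesian inference for a finite-sample ATE by imputing each unit's missing potential outcome from the posterior. Flat priors, a known unit variance, and the absence of covariates are simplifying assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
A = rng.integers(0, 2, size=n)       # randomized binary treatment
Y = 1.0 * A + rng.normal(size=n)     # simulated outcomes, true effect = 1

draws = []
for _ in range(2000):
    # flat-prior posterior for each arm's mean (known unit variance)
    mu1 = rng.normal(Y[A == 1].mean(), 1 / np.sqrt((A == 1).sum()))
    mu0 = rng.normal(Y[A == 0].mean(), 1 / np.sqrt((A == 0).sum()))
    # impute each unit's missing potential outcome, then average unit-level effects
    Y1 = np.where(A == 1, Y, rng.normal(mu1, 1.0, size=n))
    Y0 = np.where(A == 0, Y, rng.normal(mu0, 1.0, size=n))
    draws.append(np.mean(Y1 - Y0))

print("posterior mean ATE:", np.mean(draws).round(2))
```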
[Video] [Slides]
Tuesday, October 4, 2022: Lihua Lei (Stanford University)
- Title: Double-Robust Two-Way-Fixed-Effects Regression For Panel Data
- Discussant: Jeffrey Wooldridge (Michigan State University)
- Abstract: We propose a new estimator for the average causal effects of a binary treatment with panel data in settings with general treatment patterns. Our approach augments the two-way-fixed-effects specification with unit-specific weights that arise from a model for the assignment mechanism. We show how to construct these weights in various settings, including situations where units opt into the treatment sequentially. The resulting estimator converges to an average (over units and time) treatment effect under correct specification of the assignment model. We show that our estimator is more robust than the conventional two-way estimator: it remains consistent if either the assignment mechanism or the two-way regression model is correctly specified, and it performs better than the two-way-fixed-effects estimator if both are locally misspecified. This strong double robustness property quantifies the benefits of modeling the assignment process and motivates using our estimator in practice.
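A minimal sketch of a weighted two-way-fixed-effects regression, assuming the unit-specific weights have already been constructed from an assignment model (that construction is the paper's contribution and is not shown here); names are illustrative:

```python
import numpy as np

def weighted_twfe(y, unit, time, d, w):
    """Weighted two-way-fixed-effects regression of outcome y on treatment d.
    unit and time are integer labels starting at 0; w are user-supplied weights."""
    U = np.eye(unit.max() + 1)[unit]          # unit dummies
    T = np.eye(time.max() + 1)[time][:, 1:]   # time dummies (one dropped)
    X = np.column_stack([d, U, T])
    sw = np.sqrt(w)                           # weighted least squares via rescaling
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]                            # coefficient on the treatment
```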
[Video] [Slides] [Discussant slides] [Paper]
Tuesday, September 27, 2022: Vasilis Syrgkanis (Stanford University)
- Title: Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals
- Discussant: Eric Tchetgen Tchetgen (University of Pennsylvania)
- Abstract: We extend the idea of automated debiased machine learning to the dynamic treatment regime and, more generally, to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation algorithm that learns the de-biasing corrections without the need to characterize what the correction terms look like (for instance, products of inverse probability weighting terms), as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss-minimization problems whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need to solve auxiliary propensity models and directly optimizing the mean squared error of the target de-biasing correction. We provide further applications of our approach to the estimation of dynamic discrete choice models.
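A minimal sketch of the automatic-debiasing idea in the static (single-period) ATE case rather than the talk's recursive nested setting: the Riesz representer is learned by minimizing a loss that never references propensity scores. The feature map `basis` is a hypothetical user-supplied input.

```python
import numpy as np

def auto_riesz_ate(D, X, basis, ridge=1e-6):
    """Learn the ATE Riesz representer a(d, x) = theta . basis(d, x) by minimizing
    the 'automatic' loss E[a(D, X)^2 - 2 * (a(1, X) - a(0, X))] over theta."""
    Phi = basis(D, X)                          # features at the observed (D, X)
    Phi1 = basis(np.ones_like(D), X)           # features with treatment set to 1
    Phi0 = basis(np.zeros_like(D), X)          # ... and set to 0
    G = Phi.T @ Phi / len(D)                   # E[phi phi']
    b = (Phi1 - Phi0).mean(axis=0)             # E[phi(1, X) - phi(0, X)]
    theta = np.linalg.solve(G + ridge * np.eye(len(b)), b)
    return Phi @ theta                         # representer values at the data

# example basis (hypothetical): with basis = lambda d, x: np.column_stack([d, 1 - d]),
# the minimizer recovers D / P(D=1) - (1 - D) / P(D=0), the usual IPW representer.
```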
[Video] [Slides] [Paper]
Tuesday, September 20, 2022: Dominik Janzing (Amazon Research)
- Title: Formal framework for quantitative Root Cause Analysis
- Discussant: Niklas Pfister (University of Copenhagen)
- Abstract: Asking for the “root cause(s)” of a singular event is at the heart of human attempts to understand what happened. Nevertheless, we were not able to find a satisfactory formalization of “Root Cause Analysis” (RCA) for our business. We have therefore proposed a framework for RCA of anomalies [1] and distribution change [2], based on structural equation models or graphical causal models, that looks sufficiently general for a wide range of use cases. It quantifies, in percentages, to what extent each node contributes to the respective event, based on well-defined statistical and causal principles (also available as open source code in DoWhy [3]). I will, however, also mention cases where our implicit notion of root causes seems to rely on normative expectations rather than on statistics and causality alone.
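A toy sketch of noise-based root-cause attribution in the spirit of this framework (not the DoWhy implementation, and without the Shapley-value machinery of the papers): in a small structural equation model, each node's contribution to an anomalous target is gauged by replacing that node's noise with a typical value.

```python
import numpy as np

def forward(n_x, n_y, n_z):
    """Toy SCM: X = n_x, Y = 2X + n_y, Z = 3Y + n_z; returns the target Z."""
    x = n_x
    y = 2 * x + n_y
    return 3 * y + n_z

# observed anomalous event: an extreme noise term hit Y
noises = {"n_x": 0.1, "n_y": 4.0, "n_z": 0.2}
z_obs = forward(**noises)

# contribution of each node: replace its noise by a typical value (0 here)
# and see how far the target moves back toward normal
for name in noises:
    counterfactual = forward(**{**noises, name: 0.0})
    print(f"{name}: removing it moves Z by {z_obs - counterfactual:+.1f}")
```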
[Video] [Slides] [Discussant slides] [Paper #1, #2, #3]