Winter 2023 complete list with abstracts
Tuesday, March 28, 2023: Robin Evans (University of Oxford)
- Title: Parameterizing and Simulating from Causal Models
- Discussant: Larry Wasserman (CMU)
Abstract: Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way.
We introduce the frugal parameterization, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and effect of treatment on the treated, as well as other common quantities of interest.
I will also discuss some other applications of the frugal parameterization, including to survival analysis, parameterizing nested Markov models, and ‘Many Data’: combining randomized and observational datasets in a single parametric model. This is joint work with Vanessa Didelez.
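As a toy illustration of the simulation idea (a sketch in the spirit of the frugal parameterization, not the paper's general construction; all parameter values below are made up), one can specify the interventional outcome margin p(y | do(t)) directly and couple the outcome to the past through a Gaussian copula:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 0.0, 1.0     # causal margin: Y | do(T=t) ~ N(alpha + beta*t, sigma^2)
sigma, rho = 1.0, 0.6      # rho: Gaussian-copula dependence between Z and Y

z = rng.normal(size=n)                        # covariate ("the past")
t = rng.binomial(1, 1 / (1 + np.exp(-z)))     # treatment depends on z: confounding

# Couple the outcome to z via a Gaussian copula on the *interventional* margin
e_y = rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = alpha + beta * t + sigma * e_y            # Gaussian quantile of the causal margin

naive = y[t == 1].mean() - y[t == 0].mean()   # confounded contrast
X = np.column_stack([np.ones(n), t, z])       # adjusting for z recovers beta
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]
```

Because the causal margin is specified directly, the average causal effect is `beta` by construction: the naive treated-vs-control contrast is confounded, while the regression adjusting for `z` recovers the effect.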
[Video] [Slides] [Discussant slides]

Tuesday, March 21, 2023: Jessica Young (Harvard University)
- Title: Causal inference with competing events
- Discussant: Jacqueline Rudolph (Johns Hopkins University), Q&A moderator: Mats Stensrud (EPFL)
- Abstract: A competing (risk) event is any event that makes it impossible for the event of interest in a study to occur. For example, cardiovascular disease death is a competing event for prostate cancer death because an individual cannot die of prostate cancer once he has died of cardiovascular disease. Various statistical estimands have been proposed in the classical competing risks literature, most prominently the cause-specific cumulative incidence, the marginal cumulative incidence, the cause-specific hazard, and the subdistribution hazard. Here we will discuss the interpretation of counterfactual contrasts in each of these estimands under different treatments and consider possible limitations in their interpretation when a causal treatment effect on the event of interest is the goal and treatment may affect future event processes. In turn, we argue that choosing a target causal effect in this setting fundamentally boils down to whether or not we are satisfied estimating total effects, which capture all mechanisms by which treatment affects the event of interest, including via effects on competing events. When we deem the total effect insufficient to answer our underlying question, we consider alternative targets of inference that capture treatment mechanisms in competing event settings, with emphasis on the recently proposed separable effects.
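The distinction between the estimands above can be seen in a small simulation (illustrative hazards, independent exponential event times, no censoring): the cause-specific cumulative incidence is a functional of observable data, whereas the "marginal" risk that eliminates the competing event refers to a latent event time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
lam_cancer, lam_cvd = 0.10, 0.20     # illustrative hazards for the two causes
tau = 5.0                            # time horizon

# Latent event times (independent exponentials, for illustration only)
t_cancer = rng.exponential(1 / lam_cancer, n)
t_cvd = rng.exponential(1 / lam_cvd, n)
t_obs = np.minimum(t_cancer, t_cvd)              # observed time: first event
cause = np.where(t_cancer <= t_cvd, "cancer", "cvd")

# Cause-specific cumulative incidence: P(die of cancer by tau) with competing
# events allowed to occur -- estimable from observable data
csci = np.mean((t_obs <= tau) & (cause == "cancer"))

# "Marginal" risk with the competing event eliminated: P(latent cancer time
# <= tau) -- read off the latent time here, but not observable in practice
marginal = np.mean(t_cancer <= tau)
```

The marginal risk exceeds the cause-specific cumulative incidence because some individuals who would have died of cancer die of cardiovascular disease first.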
[Video] [Slides] [Discussant slides]

Tuesday, March 14, 2023: Student talks
- Student speaker 1: Melody Huang (UC Berkeley)
Title: Variance-based sensitivity analysis for weighting estimators results in more informative bounds
Abstract: Weighting methods are popular tools for estimating causal effects, and assessing their robustness under unobserved confounding is important in practice. In this work, we introduce a new set of sensitivity models called the "variance-based sensitivity model". The variance-based sensitivity model characterizes the bias from omitting a confounder by bounding the distributional differences that arise in the weights from omitting that confounder, with several notable innovations over existing approaches. First, the variance-based sensitivity model can be parameterized by an R^2 parameter that is both standardized and bounded. We introduce a formal benchmarking procedure that allows researchers to use observed covariates to reason about plausible parameter values in an interpretable and transparent way. Second, we show that researchers can estimate valid confidence intervals under the variance-based sensitivity model, and provide extensions for incorporating substantive knowledge about the confounder to help tighten the intervals. Last, we demonstrate, both empirically and theoretically, that the variance-based sensitivity model can improve both the stability and the tightness of the estimated confidence intervals relative to existing methods. We illustrate our proposed approach on a study examining the drivers of support for the 2016 FARC peace agreement.
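A stylized version of the core idea (a Cauchy–Schwarz-type worst-case bias driven by a standardized, bounded R^2 parameter; this is an illustration of the role of the parameter, not the paper's exact bound):

```python
import numpy as np

def bias_bound(weights, y, r2):
    # Stylized worst-case bias of a weighted mean when the (unknown) true
    # weights may have variance larger by a factor 1/(1 - r2).  Illustrative
    # Cauchy-Schwarz bound, not the exact bound from the paper.
    extra_var = np.var(weights) * r2 / (1 - r2)
    return np.sqrt(extra_var) * np.std(y)

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))            # true propensity score
t = rng.binomial(1, ps)
y = x[t == 1] + rng.normal(size=(t == 1).sum())   # treated outcomes
w = 1 / ps[t == 1]                   # inverse propensity weights

# The bound is zero at r2 = 0 and grows smoothly in the bounded parameter
grid = [0.0, 0.1, 0.3, 0.5]
bounds = [bias_bound(w, y, r2) for r2 in grid]
```

Because the parameter lives on [0, 1), researchers can scan a grid of values and benchmark plausible ones against observed covariates, as the abstract describes.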
[Slides]
- Student speaker 2: Tobias Freidling (University of Cambridge)
Title: Sensitivity Analysis with the R^2-calculus
Abstract: Causal inference necessarily relies upon untestable identification assumptions; hence, it is crucial to assess the robustness of obtained results to potential violations. However, such sensitivity analysis is only occasionally undertaken in practice as many existing methods only apply to relatively simple models and their results are often difficult to interpret. We take a more flexible approach to sensitivity analysis and view it as a constrained stochastic optimization problem. This work focuses on linear models with an unmeasured confounder and a potential instrument. In this setting, the R^2-calculus – a set of algebraic rules that relates different (partial) R^2-values and correlations – emerges as the key tool for sensitivity analysis. It can be applied to identify the bias of the family of k-class estimators, which includes the OLS and TSLS estimators, as well as construct sensitivity models flexibly. For instance, practitioners can specify their assumptions on the unmeasured confounder by comparing its influence on treatment/outcome with an observed variable. We further address the problem of constructing sensitivity intervals which generalize the concept of confidence intervals for partially identified models. Since the heuristic "plug-in" sensitivity interval may not have any confidence guarantees, this work instead follows a bootstrap approach. We illustrate the proposed methods with a real data example and provide user-friendly visualization tools.
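A closely related and widely used member of this family of results is the partial-R^2 omitted-variable-bias formula of Cinelli and Hazlett; the sketch below (simulated linear model with a single unmeasured confounder, no observed covariates) checks that formula against the actual bias of the short regression:

```python
import numpy as np

def ols_slope(y, x):
    # Simple-regression slope of y on x (with intercept)
    x = x - x.mean(); y = y - y.mean()
    return (x @ y) / (x @ x)

def ovb(r2_yu, r2_du, sd_y_res, sd_d_res):
    # Omitted-variable bias written in partial-R^2 terms
    # (Cinelli & Hazlett-style formula; notation here is illustrative)
    return np.sqrt(r2_yu * r2_du / (1 - r2_du)) * sd_y_res / sd_d_res

rng = np.random.default_rng(3)
n = 200_000
u = rng.normal(size=n)                       # unmeasured confounder
d = 0.8 * u + 0.6 * rng.normal(size=n)       # treatment
y = d + u + rng.normal(size=n)               # outcome; true effect = 1

# Actual bias of the short regression y ~ d
bias_actual = ols_slope(y, d) - 1.0

# Partial-R^2 sensitivity parameters, computed from residuals
ry = y - ols_slope(y, d) * (d - d.mean()) - y.mean()   # y residualized on d
ru = u - ols_slope(u, d) * (d - d.mean()) - u.mean()   # u residualized on d
r2_yu = np.corrcoef(ry, ru)[0, 1] ** 2
r2_du = np.corrcoef(d, u)[0, 1] ** 2
bias_formula = ovb(r2_yu, r2_du, np.std(ry), np.std(d))
```

In a sensitivity analysis the two partial R^2 values are of course unknown and are varied or benchmarked rather than computed from the latent confounder as done here for verification.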
[Slides] [Video]

Tuesday, March 7, 2023: Sofia Triantafyllou (University of Crete)
- Title: A Bayesian Method for Causal Effect Estimation with Observational and Experimental Data
- Discussant: Shu Yang (NC State University)
- Abstract: Decision making is often about selecting the intervention that will maximize an outcome. In healthcare settings, for example, for each patient the goal is to select the treatment that will optimize the patient’s clinical outcome. Experimental data from randomized controlled trials allow for unbiased estimation of post-intervention outcome probabilities but are usually limited in the number of samples and the set of measured covariates. Observational data, such as electronic medical records, contain many more samples and a richer set of measured covariates, which we can use to estimate more personalized treatment effects; however, these estimates may be biased due to latent confounding. In this talk, we describe a Bayesian method that uses causal graphical models to combine observational and experimental data to improve causal effect estimation, when possible. In particular, we discuss how to select optimal feature sets in order to combine experimental and observational data to predict a post-intervention outcome, when observational data allow for unbiased estimation of that outcome.
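As a minimal stand-in for the idea of combining the two data sources (normal-normal conjugate pooling of two effect estimates; this ignores the causal-graphical machinery that decides *when* pooling is valid, which is the subject of the talk):

```python
def pool(est_a, var_a, est_b, var_b):
    # Precision-weighted (normal-normal conjugate) combination of estimates
    prec = 1 / var_a + 1 / var_b
    return (est_a / var_a + est_b / var_b) / prec, 1 / prec

rct_est, rct_var = 1.2, 0.25    # small RCT: unbiased but noisy (made-up numbers)
obs_est, obs_var = 0.9, 0.01    # large observational study: precise, possibly biased

pooled_est, pooled_var = pool(rct_est, rct_var, obs_est, obs_var)
```

Pooling sharpens the estimate but drags it toward the observational value; the gain is only legitimate when the observational component is unbiased, which is exactly what the graphical criteria in the talk are designed to assess.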
[Video] [Slides] [Discussant slides]

Tuesday, February 28, 2023: Christina Yu (Cornell University)
- Title: Exploiting Neighborhood Interference with Low Order Interactions under Unit Randomized Design
- Discussant: Chencheng Cai (Temple University & Harvard University)
- Q&A moderator: Mayleen Cortez and Matt Eichhorn (Cornell University)
- Abstract: Network interference, where the outcome of an individual is affected by the treatment assignment of those in their social network, is pervasive in many real-world settings. However, it poses a challenge to estimating causal effects. We consider the task of estimating the total treatment effect (TTE), or the difference between the average outcomes of the population when everyone is treated versus when no one is, under network interference. Under a Bernoulli randomized design, we utilize knowledge of the network structure to provide an unbiased estimator for the TTE when network interference effects are constrained to low order interactions among neighbors of an individual. We make no assumptions on the graph other than bounded degree, allowing for well-connected networks that may not be easily clustered.
We derive a bound on the variance of our estimator and show in simulated experiments that it performs well compared with standard estimators for the TTE. We also derive a minimax lower bound on the mean squared error of our estimator which suggests that the difficulty of estimation can be characterized by the degree of interactions in the potential outcomes model. We also prove that our estimator is asymptotically normal under boundedness conditions on the network degree and potential outcomes model. Central to our contribution is a new framework for balancing between model flexibility and statistical complexity as captured by this low order interactions structure.
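For the simplest, first-order (linear) interference case, an unbiased estimator under a Bernoulli(p) design can be sketched as follows (toy random graph and made-up potential outcomes; the paper's estimator covers general low-order interactions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 0.5
# Toy graph: unit i is influenced by itself plus up to 3 random neighbors
neighbors = [np.unique(np.append(rng.choice(n, 3), i)) for i in range(n)]
# First-order (linear) potential outcomes: Y_i = c0_i + sum_{j in N_i} c_ij z_j
c0 = rng.uniform(0, 1, n)
cij = [rng.uniform(0, 0.5, len(neighbors[i])) for i in range(n)]
tte = np.mean([c.sum() for c in cij])   # ground-truth total treatment effect

def estimate(z):
    y = np.array([c0[i] + cij[i] @ z[neighbors[i]] for i in range(n)])
    # E[z_j/p - (1-z_j)/(1-p)] = 0 while E[z_j * (z_j/p - (1-z_j)/(1-p))] = 1,
    # so E[Y_i * w_i] = sum_{j in N_i} c_ij, giving unbiasedness for the TTE
    w = np.array([(z[neighbors[i]] / p - (1 - z[neighbors[i]]) / (1 - p)).sum()
                  for i in range(n)])
    return np.mean(y * w)

ests = [estimate(rng.binomial(1, p, n)) for _ in range(2000)]
```

Averaging over many Bernoulli draws, the estimates center on the true TTE even though no cluster structure is exploited, matching the abstract's claim for bounded-degree graphs.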
[Video] [Slides] [Discussant slides]

Tuesday, February 21, 2023: Ingeborg Waernbaum (Uppsala University)
- Title: Selection bias and multiple inclusion criteria in observational studies
- Discussant: Maya Mathur (Stanford University), Q&A moderator: Stina Zetterström (Uppsala University)
- Abstract: Selection bias can result from applying inclusion/exclusion criteria when selecting the study population in an observational study. Such bias can threaten the validity of the study, so a sensitivity analysis for assessing the effect of the selection is desirable. Bounds for selection bias based on values of sensitivity parameters were previously proposed by Smith and VanderWeele (SV). The sensitivity parameters describe aspects of the joint distribution of the outcome, the selection, and a vector of unmeasured variables, for each treatment group respectively. In this work we extend the sensitivity analysis and give additional guidance for practitioners on constructing i) the sensitivity parameters for multiple selection variables, and ii) an alternative assumption-free bound that produces only feasible values. In addition to a tutorial for the bounds and an R package, we derive further properties of the SV bound. We show that the sensitivity parameters are variation independent and derive feasible regions for them. Furthermore, a sharp bound is defined as a bound where it is a priori known that the bias can equal the value of the bound, given the values of the selected sensitivity parameters. Conditions for the SV bound to be sharp are provided that can be checked with the observed data. We illustrate both the R package and the properties of the bound with a simulated dataset that emulates a study investigating the effect of Zika virus on microcephaly in Brazil. This is joint work with Stina Zetterström, Department of Statistics, Uppsala University.
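The SV bounds are built from a bounding factor with the same algebraic form as the Ding–VanderWeele bounding factor; a schematic version (parameter names and numerical values are illustrative, not taken from the paper):

```python
def bounding_factor(rr_uy, rr_su):
    # Bounding factor of the Smith-VanderWeele type: both sensitivity
    # parameters are risk-ratio-scale quantities >= 1, and the observed
    # risk ratio can overstate the true one by at most this factor.
    return rr_uy * rr_su / (rr_uy + rr_su - 1)

observed_rr = 1.8
# If either association is absent (parameter equal to 1), there is no bias:
no_bias = bounding_factor(1.0, 5.0)
# Otherwise, divide out the bounding factor for a conservative lower bound:
lower = observed_rr / bounding_factor(2.0, 3.0)
```

The factor never exceeds the smaller of its two arguments, which is one source of the feasibility considerations the abstract mentions.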
[Video] [Slides] [Discussant slides]

Tuesday, February 14, 2023: Stijn Vansteelandt (Ghent University)
- Title: Assumption-lean Causal Modeling
- Discussant: Elizabeth Ogburn (Johns Hopkins University)
- Abstract: Causal inference research has shifted from being primarily descriptive (describing the data-generating mechanism using statistical models) to being primarily prescriptive (evaluating the effects of specific interventions). The focus has thereby moved from being centered on statistical models to being centered on causal estimands. This evolution has been driven by the increasing need for practical solutions to real-world problems, such as designing effective interventions, making policy decisions, and identifying effective treatment strategies. It has brought enormous progress, not solely in terms of delivering more useful answers to the scientific questions at stake, but also in providing a more hygienic inference that targets a well-understood causal estimand. However, many causal questions are not readily translated into the effects of specific interventions, and even if they can be, scientists may be reliant on help from an expert statistician to make that translation, may not find the considered interventions feasible or of immediate interest, or may find too little information in the data about the considered estimand. In this talk, I will reflect on this and argue that hygienic causal inference thinking therefore comes with a price. I will next propose a compromise solution at the intersection of descriptive and prescriptive causal inference. It borrows the flexibility of statistical modeling, while tying model parameters to causal estimands in order to ensure that we understand what is being estimated and obtain valid (data-adaptive) inference for it, even when the model is wrong. Examples involving structural (nested) mean models, instrumental variables estimation, target trials, and more will be used to provide insight.
[Video] [Slides] [Discussant slides] [Paper 1, 2, 3]

Tuesday, February 7, 2023: Lauren Dang (UC Berkeley)
- Title: Integration of Observational and Randomized Controlled Trial Data: Approaches, Challenges, A Novel Estimator, and Application to the LEADER Cardiovascular Outcomes Trial
- Discussant: Robin Evans (University of Oxford)
- Abstract: Although the randomized controlled trial (RCT) is the gold standard for evidence generation, conducting an adequately powered RCT is not always feasible or desirable. A traditional RCT may be impracticable for very rare diseases, and excessive randomization to control may be considered unethical for severe diseases without effective treatments or for certain pediatric drug approvals. In such cases, we may wish to integrate data from a small RCT with real-world data (RWD) to increase power, but at the risk of introducing bias. A growing number of "data fusion" methods seek to estimate the bias from incorporating RWD in order to determine whether to include RWD, or how to weight the RWD, in a combined analysis. This talk will use a roadmap for causal inference to explore the challenges of integrating observational and RCT data, including considerations for designing such a hybrid trial. We will discuss different approaches to data fusion, including a novel estimator that uses cross-validated targeted maximum likelihood estimation (CV-TMLE) to data-adaptively select and analyze the optimal experiment: RCT only (if no unbiased external data exist) or RCT combined with external data. Finally, we will discuss an example of distinguishing biased versus unbiased extra controls by region in an analysis of the effect of liraglutide on change in hemoglobin A1c from the LEADER trial.
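A stylized version of the select-the-optimal-experiment idea (a bias-squared-versus-variance comparison with simple difference-in-means estimators and made-up data; the actual proposal uses CV-TMLE, not this rule):

```python
import numpy as np

def select_experiment(y_rct_t, y_rct_c, y_ext_c):
    # Include external controls only if their estimated squared bias is
    # outweighed by the variance they remove (illustration, not CV-TMLE)
    est_rct = y_rct_t.mean() - y_rct_c.mean()
    var_rct = y_rct_t.var() / len(y_rct_t) + y_rct_c.var() / len(y_rct_c)
    pooled = np.concatenate([y_rct_c, y_ext_c])
    est_pool = y_rct_t.mean() - pooled.mean()
    var_pool = y_rct_t.var() / len(y_rct_t) + pooled.var() / len(pooled)
    bias_hat = y_ext_c.mean() - y_rct_c.mean()   # external vs. trial controls
    if bias_hat**2 + var_pool < var_rct:
        return "rct+external", est_pool
    return "rct_only", est_rct

rng = np.random.default_rng(5)
unbiased = select_experiment(rng.normal(1, 1, 100), rng.normal(0, 1, 100),
                             rng.normal(0, 1, 2000))
biased = select_experiment(rng.normal(1, 1, 100), rng.normal(0, 1, 100),
                           rng.normal(0.8, 1, 2000))
```

When the external controls are badly biased, the rule falls back to the RCT-only analysis; with compatible external controls, pooling can reduce variance. The data-adaptive selection itself introduces subtleties for inference, which is one motivation for the cross-validated estimator in the talk.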
[Video] [Paper] [Slides] [Discussant slides]

Tuesday, January 31, 2023: Issa Kohler-Hausmann & Lily Hu (Yale University)
- Title: Causal mediators and misdefined causal quantities
- Abstract: A number of influential causal inference researchers have asked the following question: Can we quantify an effect of race on a decision that takes place downstream of other decisions that were themselves causally affected by race? If so, how? In this talk, we ask whether causal quantities that attempt to “hold constant” or intervene on downstream mediators are misdefined. We explore this question by asking whether causal inference is—for lack of a more precise word—“messed up” when we fail to take account of non-causal relations between variables (or, more precisely, between “relata”—the things in our world represented by the variables, which are the true objects of scientific inquiry).

Tuesday, January 24, 2023: Issa Kohler-Hausmann & Lily Hu (Yale University)
- Title: What is the causal effect an effect of in audit/correspondence studies?
- Abstract: Many researchers have expressed an interest in isolating and measuring the causal effect that social statuses such as race or gender have on decision outcomes in various domains. In such endeavors, the effort to “isolate” (e.g.) race means disentangling the causal effect of race alone on the decision of interest from the causal effect of so-called non-race factors on the decision. For example, researchers have described the quantity of interest in general terms, saying that they seek to measure the difference in treatment due to “varying race [or gender] but keeping all else constant,” or the difference in treatment afforded “members of a minority group” compared to “members of a majority group with otherwise identical characteristics in similar circumstances.” Sometimes scholars also aver that such a causal quantity corresponds to one or another conception of discrimination.
This talk explores what such endeavors might mean by examining audit or correspondence studies, which are taken as the best experimental approximators of the “all else equal” gold standard of causal inquiry. Specifically, we will uncover the assumptions that underwrite the study designs, probe different interpretations of audit study results, and ask how these different interpretations affect our conceptualization of the causal estimand that such studies study.

Tuesday, January 17, 2023: Moritz Hardt (Max Planck Institute for Intelligent Systems)
- Title: From prediction to power
- Discussant: Michael P. Kim (UC Berkeley)
- Abstract: A recent formal framework, called performative prediction, draws attention to the fundamental difference between learning from a population and steering a population through predictions. Against this backdrop, we'll examine the role of prediction in questions of power in digital markets. Building on performative prediction, I'll introduce the notion of performative power that measures the ability of a firm operating an algorithmic system to benefit from steering. Traditional economic tools struggle with identifying anti-competitive patterns in digital markets, not least due to the complexity of market definition. In contrast, performative power is a causal notion identifiable with little knowledge of the market, its internals, participants, products, or prices. Low performative power implies that a firm can do no better than to optimize its objective on current data. In contrast, firms of high performative power stand to benefit from steering the population towards more profitable behavior. In a simple theoretical model, we can see that monopolies maximize performative power, a firm's ability to personalize increases performative power, while competition and outside options decrease performative power. On the empirical side, I'll discuss observational causal designs to identify performative power from discontinuities in how digital platforms display content. This allows us to repurpose causal effects from various studies about digital platforms as lower bounds on performative power. Finally, I'll situate performative power in the context of European competition policy and antitrust enforcement in digital marketplaces.
Based on joint work with Meena Jagadeesan and Celestine Mendler-Dünner, as well as ongoing work with Gabriele Carovano and Celestine Mendler-Dünner.
[Video] [Slides]

Tuesday, January 10, 2023: Ilya Shpitser (Johns Hopkins University)
- Title: Fairness By Causal Mediation Analysis: Criteria, Algorithms, and Open Problems
- Discussant: Ricardo Silva (University College London)
- Abstract: Systematic discriminatory biases present in our society influence the way data is collected and stored, the way variables are defined, and the way scientific findings are put into practice as policy. Automated decision procedures and learning algorithms applied to such data may serve to perpetuate existing injustice or unfairness in our society.
We consider how to solve prediction and policy learning problems in a way which “breaks the cycle of injustice” by correcting for the unfair dependence of outcomes, decisions, or both, on sensitive features (e.g., variables that correspond to gender, race, disability, or other protected attributes). We use methods from causal inference and constrained optimization to learn outcome predictors and optimal policies in a way that addresses multiple potential biases which afflict data analysis in sensitive contexts.
Our proposal comes equipped with the guarantee that solving prediction or decision problems on new instances will result in a joint distribution where the given fairness constraint is satisfied. We illustrate our approach with both synthetic data and real criminal justice data.
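As a toy stand-in for constrained fair prediction (the constraint here is zero linear correlation between predictions and the sensitive feature, a much cruder criterion than the causal path-specific constraints discussed in the talk; all data are simulated):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
s = rng.binomial(1, 0.5, n).astype(float)   # sensitive feature
x = 1.5 * s + rng.normal(size=n)            # covariate influenced by s
y = 2.0 * x + s + rng.normal(size=n)        # outcome

def fit_decorrelated(x, s, y):
    # Project the covariate onto the orthogonal complement of s before
    # regressing, so fitted values are (linearly) uncorrelated with s
    s_c = s - s.mean()
    x_c = x - x.mean()
    x_res = x_c - (s_c @ x_c) / (s_c @ s_c) * s_c
    beta = (x_res @ (y - y.mean())) / (x_res @ x_res)
    return y.mean() + beta * x_res

pred = fit_decorrelated(x, s, y)
corr_constrained = np.corrcoef(pred, s)[0, 1]

# Plain least squares on x alone leaves the predictions correlated with s
x_c = x - x.mean()
beta_u = (x_c @ (y - y.mean())) / (x_c @ x_c)
corr_plain = np.corrcoef(beta_u * x, s)[0, 1]
```

By construction the constrained predictions are exactly uncorrelated with `s` in-sample, at the cost of discarding the part of `x` explained by `s`; the causal approaches in the talk instead remove only the unfair pathways.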
[Video] [Slides] [Discussant slides]