Tuesday, April 2, 2024: Kosuke Imai (Harvard University)
- Title: The Cram Method for Efficient Simultaneous Learning and Evaluation
- Abstract: We introduce the 'cram' method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. The cram method also naturally accommodates online learning algorithms, making its implementation computationally efficient. To demonstrate the power of the cram method, we consider the standard policy learning setting where cramming is applied to the same data to both develop an individualized treatment rule (ITR) and estimate the average outcome that would result if the learned ITR were to be deployed. We show that under a minimal set of assumptions, the resulting crammed evaluation estimator is consistent and asymptotically normal. While our asymptotic results require a relatively weak stabilization condition on the ML algorithm, we develop a simple, generic method that can be used with any policy learning algorithm to satisfy this condition. Our extensive simulation studies show that, when compared to sample-splitting, cramming reduces the evaluation standard error by more than 40% while improving the performance of the learned policy. We also apply the cram method to a randomized clinical trial to demonstrate its applicability to real-world problems. Finally, we briefly discuss future extensions of the cram method to other learning and evaluation settings.
- Discussants: Rui Song (North Carolina State University) and Hengrui Cai (UC Irvine)
- Q&A moderator: Michael Li (Harvard University)
[Video] [Slides] [Discussion slides]
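A minimal sketch of the single-pass learn-then-test loop described in the abstract, assuming a 50/50 randomized binary treatment; `fit_policy` is a hypothetical stand-in for any policy learner, and the naive standard error below ignores the across-batch dependence that the paper's asymptotic theory handles properly:

```python
import numpy as np

def cram_policy_value(X, A, Y, fit_policy, n_batches=20, p=0.5):
    """Stylized cram loop: in a single pass over batches, repeatedly update
    a policy on all data seen so far and test it on the next fresh batch.
    fit_policy(X, A, Y) should return a rule pi with pi(X) in {0, 1}."""
    idx = np.array_split(np.arange(len(Y)), n_batches)
    estimates = []
    for t in range(1, n_batches):
        train = np.concatenate(idx[:t])      # everything crammed so far
        test = idx[t]                        # the next batch tests the rule
        pi = fit_policy(X[train], A[train], Y[train])
        d = pi(X[test])                      # treatments the rule recommends
        # IPW value of the current rule on the fresh batch (50/50 trial)
        estimates.append(np.mean((A[test] == d) * Y[test] / p))
    # naive aggregate; the paper derives a valid variance for the full path
    return np.mean(estimates), np.std(estimates) / np.sqrt(len(estimates))
```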
Tuesday, April 9, 2024: Chao Ma (Microsoft Research)
- Title: Towards Causal Foundation Model: on Duality between Causal Inference and Attention
- Discussant: Jiaqi Zhang (MIT)
- Abstract: Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for complex tasks. We propose a novel, theoretically sound method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that our approach CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset causal inference methodologies.
[Video]
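A loose numerical illustration of the covariate-balancing-as-attention idea, not the CInA architecture: a softmax over scaled dot products (the core of self-attention) yields weights with which each treated unit "attends" to similar controls; all names here are ours.

```python
import numpy as np

def attention_balancing_att(X, A, Y, temperature=1.0):
    """Toy covariate balancing via an attention-style softmax kernel:
    treated covariates act as queries, control covariates as keys, and
    the resulting weights serve as balancing weights on the controls."""
    Xt, Xc = X[A == 1], X[A == 0]
    Yt, Yc = Y[A == 1], Y[A == 0]
    # attention scores: softmax(Q K^T / sqrt(d)) between treated and controls
    scores = Xt @ Xc.T / (np.sqrt(X.shape[1]) * temperature)
    W = np.exp(scores - scores.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)            # each row sums to one
    # each treated unit's counterfactual is an attention-weighted average
    # of control outcomes; averaging the differences gives a toy ATT
    return np.mean(Yt - W @ Yc)
```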
Tuesday, April 16, 2024: Mihaela van der Schaar (University of Cambridge)
- Title: The (Causal) Discovery Ladder: Unravelling Governing Equations and Beyond using Machine Learning
[Video] [Slides] [Related paper: #1, #2, #3, #4]
Tuesday, April 23, 2024 (Young researcher seminar)
- Speaker 1: Chan Park (University of Pennsylvania)
- Title: Single Proxy Control
- Abstract: Negative control variables are sometimes used in non-experimental studies to detect the presence of confounding by hidden factors. A negative control outcome (NCO) is an outcome that is influenced by unobserved confounders of the exposure's effect on the outcome in view, but is not causally impacted by the exposure. Tchetgen Tchetgen (2013) introduced the Control Outcome Calibration Approach (COCA) as a formal NCO counterfactual method to detect and correct for residual confounding bias. For identification, COCA treats the NCO as an error-prone proxy of the treatment-free counterfactual outcome of interest, and involves regressing the NCO on the treatment-free counterfactual, together with a rank-preserving structural model which assumes a constant individual-level causal effect. In this work, we establish nonparametric COCA identification of the average causal effect for the treated, without requiring rank preservation, therefore accommodating unrestricted effect heterogeneity across units. This nonparametric identification result has important practical implications, as it provides single-proxy confounding control, in contrast to recently proposed proximal causal inference, which relies for identification on a pair of confounding proxies. For COCA estimation, we propose three separate strategies: (i) an extended propensity score approach, (ii) an outcome bridge function approach, and (iii) a doubly robust approach. Finally, we illustrate the proposed methods in an application evaluating the causal impact of a Zika virus outbreak on the birth rate in Brazil. This is joint work with David Richardson and Eric Tchetgen Tchetgen.
[Video] [Slides]
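The abstract's premise, that an NCO can flag hidden confounding because it shares the unobserved confounders but receives no causal effect, is easy to see in a toy simulation (ours, not an implementation of COCA):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
U = rng.normal(size=n)                        # unmeasured confounder
A = (U + rng.normal(size=n) > 0) * 1          # exposure driven by U
Y = 1.0 * A + 2.0 * U + rng.normal(size=n)    # outcome: true effect is 1.0
W = 2.0 * U + rng.normal(size=n)              # NCO: affected by U, not by A

# The naive contrast for Y is biased by U; the same contrast for W would be
# zero absent confounding, so a clearly nonzero value flags hidden confounding.
print("naive effect on Y:", Y[A == 1].mean() - Y[A == 0].mean())  # ~ 1 + bias
print("NCO check on W:   ", W[A == 1].mean() - W[A == 0].mean())  # ~ bias != 0
```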
- Speaker 2: Andrew Yiu (University of Oxford)
- Title: Semiparametric posterior corrections
- Abstract: Semiparametric inference refers to the use of infinite-dimensional models to estimate finite-dimensional statistical functionals, and has gained particular popularity for handling causal problems. In empirical studies, nonparametric Bayesian methods such as BART (Bayesian additive regression trees) have performed strongly for point estimation, but the results for uncertainty quantification are mixed. The pivotal issue is the inherent “plug-in” nature of Bayesian inference, which means that the regularization employed in estimating high-dimensional nuisance parameters can induce a bias that bleeds into the estimation of the target functional. We introduce a method that post-processes an initial Bayesian posterior to correct its uncertainty quantification. The motivation is to fully leverage the adaptivity and predictive performance of nonparametric Bayes to tackle semiparametric problems while providing asymptotic frequentist guarantees. Our approach can be interpreted as a stochastic version of semiparametric one-step estimation: we add a correction term to each posterior sample that incorporates both the efficient influence function and the Bayesian bootstrap. We illustrate the empirical performance of our method on data from the ACIC 2016 data analysis competition.
[Video] [Slides]
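A stylized rendering of the abstract's "correction term added to each posterior sample," here for the ATE: the AIPW efficient influence function supplies the correction, and Dirichlet weights supply the Bayesian bootstrap. The inputs and their construction are our assumptions; the paper's actual procedure is more careful.

```python
import numpy as np

def corrected_posterior_ate(m1_draws, m0_draws, Y, A, e, rng=None):
    """For each posterior draw of the outcome regressions (m1, m0 evaluated
    at the observed covariates), add a one-step correction built from the
    ATE's efficient influence function, averaged under Bayesian-bootstrap
    weights. m1_draws, m0_draws: (n_draws, n); e: propensities P(A=1|X)."""
    rng = rng or np.random.default_rng()
    n_draws, n = m1_draws.shape
    samples = np.empty(n_draws)
    for s in range(n_draws):
        m1, m0 = m1_draws[s], m0_draws[s]
        plug_in = m1 - m0
        # influence-function residual term for the ATE (AIPW form)
        resid = A * (Y - m1) / e - (1 - A) * (Y - m0) / (1 - e)
        w = rng.dirichlet(np.ones(n))        # Bayesian bootstrap weights
        samples[s] = np.sum(w * (plug_in + resid))
    return samples                           # corrected posterior for the ATE
```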
Tuesday, April 30, 2024: Hyunseung Kang (University of Wisconsin-Madison)
- Title: Transfer Learning Between U.S. Presidential Elections: How much can we learn from a 2020 ad campaign to inform 2024 elections?
- Discussant: Melody Huang (Harvard University)
- Abstract: In the 2020 U.S. presidential election, Aggarwal et al. (2023) ran a large-scale, randomized experiment to analyze the impact of an online ad campaign on voter turnout and found that the overall impact was “effectively equivalent to zero.” As the 2024 election approaches, a natural question to ask is whether a similar ad campaign would remain ineffective during this election. Despite some similarities between 2020 and 2024, such as the same presumptive candidates and concerns about the economy, differences remain, such as COVID-19, concerns about immigration, and the overturning of Roe v. Wade. This raises the broader question: how much can we learn from past ad campaigns to inform future ad campaigns?
In this ongoing work, we lay out a transfer learning framework to address this question. Two major features of our framework are that we do not assume (a) transportability, which roughly states that the differences between the 2020 and 2024 elections can be adjusted for by a common set of covariates, or (b) that the same set of covariates is measured in the two elections. Instead, we present a sensitivity analysis framework that provides a plausible range of effects of future ad campaigns based on past ad campaigns and allows the covariates to differ between elections. Under our framework, we develop two nonparametric estimators, one of which is rooted in the study design, derive a bootstrap approach to conduct inference, and establish some inferential guarantees. We also present simple ways to calibrate and, ultimately, demystify the sensitivity parameters for interpretability. We conclude with some preliminary results about the plausible range of effects of running an ad campaign during the 2024 U.S. presidential election.
This is joint work with Xinran Miao (UW-Madison) and Jiwei Zhao (UW-Madison).
[Video] [Slides]
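Since this is ongoing work, the estimators and sensitivity parameters are not spelled out in the abstract. Purely as a schematic of what a sensitivity-analysis output looks like, one can widen a bootstrap interval from the past election by a calibrated sensitivity margin; everything below, including the numbers, is illustrative and not the talk's method.

```python
import numpy as np

def sensitivity_range(tau_2020_draws, delta):
    """Given bootstrap draws of a past-election effect estimate and a
    calibrated sensitivity parameter delta bounding how much the new
    election can differ, report a plausible range for the new effect."""
    lo, hi = np.percentile(tau_2020_draws, [2.5, 97.5])
    return lo - delta, hi + delta

# fake bootstrap draws centered at zero (the 2020 finding), fake delta
draws = np.random.default_rng(1).normal(loc=0.0, scale=0.002, size=2000)
print(sensitivity_range(draws, delta=0.005))
```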
Tuesday, May 7, 2024: Raaz Dwivedi (Cornell University)
- Title: Integrating Double Robustness into Causal Latent Factor Models
- Discussant: James Robins (Harvard University)
- Abstract: Latent factor models are widely utilized for causal inference in panel data, involving multiple measurements across various units. Popular inference methods include matrix completion for estimating the average treatment effect (ATE) and the nearest neighbor approach for individual treatment effects (ITE). However, these methods underperform, respectively, when outcomes are not low-rank or when the units in the data are highly diverse. To tackle these challenges, we integrate double robustness principles with factor models, introducing estimators designed to be resilient against such issues. We present a doubly robust matrix completion strategy for the ATE, capable of ensuring consistency despite unobserved confounding whenever either the outcome matrix or the propensity matrix is low-rank, and providing superior error rates and confidence intervals when both matrices are low-rank. Next, we propose a doubly robust nearest neighbor method for the ITE, designed to achieve consistent estimates in the presence of either similar units or similar measurements, with improved error rates and confidence intervals when both conditions are met.
[Video] [Slides] [Related papers: #1, #2]
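A schematic of the entrywise doubly robust combination the abstract suggests: complete the outcome matrix, complete the propensity matrix, then correct observed entries by inverse-propensity-weighted residuals. The completion routines are hypothetical placeholders, and the paper's estimators and guarantees are considerably more refined.

```python
import numpy as np

def dr_matrix_mean(Y, O, complete_outcomes, complete_propensities):
    """Y: (units x times) outcomes, observed where the mask O == 1.
    complete_outcomes(Y, O) -> Mhat, a low-rank fit of the full matrix.
    complete_propensities(O) -> Phat, a low-rank fit of P(O_ij = 1).
    Returns an AIPW-style doubly robust estimate of the matrix mean."""
    Mhat = complete_outcomes(Y, O)                       # outcome model
    Phat = np.clip(complete_propensities(O), 1e-3, 1.0)  # propensity model
    # entrywise AIPW: model prediction plus a weighted residual on
    # observed cells (unobserved cells contribute Mhat alone)
    dr = Mhat + O * (np.where(O == 1, Y, 0.0) - Mhat) / Phat
    return dr.mean()
```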
Tuesday, May 21, 2024 (Young researcher seminar)
- Speaker 1: Abhin Shah (MIT)
- Title: On counterfactual inference with unobserved confounding via exponential family
- Abstract: We are interested in the problem of unit-level counterfactual inference with unobserved confounders, owing to the increasing importance of personalized decision-making in many domains: consider a recommender system interacting with many users over time, where each user is provided recommendations based on observed demographics and prior engagement levels, as well as certain unobserved factors. The system adapts its recommendations sequentially and differently for each user. Ideally, at each point in time, the system wants to infer each user's unknown engagement if it were exposed to a different sequence of recommendations while everything else remained unchanged. This task is challenging since: (a) the unobserved factors could give rise to spurious associations, (b) the users could be heterogeneous, and (c) only a single trajectory per user is available. We model the underlying joint distribution through an exponential family. This reduces the task of unit-level counterfactual inference to simultaneously learning a collection of distributions of a given exponential family with different unknown parameters, with a single observation per distribution. We discuss a computationally efficient method for learning all of these parameters, with estimation error scaling linearly in the metric entropy of the space of unknown parameters: if the parameters are s-sparse linear combinations of k known vectors in p dimensions, the error scales as O(s(log k)/p).
[Video] [Related papers: #1, #2]
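The "single observation per distribution" regime has a clean special case worth sketching: for a Gaussian family with identity covariance, the one-sample MLE of the natural parameter is the observation itself, so recovering an s-sparse combination of k known vectors reduces to a lasso. This toy check is ours, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, k, s = 1000, 50, 3
V = rng.normal(size=(p, k))                  # k known vectors in R^p
s_true = np.zeros(k)
s_true[rng.choice(k, size=s, replace=False)] = 3.0
theta = V @ s_true                           # the unit's natural parameter
x = theta + rng.normal(size=p)               # the single observation

# For a Gaussian family with identity covariance, the one-sample MLE of
# theta is x itself, so recovering the sparse weights is a lasso of x on V.
fit = Lasso(alpha=0.15).fit(V, x)
print("recovered support:", np.flatnonzero(fit.coef_))
print("true support:     ", np.flatnonzero(s_true))
```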
- Speaker 2: Brian Gilbert (New York University)
- Title: Identification and estimation of mediational effects of longitudinal modified treatment policies
- Abstract: We demonstrate a comprehensive semiparametric approach to causal mediation analysis, addressing the complexities inherent in settings with longitudinal and continuous treatments, confounders, and mediators. Our methodology utilizes a nonparametric structural equation model and a cross-fitted sequential regression technique based on doubly robust pseudo-outcomes, yielding an efficient, asymptotically normal estimator without relying on restrictive parametric modeling assumptions. We are motivated by a recent scientific controversy regarding the effects of invasive mechanical ventilation (IMV) on the survival of COVID-19 patients, considering acute kidney injury (AKI) as a mediating factor. We highlight the possibility of "inconsistent mediation," in which the direct and indirect effects of the exposure operate in opposite directions. We discuss the significance of mediation analysis for scientific understanding and its potential utility in treatment decisions.
[Video] [Related paper]
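The estimator itself is involved, but the cross-fitted regression pattern it rests on has a generic skeleton; `fit_nuisance` and `pseudo_outcome` below are hypothetical placeholders for the paper's sequential regression steps.

```python
import numpy as np

def cross_fit_estimate(data, n_folds, fit_nuisance, pseudo_outcome):
    """Generic cross-fitting: nuisances are fit on out-of-fold data and
    evaluated on the held-out fold, so no pseudo-outcome is computed with
    a model trained on its own observation; fold results are then pooled.
    `data` is assumed to be an array indexable by row."""
    n = len(data)
    folds = np.array_split(np.random.default_rng(0).permutation(n), n_folds)
    phi = np.empty(n)
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        nuisance = fit_nuisance(data[train])     # e.g. sequential regressions
        phi[fold] = pseudo_outcome(nuisance, data[fold])
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)   # estimate, naive SE
```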
Tuesday, May 28, 2024: Rodrigo Pinto (UCLA)
- Title: What is causality? How to express it? And why it matters
- Discussant: Ilya Shpitser (Johns Hopkins University)
- Abstract: Causality is a well-studied concept in economics, yet effective causal analysis necessitates tools beyond traditional statistics and probability theory. Economists have historically employed structural equations and causal tools for this purpose. Alongside this traditional approach, several frameworks have been developed to address and manipulate causal inquiries. This paper explores the advantages and drawbacks of three prominent approaches: Haavelmo's Hypothetical Model Approach, the Language of Potential Outcomes (Rubin and co-authors), and Do-calculus (Pearl and co-authors). By critically examining these methodologies, we aim to highlight their respective strengths and limitations, providing a comprehensive understanding of their applications in economic causal analysis.
[Video] [Slides] [Discussant slides]
Tuesday, June 4, 2024: Wang Miao (Peking University)
- Title: Introducing the specificity score: a measure of causality beyond P value
- Discussant: Qingyuan Zhao (University of Cambridge)
- Abstract: There is considerable debate about the P value in scientific research, and its use has been banned by several prestigious journals in recent years. Particularly in observational studies where confounding or selection bias arises, the P value as a measure of statistical significance fails to capture the causal association of scientific interest and can lead to false or trivial scientific discoveries. In this talk, I will introduce a specificity score for testing the existence of causal effects in the presence of unmeasured confounding. The specificity score measures how extreme the observed association is when compared to the confounding bias. A large specificity score means the observed association cannot be explained away by confounding and is thus evidence of causality. Under certain conditions, the specificity test has controlled type I error and power approaching unity for testing the null hypothesis of no causal effect. This approach entails only rough information on the broadness of the causal associations in sight, and does not require the availability of auxiliary variables. It admits joint causal discovery with multiple treatments and multiple outcomes, which makes it particularly suitable for gene expression studies, Mendelian randomization, and EHR studies. A heatmap of specificity scores is used to communicate all specificity score/test information in a universal and effective manner. The specificity score is related to Hill’s specificity criterion for causal inference, but I will discuss how it differs from Hill’s. Simulations are used for illustration, and an application to a mouse obesity dataset detects potential active effects of genes on clinical traits that are relevant to metabolic syndrome.
[Video]
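The talk defines the specificity score precisely; as a loose stand-in for "how extreme an observed association is relative to the rest," the following toy standardizes every treatment-outcome correlation against the pooled set and draws the heatmap the abstract mentions. This is our illustration, not Miao's score.

```python
import numpy as np
import matplotlib.pyplot as plt

def association_extremeness(T, Y):
    """T: (n x p) treatments, Y: (n x q) outcomes. Computes all p*q pairwise
    correlations, then expresses each as a z-score relative to the pooled
    distribution of associations -- a crude 'extremeness' map."""
    Tz = (T - T.mean(0)) / T.std(0)
    Yz = (Y - Y.mean(0)) / Y.std(0)
    R = Tz.T @ Yz / len(T)                   # p x q correlation matrix
    return (R - R.mean()) / R.std()

rng = np.random.default_rng(0)
T, Y = rng.normal(size=(500, 20)), rng.normal(size=(500, 30))
Y[:, 0] += 0.5 * T[:, 3]                     # plant one strong association
S = association_extremeness(T, Y)
plt.imshow(S, cmap="coolwarm"); plt.colorbar(label="extremeness (z)")
plt.xlabel("outcomes"); plt.ylabel("treatments"); plt.show()
```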