Winter 2021 complete list with abstracts
Tuesday, March 23, 2021: Joshua Angrist (MIT)
"Simple and Credible Value-Added Estimation Using Centralized School Assignment"
Discussant: Jesse Rothstein (UC Berkeley)
Abstract: Many large urban school districts match students to schools using algorithms that incorporate an element of random assignment. We introduce two simple empirical strategies to harness this randomization for value-added models (VAMs) measuring the causal effects of individual schools. The first estimator controls for the probability of being offered admission to different schools, treating the take-up decision as independent of potential outcomes. Randomness in school assignments is used to test this key conditional independence assumption. The second estimator exploits randomness in offers to generate instrumental variables (IVs) for school enrollment. This procedure uses a low-dimensional model of school quality mediators to solve the under-identification challenge arising from the fact that some schools are under-subscribed. Both approaches relax the assumptions of conventional value-added models while obviating the need for elaborate nonlinear estimators. In applications to data from Denver and New York City, we find that models controlling for both assignment risk and lagged achievement yield highly reliable VAM estimates. Estimates from models with fewer controls or older lagged score controls are improved markedly by IV.
Joint with Peter Hull, Parag Pathak, and Christopher Walters
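The logic of the first strategy can be illustrated with a minimal simulation (not the authors' code; all names and numbers below are hypothetical): offers are randomized given each student's assignment probability ("risk"), so a regression that controls for risk removes the confounding that biases a naive enrollment regression.

```python
import numpy as np

# Hypothetical simulation: students face different offer probabilities, and
# offers are randomized given that risk. A naive enrollment regression is
# confounded; controlling for assignment risk recovers the causal effect.
rng = np.random.default_rng(0)
n = 5000
risk = rng.uniform(0.2, 0.8, n)            # P(offer), varies across students
offer = rng.random(n) < risk               # randomized offer given risk
enroll = offer.astype(float)               # take-up (full compliance here)
ability = 2 * risk + rng.normal(0, 1, n)   # confounder correlated with risk
effect = 0.5                               # true school value-added
y = effect * enroll + ability + rng.normal(0, 1, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(np.column_stack([np.ones(n), enroll]), y)[1]
adjusted = ols(np.column_stack([np.ones(n), enroll, risk]), y)[1]
print(round(naive, 2), round(adjusted, 2))   # naive is biased upward
```

The risk control plays the role of a propensity score here; the paper's second strategy instead uses the offer itself as an instrument for enrollment.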
[Paper] [Slides] [Discussant slides]
Tuesday, March 16, 2021: Kun Zhang (Carnegie Mellon)
"Learning and Using Causal Representations"
Discussant: Cosma Shalizi (Carnegie Mellon)
Abstract: When do we have to make use of causal knowledge, and when does associational information suffice for machine learning? Can we find the causal direction between two variables by analyzing their observed values? Can we figure out where latent causal variables should be and how they are related? For the purpose of understanding and manipulating systems properly, people often attempt to answer such causal questions. Furthermore, we are often concerned with artificial intelligence (AI) in complex environments. For instance, how can we do transfer learning in a principled way? How can machines deal with adversarial attacks? Interestingly, it has recently been shown that causal information can facilitate understanding and solving various AI problems. This talk focuses on how to learn (hidden) causal representations from observational data and why and how the causal perspective allows adaptive prediction and a potentially higher level of artificial intelligence.
[Video] [Slides] [Discussant slides]
Tuesday, March 9, 2021: Luke Keele (University of Pennsylvania)
"Hospital Quality Risk Standardization via Approximate Balancing Weights"
Discussant: Sam Pimentel (UC Berkeley)
Abstract: Comparing outcomes across hospitals, often to identify underperforming hospitals, is a critical task in health services research. However, naive comparisons of average outcomes, such as surgery complication rates, can be misleading because hospital case mixes differ — a hospital’s overall complication rate may be lower due to more effective treatments or simply because the hospital serves a healthier population. In this paper, we develop a method of “direct standardization” where we re-weight each hospital patient population to be representative of the overall population and then compare the weighted averages across hospitals. Adapting methods from survey sampling and causal inference, we find weights that directly control for imbalance between the hospital patient mix and the target population, even across many patient attributes. Critically, these balancing weights can also be tuned to preserve sample size for more precise estimates. We also derive principled measures of statistical precision, and use outcome modeling and Bayesian shrinkage to increase precision and account for variation in hospital size. We demonstrate these methods using claims data from Pennsylvania, Florida, and New York, estimating standardized hospital complication rates for general surgery patients. We conclude with a discussion of how to detect low-performing hospitals.
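A toy sketch of the reweighting idea (not the paper's implementation, which uses approximate rather than exact balance and constrains the weights further; all numbers here are hypothetical): find the smallest adjustment to uniform weights so that one hospital's weighted covariate means exactly match the target population.

```python
import numpy as np

# Hypothetical patient covariates (age, severity score) for one hospital.
rng = np.random.default_rng(1)
n = 200
X = rng.normal([60.0, 0.3], [10.0, 0.1], size=(n, 2))
target = np.array([55.0, 0.5])        # covariate means in the population

A = np.vstack([np.ones(n), X.T])      # constraints: sum(w)=1, X'w = target
b = np.concatenate([[1.0], target])
w0 = np.full(n, 1.0 / n)
# Minimum-norm adjustment of uniform weights onto the constraint set
# (weights may go negative in this unconstrained sketch).
w = w0 + A.T @ np.linalg.solve(A @ A.T, b - A @ w0)

print(X.T @ w, w.sum())   # balanced means; weights sum to one
```

Staying close to uniform weights is what preserves effective sample size; the paper's approximate-balance formulation makes that trade-off tunable.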
[Video] [Paper] [Slides] [Discussant slides]
Tuesday, March 2, 2021: Fredrik Sävje (Yale)
"Balancing covariates in randomized experiments using the Gram-Schmidt Walk"
Discussant: Peng Ding (UC Berkeley)
Abstract: The design of experiments involves a compromise between covariate balance and robustness. This paper introduces an experimental design that admits precise control over this trade-off. The design is specified by a parameter that bounds the worst-case mean square error of an estimator of the average treatment effect. Subject to the experimenter's desired level of robustness, the design aims to simultaneously balance all linear functions of the covariates. The achieved level of balance is considerably better than what a fully random assignment would produce, and it is close to optimal given the desired level of robustness. We show that the mean square error of the estimator is bounded by the minimum of the loss function of a ridge regression of the potential outcomes on the covariates. One may thus interpret the approach as regression adjustment by design. Finally, we provide non-asymptotic tail bounds for the estimator, which facilitate the construction of conservative confidence intervals.
[Video] [Slides] [Paper] [Discussant slides]
Tuesday, February 23, 2021: Fan Li (Duke University)
"Causal Mediation Analysis for Sparse and Irregular Longitudinal Data"
Discussant: Georgia Papadogeorgou (University of Florida)
Abstract: Causal mediation analysis seeks to investigate how the treatment effect of an exposure on outcomes is mediated through intermediate variables. Although many applications involve longitudinal data, the existing methods are not directly applicable to settings where the mediator and outcome are measured on sparse and irregular time grids. We extend the existing causal mediation framework from a functional data analysis perspective, viewing the sparse and irregular longitudinal data as realizations of underlying smooth stochastic processes. We define causal estimands of direct and indirect effects accordingly and provide corresponding identification assumptions. For estimation and inference, we employ a functional principal component analysis approach for dimension reduction and use the first few functional principal components instead of the whole trajectories in the structural equation models. We adopt the Bayesian paradigm to accurately quantify the uncertainties. The operating characteristics of the proposed methods are examined via simulations. We apply the proposed methods to a longitudinal data set from a wild baboon population in Kenya to investigate the causal relationships between early adversity, strength of social bonds between animals, and adult glucocorticoid hormone concentrations. I will focus on the main ideas, avoiding the complex notation common in mediation analysis as much as I can, and will also invite discussion of the limitations of current causal mediation analysis. This is joint work with Shuxi Zeng at Duke University.
[Video] [Slides] [Discussant slides]
Tuesday, February 16, 2021: Donald Green (Columbia University)
"Using Placebo-Controlled Designs to Detect Edutainment Effects and Spillovers: Results from Two Large-Scale Experiments in Uganda"
Discussant: Molly Offer-Westort (Stanford)
Abstract: Education–entertainment refers to dramatizations designed to convey information and to change attitudes. Buoyed by observational studies suggesting that education–entertainment strongly influences beliefs, attitudes and behaviours, scholars have recently assessed education–entertainment by using rigorous experimental designs in field settings. Studies conducted in developing countries have repeatedly shown the effectiveness of radio and film dramatizations on outcomes ranging from health to group conflict. One important gap in the literature is estimation of social spillover effects from those exposed to the dramatizations to others in the audience members’ social network. In theory, the social diffusion of media effects could greatly amplify their policy impact. The current study uses a novel placebo‐controlled design that gauges both the direct effects of the treatment on audience members and the indirect effects of the treatment on others in their family and in the community. We implement this design in two large cluster‐randomized experiments set in rural Uganda using video dramatizations on the topics of violence against women, teacher absenteeism and abortion stigma. We find several instances of sizable and highly significant direct effects on the attitudes of audience members, but we find little evidence that these effects diffused to others in the villages where the videos were aired.
[Video] [Paper] [Slides] [Discussant slides]
Tuesday, February 9, 2021: Martin Tingley and Jeffrey Wong (Netflix)
"Supporting Innovation and Scale with a Democratized Experimentation Platform"
Discussant: Iavor Bojinov (Harvard)
Abstract: The Netflix Experimentation Platform is democratized and modular: data scientists can contribute metrics, causal inference methods, and visualizations directly to the platform, and use these building blocks to compose flexible reports that flow through to our frontend UI. This contribution model supports rapid prototyping and innovation, as data scientists contribute directly to production systems. To ensure that the platform continues to support the required scale (number and size of tests), we've invested in Computational Causal Inference software that can analyze massive datasets with a variety of causal effects in a performant, general, and robust way.
This talk will focus on how our software systems improve research agility and enable causal inference to be easily integrated into large engineering systems. We'll talk through several ongoing lines of work to enable progressively more sophisticated causal inference models on our highly scaled platform, and what we’ve learned in the process.
Tuesday, February 2, 2021: Interview with James Robins (Harvard)
[Video]
Tuesday, January 26, 2021: Stephen Bates (UC Berkeley)
"Causal Inference in Genetic Trio Studies"
Discussant: Qingyuan Zhao (University of Cambridge)
Abstract: This work introduces a randomization test using high-dimensional genotypes to identify causal relationships between regions of the genome and outcomes (e.g., presence or absence of asthma). The proposed method is immune to the confounding factors typically encountered in genetic association studies because inference relies only on the randomness in the process of inheritance, a source of plausibly independent variation. As a randomization test, the proposed method can leverage black-box machine learning algorithms as part of the test statistic, which we show leads to improved power.
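The flavor of such a test can be sketched with a toy simulation (not the paper's method or data): condition on the parents' genotypes and re-draw only the transmission of alleles, so the null distribution of the test statistic reflects nothing but the randomness of inheritance.

```python
import numpy as np

# Toy digital-twin style randomization test at one hypothetical biallelic site.
rng = np.random.default_rng(2)
n = 300
mother = rng.integers(0, 2, size=(n, 2))   # two alleles per parent
father = rng.integers(0, 2, size=(n, 2))

def transmit(rng):
    # Randomness of inheritance: each parent passes one allele at random.
    m = mother[np.arange(n), rng.integers(0, 2, n)]
    f = father[np.arange(n), rng.integers(0, 2, n)]
    return m + f                            # child allele count: 0, 1, or 2

child = transmit(rng)
y = 0.8 * child + rng.normal(0, 1, n)       # outcome affected by this site

stat = abs(np.corrcoef(child, y)[0, 1])
# Null draws: regenerate the child's genotype from the same parents, holding
# the outcome fixed, so inference uses only the transmission randomness.
null = np.array([abs(np.corrcoef(transmit(rng), y)[0, 1])
                 for _ in range(999)])
pval = (1 + np.sum(null >= stat)) / 1000.0
print(pval)
```

Because the statistic can be anything, the correlation above could be replaced by the fit of a black-box machine learning model, which is the source of the power gains the abstract mentions.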
[Video] [Slides] [Discussant slides] [Paper]
Tuesday, January 19, 2021: Mark van der Laan (UC Berkeley)
"Higher order Targeted Maximum Likelihood Estimation"
Discussant: Alex Luedtke (University of Washington)
Abstract: Asymptotic linearity and efficiency of targeted maximum likelihood estimators (TMLE) of target features of the data distribution rely on a second-order remainder being asymptotically negligible. However, in finite samples, the second-order remainder can dominate the sampling distribution, so that inference based on asymptotic normality would be anti-conservative. We propose a new higher-order (say, k-th order) TMLE, generalizing the regular (first-order) TMLE. We prove that it satisfies an exact linear expansion, in terms of efficient influence functions of sequentially defined higher-order fluctuations of the target parameter, with a (k+1)-th order remainder. As a consequence, this k-th order TMLE allows statistical inference relying only on the (k+1)-th order remainder being negligible. We present the theoretical results as well as simulations for the second-order TMLE for nonparametric estimation of the ATE and of the integrated squared density.
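For context, here is a minimal sketch of the regular (first-order) TMLE for the ATE on simulated data, with a known propensity score and a deliberately crude initial outcome regression (illustrative only; not the authors' software, and all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
W = rng.normal(0, 1, n)
g = 1 / (1 + np.exp(-W))                 # known propensity P(A=1 | W)
A = (rng.random(n) < g).astype(float)
Y = A + W + rng.normal(0, 1, n)          # true ATE = 1.0

# Crude initial outcome regression Qbar(a) that ignores the confounder W:
b0 = Y[A == 0].mean()
b1 = Y[A == 1].mean() - b0
Q1, Q0 = b0 + b1, b0                     # Qbar(1), Qbar(0), constants here
naive = Q1 - Q0                          # biased: A is confounded by W

# Targeting step: fluctuate Qbar along the clever covariate H so that the
# empirical efficient-influence-function equation is solved exactly.
H = A / g - (1 - A) / (1 - g)
Qobs = np.where(A == 1, Q1, Q0)
eps = np.sum(H * (Y - Qobs)) / np.sum(H * H)
ate = np.mean((Q1 + eps / g) - (Q0 - eps / (1 - g)))
print(round(naive, 2), round(ate, 2))    # targeting removes most of the bias
```

The linear fluctuation makes the residuals orthogonal to H, so the plug-in estimate after targeting coincides with an augmented-IPW estimate and is debiased as long as g is correct; the talk's higher-order construction iterates this idea to shrink the remainder further.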
[Video] [Slides] [Discussant slides]
Tuesday, January 12, 2021: Susan Athey (Stanford GSB)
"Synthetic Difference in Differences"
Abstract: We present a new estimator for causal effects with panel data that builds on insights behind the widely used difference in differences and synthetic control methods. Relative to these methods, we find, both theoretically and empirically, that the proposed "synthetic difference in differences" estimator has desirable robustness properties, and that it performs well in settings where the conventional estimators are commonly used in practice. We study the asymptotic behavior of the estimator when the systematic part of the outcome model includes latent unit factors interacted with latent time factors, and we present conditions for consistency and asymptotic normality. Joint with Dmitry Arkhangelsky, David A. Hirshberg, Guido Imbens, and Stefan Wager.
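The estimator's weighted double-difference form can be sketched on simulated panel data (weights shown uniform for brevity; the actual method chooses unit weights omega and time weights lam by optimization to match pre-treatment trends, which is its substantive contribution):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, T0 = 20, 10, 7            # units, periods, last pre-treatment period
tau = 1.0                       # true treatment effect
Y = rng.normal(0, 1, (N, 1)) + rng.normal(0, 1, (1, T)) \
    + rng.normal(0, 0.1, (N, T))          # unit factor + time factor + noise
treated = np.zeros(N, dtype=bool)
treated[-1] = True
Y[-1, T0:] += tau                          # treat last unit in post-period

# Weighted double difference. Uniform weights reduce this to plain DiD;
# synthetic DiD instead optimizes omega and lam.
omega = np.full((~treated).sum(), 1 / (~treated).sum())
lam = np.full(T0, 1 / T0)

Yt, Yc = Y[treated], Y[~treated]
treated_diff = Yt[:, T0:].mean() - Yt[:, :T0].mean(axis=0) @ lam
control_diff = omega @ (Yc[:, T0:].mean(axis=1) - Yc[:, :T0] @ lam)
sdid = treated_diff - control_diff
print(round(float(sdid), 2))
```

With additive unit and time factors, the time factor cancels exactly between the treated and control differences, so even the uniform-weight version recovers tau here; the optimized weights are what provide robustness when the factor structure is richer.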
[Video] [Slides] [Paper]