2022-2023: Spring Semester
Speaker: Sylvain Chassang (Princeton University)
Date: Friday, January 27, 2023
Time: 12:10 PM to 1:30 PM EST
Location: IAB LRR 707
Zoom Passcode: pmc
Title: A Theory of Experimenters: Robustness, Randomization, and Balance
Abstract: This paper studies the problem of experiment design by an ambiguity-averse decisionmaker who trades off subjective expected performance against robust performance guarantees. This framework accounts for real-world experimenters’ preference for randomization. It also clarifies the circumstances in which randomization is optimal: when the available sample size is large and robustness is an important concern. We apply our model to shed light on the practice of rerandomization, used to improve balance across treatment and control groups. We show that rerandomization creates a tradeoff between subjective performance and robust performance guarantees. However, robust performance guarantees diminish very slowly with the number of rerandomizations. This suggests that moderate levels of rerandomization usefully expand the set of acceptable compromises between subjective performance and robustness. Targeting a fixed quantile of balance is safer than targeting an absolute balance objective.
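As an illustration only, the rerandomization practice the abstract refers to can be sketched as drawing several candidate assignments and keeping the best-balanced one. This is not the paper's procedure: the balance measure (absolute difference in covariate means), the function names, and the simulated covariate are all assumptions made for this sketch. Keeping the best of k draws loosely corresponds to targeting a quantile of the balance distribution rather than an absolute balance threshold.

```python
import random
import statistics

def imbalance(x, assignment):
    """Absolute difference in covariate means between treated and control units."""
    treated = [xi for xi, a in zip(x, assignment) if a == 1]
    control = [xi for xi, a in zip(x, assignment) if a == 0]
    return abs(statistics.mean(treated) - statistics.mean(control))

def random_assignment(n, rng):
    """Complete randomization: half the units treated, in shuffled order."""
    assignment = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(assignment)
    return assignment

def rerandomize(x, k, rng):
    """Draw k assignments and keep the one with the smallest imbalance."""
    draws = [random_assignment(len(x), rng) for _ in range(k)]
    return min(draws, key=lambda a: imbalance(x, a))

# Hypothetical data: one simulated covariate for 40 units.
rng = random.Random(0)
covariate = [rng.gauss(0.0, 1.0) for _ in range(40)]
best_of_20 = rerandomize(covariate, 20, rng)
print(imbalance(covariate, best_of_20))
```

Larger values of k give better balance on the measured covariate but restrict the set of assignments that can be chosen, which is the tradeoff the abstract describes.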
Speaker: Naoki Egami (Columbia University)
Date: Friday, February 24, 2023
Time: 12:10 PM to 1:30 PM EST
Location: IAB LRR 707
Zoom Passcode: pmc
Title: Empirical Strategies Toward External Validity
Abstract: How can researchers generalize findings from specific studies to broader populations, contexts, and settings? This question of external validity has a long history in the social sciences, going back to at least the 1960s. It has recently become even more essential, given that researchers have conducted a large number of internally valid studies (e.g., randomized controlled trials and quasi-experimental observational studies) over the last two decades, and huge opportunities and challenges of accumulating causal knowledge have become evident. In this talk, I will discuss a unified pipeline for external validity, consisting of a framework, study design, and data analysis. First, I will introduce a framework of external validity (Egami and Hartman, 2022; APSR) that synthesizes diverse external validity concerns. Then, I will discuss how to design studies for external validity (Egami and Lee, 2023+). In particular, I examine the question of site/case selection, e.g., where should we run experiments, and which cases should we examine? I will propose a simple algorithm to optimally select study sites such that researchers can credibly estimate generalizable causal effects. This new approach, which I call synthetic purposive sampling, combines ideas from the synthetic control method in the causal inference literature and purposive sampling in the research design literature. It offers statistical justification for purposive sampling and clarifies optimal ways to include diverse and heterogeneous cases for better external validity. Finally, if time permits, I will also discuss how to assess the robustness of causal findings to external validity bias (Devaux and Egami, 2022). This measure of external robustness is particularly useful when researchers analyze randomized experiments that were designed without explicit external validity consideration.
Speaker: John Huber (Columbia University)
Date: Friday, March 24, 2023
Time: 2:10 PM to 3:30 PM EDT
Location: IAB LRR 707
Zoom Passcode: pmc
Title: Out of the nightlights shadow: Estimating spatial economic activity in dollars (with Laura Mayoral, Institute for Economic Analysis and Barcelona School of Economics)
Abstract: Since spatially disaggregated measures of economic well-being are scarce in the developing world, recent research has increasingly turned to satellite measures of nightlights as proxies for economic development. But nightlights suffer from two important limitations: (a) they contain substantial non-classical measurement error, and (b) they lack a substantively interpretable metric. This paper presents an approach to overcoming these limitations. We describe a mathematical framework for combining individual-level surveys of asset ownership with aggregated data on consumption per capita to produce a new measure of spatially-disaggregated consumption per capita, denominated in 2011 PPP dollars. To implement this framework, we use geocoded surveys to construct a training variable for over 34,000 locations in Africa. This variable is used to train a random forest model that includes a wide range of predictors. The prediction exercise allows us to estimate consumption and poverty in all 10x10km cells in sub-Saharan Africa over time. Prediction accuracy is very high, and external data confirm the validity of the estimates. We then use the new data to illustrate why analyses using nightlights can be expected to lead to erroneous conclusions due to non-classical measurement error. As is well known, non-classical error can bias results in either direction (attenuation and amplification bias), and can do so when a variable measured with such error is employed as either a dependent or an independent variable. Using the data developed here, we illustrate both types of bias by revisiting two important studies of institutions and economic development.
Speaker: Brandon Stewart (Princeton University)
Date: Friday, April 7, 2023
Time: 12:10 PM to 1:30 PM
Location: IAB Harriman 1219 (Different from the usual location)
Zoom Passcode: pmc
Title: Strengthening Propaganda and the Limits of Media Commercialization in China: Evidence from Millions of Newspaper Articles
Abstract: A defining feature of the information environment in contemporary China is scripted government propaganda: the government directing newspapers to use specific language when reporting on particular events. Yet due to the mix of syndication and scripting, it is difficult to tell if any given article is explicitly government-directed news. Using a newly-collected database of six million newspaper articles from major domestic newspapers in China and linking them to leaked propaganda directives, we identify scripted propaganda coordinated by China's Central Publicity Department from 2012-2021 by examining patterns of text re-use across papers published on the same day. We demonstrate that over the past 10 years, scripting in official party newspapers shows increasing constraint and more focus on explicitly ideological content. While media commercialization has long been touted as a mechanism for government oversight in China, our results indicate that commercial papers do not compensate for changes in official newspapers and follow official scripting at similar rates on topics about domestic politics.
2022-2023: Fall Semester
Speaker: Eunji Kim (Columbia University)
Date: Friday, October 14, 2022
Time: 12:10 PM to 1:30 PM EDT
Location: IAB 707
Zoom Passcode: pmc
Title: Revisiting the Fox News Effect
Abstract: Fox News has a near-ubiquitous presence in the contemporary discussion about American politics. Yet too often, the scholarly focus on estimating Fox News effects on a host of outcomes has come at the expense of attention to the key construct: exposure to Fox News. A survey of relevant literature gives us an inconsistent picture of the extent to which Americans are exposed to Fox News, with the realities of a multi-platform, high-choice media environment further complicating the task of obtaining an overall snapshot of American media consumption at any given time. We ambitiously assemble all available data on Fox News consumption (i.e., Nielsen TV ratings, YouTube, Twitter, Facebook, web traffic, and self-reports) for a one-month period. We find that across all measures, American exposure to Fox News is relatively limited, except for self-reported consumption in surveys. These inflated self-reports, when aggregated and disseminated, create a false impression that a substantial portion of Americans is consuming Fox News. Our survey experiment shows that those misperceptions are mostly resistant to corrections and can even exacerbate partisan stereotypes and polarization.
Book Symposium featuring "Time Counts: Quantitative Analysis for Historical Social Science" by Gregory Wawro (Columbia University) and Ira I. Katznelson (Columbia University)
Guest Speakers: Daniel Carpenter (Harvard University), James Mahoney (Northwestern University), and Rocío Titiunik (Princeton University)
Date: Friday, November 11, 2022
Time: 11:00 AM to 1:00 PM EST
Location: IAB 707
Zoom Passcode: pmc
Speaker: Yamil Velez (Columbia University)
Date: Friday, November 18, 2022
Time: 12:10 PM to 1:30 PM
Location: IAB 707
Zoom Passcode: pmc
Title: Confronting Core Attitudes: A Critical Test of Motivated Reasoning (with Patrick Liu, Columbia University)
Abstract: Whether individuals update their beliefs and attitudes in the direction of evidence or grow more confident in their convictions when confronted with counter-attitudinal information is a long-standing debate in the political psychology literature. Though recent studies have shown that instances of attitude polarization and belief backfire are rarely observed in settings involving hot-button issues or viral misinformation, we know surprisingly little about how individuals respond to information targeting their core beliefs and attitudes. To address this gap, we develop a tailored experimental design that measures participants' strongly-held issue positions and randomly assigns them to different mixtures of personalized pro-attitudinal and counter-attitudinal information using the large language model GPT-3. We fail to recover evidence consistent with motivated reasoning across two studies, despite creating ideal conditions for detecting attitude polarization. We conclude by discussing the implications for the study of political cognition and the measurement of attitudes.
2021-2022: Spring Semester
Speaker: John Marshall (Columbia University)
Date: February 25th, 2022
Time: 12:10 - 1:30 pm (Eastern time)
Zoom Link: https://columbiauniversity.zoom.us/j/97873183872?pwd=MmR4eHUvbGcvRG9XZ24yOGlHa3pUdz09
Zoom passcode: pmc
Title: Can close election regression discontinuity designs identify effects of winning politician characteristics?
Abstract: Politician characteristic regression discontinuity (PCRD) designs leveraging close elections are widely used to isolate effects of an elected politician characteristic on downstream outcomes. Unlike standard regression discontinuity designs, treatment is defined by a predetermined characteristic that could affect a politician’s victory margin. I prove that, by further conditioning on politicians that won close elections, PCRD estimators identify the effect of the specific characteristic of interest and all compensating differentials: candidate-level characteristics that ensure election winners remain in close races despite being advantaged/disadvantaged by the characteristic of interest. Avoiding this asymptotic bias generally requires assuming either that the specified characteristic does not affect candidate vote shares or that no compensating differential affects the outcome. Since theories of voting behavior suggest that neither strong assumption usually holds, I further explain the implications for interpreting continuity and consider whether and how covariate adjustment, bounding, and recharacterizing treatment can mitigate the post-treatment bias afflicting PCRD designs.
Speaker: Elizabeth Tipton (Northwestern University)
Date: February 4th, 2022
Time: 12:10 - 1:30 pm (Eastern time)
Zoom Link: https://columbiauniversity.zoom.us/j/97873183872?pwd=MmR4eHUvbGcvRG9XZ24yOGlHa3pUdz09
Zoom passcode: pmc
Title: Designing RCTs for evidence-based decision-making
Abstract: In laboratory and other highly controlled settings, randomized trials allow for the testing of scientific theories by providing unbiased estimates of average causal effects. Over the past several decades, however, randomized trials have increasingly been used for questions of policy and practice. These questions are not simply inferential questions, but are inherently questions of prediction. That is, we conduct the trial not only to infer if the intervention ‘works’ in the study, but to provide evidence for its efficacy for future units. In this talk, I set up the goal of an RCT within a prediction framework. Doing so allows for connections between questions of generalization, treatment effect heterogeneity, and model selection. I then focus on questions of treatment effect heterogeneity and how best to design randomized trials to estimate these relationships. In particular, I show that methods for optimal designs found in response surface models can be useful, too, in designing sampling plans that result in increased power and precision for these moderator effects. I situate this in an example based on an evaluation of a school-based reading program.
Speaker: Dan Hopkins (University of Pennsylvania)
Date: April 22nd, 2022
Time: 12:10 - 1:30 pm (Eastern time)
Zoom Link: https://columbiauniversity.zoom.us/j/97873183872?pwd=MmR4eHUvbGcvRG9XZ24yOGlHa3pUdz09
Zoom passcode: pmc
2021-2022: Fall Semester
Speaker: David Blei (Columbia; Statistics and Computer Science)
Date: November 19th, 2021
Time: 12:10 - 1:30 pm (Eastern time)
Location: 707 International Affairs Building (the Lindsay Rogers Room)
Zoom passcode: pmc
Title: The Blessings of Multiple Causes
Abstract: Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods require that we observe all confounders, variables that affect both the causal variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. In this talk, I will describe the deconfounder, a way to do causal inference under different assumptions than classical methods require.
How does the deconfounder work? While traditional causal methods measure the effect of a single cause on an outcome, many modern scientific studies involve multiple causes, different variables whose effects are simultaneously of interest. The deconfounder uses the correlation among multiple causes as evidence for unmeasured confounders, combining unsupervised machine learning and predictive model checking to perform causal inference.
In this talk, I will describe the deconfounder methodology and discuss the theoretical requirements for the deconfounder to provide unbiased causal estimates. I will touch on some of the academic debates surrounding the deconfounder, and demonstrate the deconfounder on real-world data and simulation studies.
This is joint work with Yixin Wang.
Paper Link: https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1686987
Speaker: Kara Rudolph (Columbia; Mailman School of Public Health)
Date: November 5th, 2021
Time: 12:10 - 1:30 pm (Eastern time)
Zoom passcode: pmc
Title: Towards understanding one-size-does-not-fit-all nuances
Abstract: Interventions can have harmful effects among subgroups they intend to help. Or, even if the total effect of an intervention on a particular outcome is beneficial, there could be a harmful indirect effect – the effect of the intervention on an outcome through mediators. In some cases, the implications of a likely harmful indirect effect may outweigh an anticipated beneficial total effect, and would motivate further discussion of whether to treat identified individuals. We build on the mediation and optimal treatment rule literatures to propose a method of identifying a subgroup for which the treatment effect through the mediator is expected to be harmful and quantify the expected interventional indirect effect for this subgroup. We apply the proposed approach to identify a subgroup of boys in the Moving to Opportunity housing voucher experiment who would be predicted to experience harmful interventional indirect effects, though their predicted interventional total effects are beneficial.
Paper Link: https://arxiv.org/abs/2101.08590
Speaker: Naoki Egami (Columbia; Political Science)
Date: October 15th, 2021
Time: 12:00 - 1:30 pm (Eastern time)
Title: Identification and Estimation of Causal Peer Effects Using Double Negative Controls for Unmeasured Network Confounding
Abstract: Scientists have been interested in estimating causal peer effects to understand how people’s behaviors are affected by their network peers. However, it is well known that identification and estimation of causal peer effects are challenging in observational studies for two reasons. The first is the identification challenge due to unmeasured network confounding, for example, homophily bias and contextual confounding. The second issue is network dependence of observations, which one must take into account for valid statistical inference. Negative control variables, also known as placebo variables, have been widely used in observational studies including peer effect analysis over networks, although they have been used primarily for bias detection. In this article, we establish a formal framework which leverages a pair of negative control outcome and exposure variables (double negative controls) to nonparametrically identify causal peer effects in the presence of unmeasured network confounding. We then propose a generalized method of moments estimator for causal peer effects, and establish its consistency and asymptotic normality under an assumption about ψ-network dependence. Finally, we provide a network heteroskedasticity and autocorrelation consistent variance estimator. Our methods are illustrated with an application to peer effects in education.
Paper Link: https://naokiegami.com/paper/dnc_peer.pdf
Speaker: Nikhar Gaikwad (Columbia; Political Science)
Date: September 17th, 2021
Time: 12:00 - 1:30 pm (Eastern time)
Title: How International Migration Shapes Political Economy Preferences: Evidence from a Field Experiment
Abstract: How does cross-border labor migration shape migrants’ economic prospects and political attitudes? Scholars have long debated how mobility induces political change, yet evaluating the effect of migration on attitudes and behaviors is challenging because individuals who select to migrate differ from those who do not. Partnering with local governmental and non-governmental organizations in Mizoram, India, we conducted a randomized controlled trial connecting individuals from marginalized communities seeking overseas employment with well-paying jobs in the Persian Gulf region’s hospitality sector. We tracked subjects’ economic trajectories and political attitudes at three stages: at baseline, after a training program but prior to migration, and two years after migration. Subjects who received the opportunity to move abroad markedly improved their economic positions, with mean wages more than double those of the control group. They shifted household economic plans, delaying marriage and childbearing decisions. Material changes accompanied significant transformations in policy preferences, with treatment subjects becoming far less supportive of state-led taxation and redistribution than those in the control. Our results illustrate how both the prospect of upward mobility associated with emigration and realized economic gains from labor migration itself alter individuals’ political preferences.
Speaker: Jennifer Hill (NYU)
Date: April 9th, 2021
Time: 2:00 - 3:15pm (Eastern time)
Title: BART + Stan + Causal Inference: Creating more flexible causal inference models
Abstract: There has been increasing interest in the past decade in the use of machine learning tools in causal inference to help reduce reliance on parametric assumptions and allow for more accurate estimation of heterogeneous effects. This talk reviews the work in this area that capitalizes on Bayesian Additive Regression Trees, an algorithm that embeds a tree-based machine learning technique within a Bayesian framework to allow for flexible estimation and valid assessments of uncertainty. It will briefly review extensions of the original work to address common issues in causal inference: lack of common support, violations of the ignorability assumption, and generalizability of results to broader populations. It will then introduce a new package that combines the flexibility of BART with the power of Stan to fit models that incorporate parametric extensions of the BART model to accommodate multilevel data structures in a principled way.
Speaker: Betsy Sinclair (Washington University in St Louis)
Date: March 26th, 2021
Time: 12:00 - 1:15pm (Eastern time)
Title: Legislative Communication and Power: Measuring Leadership from Social Communication Data
Abstract: Who leads and who follows in Congress? By leveraging congressional Twitter accounts, this paper develops a new understanding of congressional leadership organization via innovative natural language processing methods. Formal theoretic work on congressional leadership suggests two hypotheses: first, official party leaders should be more likely to initiate discussion on topics where the party has little need for policy direction. Topics that are in need of policy direction have outsized effects as a result of failing or succeeding at coordinating around a policy stance. Second, as barriers to coordination on policy stances of party rank-and-file members increase, leadership’s ability to persuade rank-and-file members to adopt leadership’s messaging strategy should increase. Specifically, we exploit the network structure of retweets to derive measures of leadership centrality within each party. We then employ Joint Sentiment Topic modeling to quantify the discussion space for legislators on Twitter. We find partial support for the first hypothesis: for issues not in need of direction, leaders do not generally initiate the discussion, although they do so more often than rank-and-file members. Moreover, increases in leaders’ propensity to discuss a sentiment-topic result in meaningful increases in rank-and-file members’ propensities to discuss those same sentiment-topics. In contrast to the literature, however, we find that rank-and-file members exert this same type of influence over their leaders, and moreover that rank-and-file influence is larger in magnitude than that of party leadership. We also find strong correlative evidence for the second hypothesis: as the barriers to coordination in policy stances within a party increase, party leaders hold more central – and arguably more powerful – roles within their party.
Speaker: Walter Mebane (University of Michigan)
Date: February 19th, 2021
Time: 2:00-3:15pm (Eastern time)
Title: Party Words: Partisan Associations From Word Embeddings of Twitter Users' Bios
Abstract: To measure the partisanship of Twitter users, we use word and document embeddings to generate presidential campaign partisan associations from the descriptions (bios) of Twitter users who reported a personal experience with the 2016 U.S. general election process. Partisan associations are cosine similarities between description vectors and partisan subspaces defined using keywords that refer to the presidential campaign, candidates and parties. Innovations include: an application-specific loss function to select doc2vec hyperparameters; a sampling approach to increase reliability; accounting for hostile keyword uses. Activities such as retweets, favorites, hashtags, following and description changes help validate the associations. Associations' relationships to following members of Congress show they reflect sentiments and engagements similar to what scaling methods capture, except they are defined for more users. The associations are contemporaneously portable to Reddit and port to 2018 survey respondents' Twitter bios. Associations relate plausibly to survey questions about party identification and liberal-conservative ideology.
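The abstract does not say how the cosine similarity between a description vector and a keyword-defined subspace is computed. One common definition, used here purely as an assumption, is the length of the vector's orthogonal projection onto the subspace divided by the vector's own length. The sketch below uses made-up three-dimensional vectors in place of real doc2vec embeddings, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the keyword vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        length = norm(w)
        if length > tol:  # skip vectors already in the span
            basis.append([wi / length for wi in w])
    return basis

def subspace_cosine(vector, keyword_vectors):
    """Cosine of the angle between a vector and its projection onto the subspace."""
    basis = orthonormal_basis(keyword_vectors)
    projection_length = math.sqrt(sum(dot(vector, b) ** 2 for b in basis))
    return projection_length / norm(vector)

# Toy keyword embeddings (assumed values) spanning the x-y plane.
keywords = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(subspace_cosine([3.0, 4.0, 0.0], keywords))  # vector inside the subspace
print(subspace_cosine([0.0, 0.0, 2.0], keywords))  # vector orthogonal to it
```

Under this definition the similarity lies between 0 and 1 and carries no sign, so distinguishing supportive from hostile keyword uses, which the abstract mentions, would need additional steps not shown here.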
Speaker: Tamar Mitts (Columbia University)
Date: January 29th, 2021
Time: 2:00 - 3:15pm (Eastern time)
Title: Banned: How De-platforming Extremists Mobilizes Hate in the Dark Corners of the Internet
Abstract: In recent years, the world has seen a rapid increase in the use of social media platforms by violent extremist groups. Militants espousing radical ideologies have been using online platforms to communicate, disseminate propaganda, and in some cases, plan violent acts. In response, social media companies have suspended accounts and taken down content containing violent propaganda and hate speech. While these efforts have reduced the availability of such content online, little is known about what happens to suspended individuals after being banned from these platforms. Drawing on unique data that includes information on individuals who have accounts both on Twitter (a mainstream platform) and Gab (a fringe platform favored by white supremacy extremists), I show that Twitter suspensions increase engagement with hate speech on Gab.
Speaker: Joshua Kalla (Yale University)
Date: December 11th, 2020
Time: 2:00 - 3:15pm Eastern time
Title: Policy Voting in Elections: Field Experimental Evidence from 12.8 Million Voters in 82 Congressional Districts
Speaker: Betsy Ogburn (Johns Hopkins University)
Date: November 13th, 2020
Time: 10:00 - 11:15am Eastern time
Title: Social Network Dependence and Unmeasured Confounding
Abstract: In joint work with Youjin Lee, we showed that social network dependence can result in spurious associations, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc. Social network dependence in both the exposure and outcome of interest can result in association and effect estimates that are concentrated away from the truth, even in the absence of confounding and even under the null of no association. In the latter part of the talk I will discuss how the phenomenon of spurious associations due to dependence is related to unmeasured confounding by network structure, akin to confounding by population structure in GWAS studies, and how this relationship sheds light on methods to control for both spurious associations and unmeasured confounding.
Speaker: Stefan Wager (Stanford GSB)
Date: October 16th, 2020
Time: 2:00 - 3:30pm Eastern time
Title: Noise-Induced Randomization in Regression Discontinuity Designs
Abstract: Regression discontinuity designs are used to estimate causal effects in settings where treatment is determined by whether an observed running variable crosses a pre-specified threshold. While the resulting sampling design is sometimes described as akin to a locally randomized experiment in a neighborhood of the threshold, standard formal analyses do not make reference to probabilistic treatment assignment and instead identify treatment effects via continuity arguments. Here we propose a new approach to identification, estimation, and inference in regression discontinuity designs that exploits measurement error in the running variable. Under an assumption that the measurement error is exogenous, we show how to consistently estimate causal effects using a class of linear estimators that weight treated and control units so as to balance a latent variable of which the running variable is a noisy measure. We find this approach to facilitate identification of both familiar estimands from the literature, as well as policy-relevant estimands that correspond to the effects of realistic changes to the existing treatment assignment rule. We demonstrate the method with a study of retention of HIV patients and evaluate its performance using simulated data and a regression discontinuity design artificially constructed from test scores in early childhood.
Speaker: Macartan Humphreys (Columbia University and WZB Berlin)
Date: September 25th, 2020
Time: 10:00 - 11:30am Eastern time
Title: Causal Models: Guide to CausalQueries
Abstract: Introducing a package to build, update, and query Bayesian causal models on binary nodes. The approach used in CausalQueries is a generalization of the biqq models described in “Mixing Methods: A Bayesian Approach” (Humphreys and Jacobs 2015). This guide is supplementary material for our book-in-progress “Integrated Inferences.”