Research

Work In Progress

Mass Reproducibility and Replicability: A New Hope

(as third author with Abel Brodeur and Derek Mikola)

This study pushes our understanding of research reliability by reproducing and replicating claims from 110 papers in leading economic and political science journals. The analysis involves computational reproducibility checks and robustness assessments. It reveals several patterns. First, we uncover a high rate of fully computationally reproducible results (over 85%). Second, excluding minor issues like missing packages or broken pathways, we uncover coding errors for about 25% of studies, with some studies containing multiple errors. Third, we test the robustness of the results to 5,511 re-analyses. We find a robustness reproducibility of about 70%. Robustness reproducibility rates are relatively higher for re-analyses that introduce new data and lower for re-analyses that change the sample or the definition of the dependent variable. Fourth, 52% of re-analysis effect size estimates are smaller than the original published estimates and the average statistical significance of a re-analysis is 77% of the original. Lastly, we rely on six teams of researchers working independently to answer eight additional research questions on the determinants of robustness reproducibility. Most teams find a negative relationship between replicators' experience and reproducibility, while finding no relationship between reproducibility and the provision of intermediate or even raw data combined with the necessary cleaning codes.

We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments

(with Abel Brodeur and Anthony Heyes)

Amazon Mechanical Turk is a very widely-used tool in business and economics research, but how trustworthy are results from well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020 we find evidence of widespread p-hacking, publication bias and over-reliance on results from plausibly under-powered studies. Even ignoring questions arising from the characteristics and behaviors of study recruits, the conduct of the research community itself erode substantially the credibility of these studies’ conclusions. The extent of the problems vary across the business, economics, management and marketing research fields (with marketing especially afflicted). The problems are not getting better over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility.

Statistical Significance and Science Mobilization: Evidence from 10,404 Hypotheses in Leading Health Journals

(with Abel Brodeur, Anthony Heyes, and Taylor Wright)

(Draft not yet available - but see it presented in the video)

Stay Frosty: Climate Change and Gun Violence in North America

(with Taylor Wright)

Hotter temperatures due to climate change are expected to increase interpersonal violence, alongside many other social and economic costs. In this paper, we find that the largest effects of temperature on gun violence in Chicago are found not through hot days getting hotter but through cold days growing warmer. We find that gun violence increases by 1% for every 1C warmer daily temperature. Notably, we use data both from automated gun violence reporting and traditional police reports to find similar results, suggesting that it is the incidence of crime (rather than the previously composite crime incidence and probability it is reported) that is affected by temperature. We find no effects of previous-day temperatures, discouraging a hypothesis of pre-meditation and instead suggesting milder-temperature opportunity as the potential mechanism. We also find evidence that the COVID-19 pandemic increased the sensitivity of gun violence to temperature. Taken together, our results suggest that research focused on hotter days due to climate change likely underestimate the future potential increase in gun violence.

Publications

Do Preregistration and Preanalysis Plans Reduce p-Hacking and Publication Bias? Evidence from 15,992 Test Statistics and Suggestions for Improvement

Journal of Political Economy: Microeconomics, August 2024

(with Abel Brodeur, Jonathan Hartley, and Anthony Heyes)

Preregistration is regarded as an important contributor to research credibility. We investigate this by analyzing the pattern of test statistics from the universe of randomized controlled trial studies published in 15 leading economics journals. We draw two conclusions: (a) Preregistration frequently does not involve a preanalysis plan (PAP), or sufficient detail to constrain meaningfully the actions and decisions of researchers after data are collected. Consistent with this, we find no evidence that preregistration in itself reduces p-hacking and publication bias. (b) When preregistration is accompanied by a PAP we find evidence consistent with both reduced p -hacking and reduced publication bias.

Increasing student access through aid: Differences in difference-in-differences estimates

Economics Letters, May 2024

Causal identification of a student aid program’s impact can be difficult as the best control group is often a small number of out-of-province students who likely differ from locals in unobservable ways. This paper evaluates the impacts of the 30% Off Ontario Tuition Grant using administrative data from the Ontario–Quebec border, where a large number of local students are subject to a different province’s unchanged aid program. The Grant improved access to education; cohorts enrolled after the Grant was announced come from poorer areas, but also achieved lower graduation rates than comparable local yet out-of-province students. I present estimates using three different control groups: a local-student comparison offers the largest results, with more traditional comparisons finding similar but smaller effects.

P-Hacking, Data Type and Data-Sharing Policy

The Economic Journal, April 2024

(with Abel Brodeur and Carina Neisser)

This paper examines the relationship between p-hacking, publication bias and data-sharing policies. We collect 38,876 test statistics from 1,106 articles published in leading economic journals between 2002–20. We find that, while data-sharing policies increase the provision of data, they do not decrease the extent of p-hacking and publication bias. Similarly, articles that use hard-to-access administrative data or third-party surveys, as compared to those that use easier-to-access (e.g., author-collected) data, are not different in their p-hacking and publication extent. Voluntary provision of data by authors on their home pages offers no evidence of reduced p-hacking.

Clean Air and Cognitive Productivity: Effect and Adaptation

Journal of the Association of Environmental and Resource Economists, September 2023

(with Anthony Heyes and Nicholas Rivers)

We observe 1.8 million university course grades for 88,959 adults who learn and complete examinations in a much less polluted environment than previously studied. We use a within-student identification strategy and find robust evidence of a negative and causal effect of exam-day outdoor air pollution on course performance. The effect of pollution persists beyond the same-day effect. Female students are more sensitive than males, and effects are greatest when students are engaged in unfamiliar tasks. We explore two margins of adaptation, one infrastructural, one behavioral. Working in a new building, and particularly if it is high quality (LEED Gold), provides significant mitigation. Relocating to a floor above ground level also offers partial protection.

Pollution Pictures: Psychological Exposure to Pollution Impacts Worker Productivity in a Large-scale Field Experiment

Journal of Environmental Economics and Management, July 2022

(with Anthony Heyes)

While contemporaneous exposure to polluted air has been shown to reduce labor supply and worker productivity, little is known about the underlying mechanisms. We present first causal evidence that psychological exposure to pollution – the “thought of pollution” – can influence employment performance. Over 2000 recruits on a leading micro-task platform are exposed to otherwise identical images of polluted (treated) or unpolluted (control) scenes. Randomization across the geographically-dispersed workforce ensures that treatment is orthogonal to physical pollution exposure. Treated workers are less likely to accept a subsequent offer of work (labor supply) despite being offered a piece-rate much higher than is typical for the setting. Conditional on accepting the offer, treated workers complete between 5.1% to 10.1% less work (labor productivity) depending on the nature of their assigned task. We find no effect on work quality. Suggestive evidence points to the role of induced negative sentiment. Decrements to productivity through psychological mechanisms are plausibly additional to any from physical exposure to polluted air.

On the Effects of COVID-19 Safer-at-Home Policies on Social Distancing, Car Crashes and Pollution

Journal of Environmental Economics and Management, March 2021

(with Abel Brodeur and Taylor Wright)

This paper investigates the impacts of COVID-19 safer-at-home polices on collisions and pollution. We find that statewide safer-at-home policies lead to a 20% reduction in vehicular collisions and that the effect is entirely driven by less severe collisions. For pollution, we find particulate matter concentration levels approximately 1.5 μg/m3 lower during the period of a safer-at-home order, representing a 25% reduction. We document a similar reduction in air pollution following the implementation of similar policies in Europe. We calculate that as of the end of June 2020, the benefits from avoided car collisions in the U.S. were approximately $16 billion while the benefits from reduced air pollution could be as high as $13 billion.

Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics

American Economic Review, November 2020

(with Abel Brodeur and Anthony Heyes)

The credibility revolution in economics has promoted causal identification using randomized control trials (RCT), difference-in-differences (DID), instrumental variables (IV) and regression discontinuity design (RDD). Applying multiple approaches to over 21,000 hypothesis tests published in 25 leading economics journals, we find that the extent of p-hacking and publication bias varies greatly by method. IV (and to a lesser extent DID) are particularly problematic. We find no evidence that (i) papers published in the Top 5 journals are different to others; (ii) the journal "revise and resubmit" process mitigates the problem; (iii) things are improving through time.

Brain Freeze: Outdoor Cold and Indoor Cognitive Productivity

Journal of Environmental Economics and Management, May 2020

(with Anthony Heyes)

We present first evidence that outdoor cold temperatures negatively impact indoor cognitive performance. We use a within-subject design and a large-scale dataset of adults in an incentivized setting. The performance decrement is large despite the subjects working in a fully climate-controlled environment. Using secondary data, we find evidence of partial adaptation at the organizational, individual and biological levels. The results are interpreted in the context of climate models that observe and predict an increase in the frequency of very cold days in some locations (e.g. Chicago) and a decrease in others (e.g. Beijing).

A Proposed Specification Check for p-Hacking

AEA Papers & Proceedings, May 2020

(with Abel Brodeur and Anthony Heyes)

We propose a specification check for p-hacking. More specifically, we advocate the reporting of t-curves and mu-curves—the t-statistics and estimated effect sizes derived from regressions using every possible combination of control variables from the researcher's set—and introduce a standardized and accessible implementation. Our specification check allows researchers, referees, and editors to visually inspect variation in effect sizes, significativity, and sensitivity to the inclusion of control variables. We provide a Stata command that implements the specification check. Given the growing interest in estimating causal effects, the potential applicability of this specification check to empirical studies is large.