The Reproducibility Crisis in ML‑based Science
July 28, 2022
10 AM–4:30 PM ET
Online
The use of machine learning (ML) methods for prediction and forecasting has become widespread across the quantitative sciences. However, a reproducibility crisis is brewing: we found 20 reviews across 17 scientific fields that collectively identify errors in 329 ML-based science papers.
Hosted by the Center for Statistics and Machine Learning at Princeton University, our online workshop aimed to highlight the scale and scope of the crisis, identify root causes of the observed reproducibility failures, and make progress towards solutions.
We have made the workshop materials public: the talks and slides below, and the annotated reading list.
Talks and slides
Background on the workshop and the crisis
Arvind Narayanan, Princeton University (7 minutes)
Leakage and the reproducibility crisis in ML-based science
Sayash Kapoor, Princeton University (7 minutes)
Overly optimistic prediction results on imbalanced data
Gilles Vandewiele, Ghent University (20 minutes)
Is the ML reproducibility crisis a natural consequence?
Michael Roberts, University of Cambridge (20 minutes)
Panel 1: Diagnose
Moderator: Priyanka Nanayakkara
Panelists: Gilles Vandewiele, Michael Roberts, Odd Erik Gundersen
How to avoid machine learning pitfalls: a guide for academic researchers
Michael Lones, Heriot-Watt University (20 minutes)
Consequences of reproducibility issues in ML research and practice
Inioluwa Deborah Raji, University of California Berkeley (20 minutes)
When (and why) we shouldn't expect reproducibility in ML-based science
Momin M. Malik, Mayo Clinic (20 minutes)
The replication crisis in social science: does science self-correct?
Marta Serra-Garcia, University of California San Diego (20 minutes)
Panel 2: Fix
Moderator: Sayash Kapoor
Panelists: Michael Lones, Inioluwa Deborah Raji, Momin M. Malik, Marta Serra-Garcia
Integrating explanation and prediction in ML-based science
Jake Hofman, Microsoft Research (20 minutes)
The worst of both worlds: a comparative analysis of errors in learning from data in psychology and machine learning
Jessica Hullman, Northwestern University (20 minutes)
What is your estimand? Implications for prediction and machine learning
Brandon Stewart, Princeton University (20 minutes)
Panel 3: Future paths
Moderator: Arvind Narayanan
Panelists: Jake Hofman, Jessica Hullman, Brandon Stewart
Reading list and interactive session
In addition to the public session on July 28th, we prepared further content for participants interested in going deeper into reproducibility:
Annotated reading list: We compiled a reading list of relevant research on reproducibility from the last few years. Most of these papers were presented by speakers at the workshop, and the list serves as an accompanying resource.
Tutorial and interactive session on July 29th, 3–4:30 PM ET: In a recent preprint, we (Kapoor and Narayanan) introduced model info sheets for improving reproducibility by detecting and preventing leakage. In our testing so far, users have been able to detect leakage in models they had previously built by filling out model info sheets.
In this session, held the day after the workshop, we gave a brief tutorial on how model info sheets can help you prevent leakage in your own research (the sketch below gives a simple example of what leakage looks like), and then hosted an interactive session.
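For readers new to the term, here is a minimal illustrative sketch of one textbook form of leakage that model info sheets are designed to surface: applying label-dependent preprocessing (here, feature selection) before splitting into train and test sets. The code below is not from the workshop materials; the dataset, model, and scikit-learn usage are illustrative assumptions.

```python
# Illustrative sketch of leakage via pre-split feature selection.
# Synthetic data and model choices are assumptions, not workshop code.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Many noisy features, few informative ones: a setting where leakage
# from feature selection inflates scores noticeably.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=5, random_state=0)

# Leaky: feature selection sees the labels of ALL rows, including the
# future test set, so the held-out score is overly optimistic.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X_selected, y, random_state=0)
leaky = LogisticRegression().fit(X_tr, y_tr)
print("leaky accuracy:", leaky.score(X_te, y_te))

# Correct: split first, then fit all preprocessing inside a pipeline
# on the training data only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clean = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=10),
                      LogisticRegression()).fit(X_tr, y_tr)
print("clean accuracy:", clean.score(X_te, y_te))
```

Running both versions shows the leaky pipeline reporting a higher held-out accuracy than the clean one, even though neither model has any real advantage; this gap between apparent and genuine performance is the kind of error the model info sheets ask researchers to check for.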
Organizers
Sayash Kapoor | Ph.D. candidate, Princeton University
Priyanka Nanayakkara | Ph.D. candidate, Northwestern University
Kenny Peng | Incoming Ph.D. student, Cornell University
Hien Pham | Undergraduate student, Princeton University
Arvind Narayanan | Professor of Computer Science, Princeton University
Questions? Contact sayashk@princeton.edu