The Challenge Problems Paradigm in Empirical Machine Learning and Beyond

Stanford University, Autumn 2023
Instructors: David Donoho & X.Y. Han
IT Maven: Andrew Donoho

Mondays & Wednesdays, 10:30AM - 11:50AM
Wallenberg Hall (Building 160), Room 326

Course Description

In many fields of science and technology, empirical research has been making rapid progress by implicitly following a little-studied research paradigm, the Challenge Problems Paradigm (CPP), with several distinctive features: a shared public database; a common task (for example, prediction of class labels or a response variable from given input features); an objective scoring rule that quantifies performance on that task; a leaderboard that tracks the performance of submissions; and a set of enrolled competitors who each try to improve on the current best-known performance on that task. In the context of Empirical Machine Learning, this is explicitly the famous “Kaggle” model; however, Kaggle didn’t originate this approach, and many research disciplines combine the same ingredients, in many cases implicitly or tacitly. As we know, the CPP anchored recent claims of progress in image understanding and in natural language processing. In this course we will review the many instances and variations of the CPP that exist in modern research, not only in the standard areas of empirical machine learning (computer vision and natural language understanding) but also in academic empirical finance and the computational hard sciences. We will discuss evidence that the CPP itself, rather than the specific technologies spotlighted because of it, is a kind of secret sauce. We will discuss software platforms implementing the CPP, including Kaggle, but also academic platforms like CodaLab, which is often used for challenge problems in natural language processing, and Nightingale Open Science, which is used for challenge problems involving potentially protected health information.

We will discuss the history of the CPP: from the prehistory of failed research paradigms in natural language processing, to its consistent success across decades of different research contexts since fields like natural language, biometrics, and protein folding started using variants of this paradigm in 1985-1995, to the similar frameworks that have arisen independently in various scientific literatures.

We will also discuss this paradigm critically, and identify a number of weaknesses it exhibits both as an epistemological tool and in the sociology of how it changes the fields where it is used. On the one hand, it can propel research forward by defining a clear path that an aspiring researcher can take to notch up a widely recognized contribution. This can attract a massive investment of time and energy compared to other research paradigms based on theoretical analyses or private, unshared data. At the same time, the CPP can distort the questions asked and the perception of what is important, and it can skew incentives about what work is worth doing, towards the incremental and away from the fundamental. We will discuss interesting meta-research that shows how things change in the world because of the public nature of the CPP. We will also learn about software systems that make it easy to develop your own instance of the common task framework in your field.