Reproducibility in ML Workshop, ICML'18
This workshop focuses on how to present papers from the coding perspective so that reproducibility and replication of results in the Machine Learning community become easier.
Papers from the Machine Learning community are supposed to be a valuable asset. They can help to inform and inspire future research. They can be a useful educational tool for students. They are the driving force of innovation and differentiation in the industry, so quick and accurate implementation is critical. On the research side they can help us answer the most fundamental questions about our existence: what does it mean to learn and what does it mean to be human? Reproducibility, while not always possible in science (consider the study of a transient astronomical phenomenon like a passing comet), is a powerful criterion for improving the quality of research. A reproducible result is more likely to be robust and meaningful, and reproducing it rules out many types of experimenter error (whether fraudulent or accidental). There are many interesting open questions about how reproducibility issues intersect with the Machine Learning community:
- How can we tell if papers in the Machine Learning community are reproducible even in theory? If a paper is about recommending news sites before a particular election, and the results come from running the system online in production, it will be impossible to reproduce the published results because the state of the world has irreversibly changed since the experiment was run.
- What does it mean for a paper to be reproducible in theory but not in practice? For example, if a paper requires tens of thousands of GPUs to reproduce or a large closed-off dataset, then it can only be reproduced in reality by a few large labs.
- For papers which are reproducible both in theory and in practice, how can we ensure that papers published at ICML could actually be replicated if such an experiment were attempted?
- What is the best way of publishing the code of the papers so that it is easy for engineers to implement it? Just publishing IPython notebooks is not sufficient, and it is often hard to make them work on different platforms.
- A lot of people tend to understand an algorithm by looking at code and not by following equations. How can we come up with a publishing framework that includes code? Is pseudocode the best we can do?
- While scientific papers often do an importance analysis of the features, ML papers rarely do proper attribution of the importance of algorithmic components and hyperparameters. What is the best way to “unit-test” an algorithm and attribute the results to certain components and hyperparameters?
- What does it mean for a paper to have successful or unsuccessful replications?
- Of the papers with attempted replications completed, how many have been published?
- What can be done to ensure that as many as possible of the papers which are reproducible in theory end up with a published replication?
- On the reproducibility issue, what can the Machine Learning community learn from other fields?
- Part of ensuring reproducibility of state-of-the-art is ensuring fair comparisons, proper experimental procedures, and proper evaluation methods and metrics. To this end, what are the proper guidelines for such aspects of machine learning problems? How do they differ among subsets of machine learning?
Call for Papers
Our aim in the following workshop is to raise the profile of these questions in the community and to search for their answers. In doing so we aim for papers focusing on the following topics:
- Analysis of the current state of reproducibility in machine learning. Examples include experiment-driven investigations such as [1,2,3]
- Investigations and proposals of proper experimental procedure and evaluation methodologies which ensure reproducible and fair comparisons in novel literature 
- Tools to help improve reproducibility
- Evidence-driven works investigating the importance of reproducibility in machine learning and science in general
- Connections between the reproducibility situation in Machine Learning and other fields
- Rigorous replications, both failed and successful, of influential papers in the Machine Learning literature.
- With the emergence of new fast prototyping systems such as TensorFlow, CNTK, PyTorch, MXNet, etc., it is now much easier to present an implementation, but this is just the beginning. How can we build tools on top of them so that we can get an X-ray of the algorithm that shows how the components work together?
We will accept both short paper (4 pages) and long paper (8 pages) submissions (not including references). Submissions should be in the NIPS 2018 format. A few papers may be selected as oral presentations, and the other accepted papers will be presented in a poster session. There will be no proceedings for this workshop; however, upon the authors' request, accepted contributions will be made available on the workshop website. Submissions are single-blind, peer-reviewed on OpenReview (https://openreview.net/group?id=ICML.cc/2018/RML), and open to already published work.
Workshop Paper Submission Deadline:
June 10th (extended from June 5th)
- Submit papers here: https://openreview.net/group?id=ICML.cc/2018/RML
Workshop Paper Decision: June 20th
Camera Ready Deadline: July 1st
Submission Instructions will be posted soon.
July 14th, Stockholm
8:30-9:00 Animashree Anandkumar & Opening Remarks
9:00-9:30 John Ioannidis
9:30-10:00 Alexandre Gramfort, Reproducible ML: Software challenges, anecdotes and some engineering solutions
10:00-10:30 Coffee break/posters
10:30-11:00 Percy Liang
11:00-11:30 Olivia Guest, Varieties of Reproducibility in Empirical and Computational Domains
11:30-12:00 Alistair Johnson
12:00-12:30 Nicolas Rougier
12:30-14:00 Lunch Break
14:00-14:40 MLTRAIN Tutorial (Nikolaos Vasiloglou)
14:40-15:10 Armand Joulin
15:10-15:20 Contributed Talk 1: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
15:20-15:30 Contributed Talk 2: Depth First Learning: Learning to Understand Machine Learning
15:30-16:00 Coffee break/posters
16:00-16:30 Steve Hsu, Machine Learning, Genomics, and Reproducibility
16:30-17:00 Aki Vehtari, Reproducibility and Stan
17:00-17:30 Joelle Pineau
17:30-18:30 Panel (Moderator: Joelle Pineau)
Panelists: Olivia Guest, Alistair Johnson, Steve Hsu, Animashree Anandkumar, Armand Joulin
Invited Speakers and Panelists
- John Ioannidis, Stanford Medical School
- Joelle Pineau, Facebook and McGill University
- Nicolas P. Rougier, INRIA and ReScience
- Steve Hsu, Michigan State University
- Armand Joulin, Facebook
- Alexandre Gramfort, Scikit-Learn
- Percy Liang, Stanford University
- Alistair Johnson, MIT
- Aki Vehtari, Aalto University
- Olivia Guest, University College London
- Animashree Anandkumar, California Institute of Technology and Amazon AI
** This list is subject to change depending on the final availability of speakers and panelists.
- Rosemary Nan Ke, (MILA) École Polytechnique de Montréal
- Peter Henderson, (MILA) McGill University
- Alex Lamb, (MILA) Université de Montréal
- Anirudh Goyal, (MILA) Université de Montréal
- Nikolaos Vasiloglou, MLTrain
- Aaron Courville, (MILA) Université de Montréal
- Chris Pal, (MILA) École Polytechnique de Montréal
- Hugo Larochelle, Google Brain Montréal
- Oriol Vinyals, Google DeepMind
- Yoshua Bengio, (MILA) Université de Montréal
- Alex Dimakis, University of Texas at Austin
- Animashree Anandkumar, California Institute of Technology and Amazon AI
- Kristian Kersting, TU Darmstadt
- George Georgoulas, Luleå University of Technology
- Hongyang Li, The Chinese University of Hong Kong
Readings and References
[1] Lucic, Mario, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. "Are GANs Created Equal? A Large-Scale Study." arXiv preprint arXiv:1711.10337 (2017).
[2] Melis, Gábor, Chris Dyer, and Phil Blunsom. "On the state of the art of evaluation in neural language models." arXiv preprint arXiv:1707.05589 (2017).
[3] Henderson, Peter, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. "Deep reinforcement learning that matters." arXiv preprint arXiv:1709.06560 (2017).
[4] Nie, Xinkun, Xiaoying Tian, Jonathan Taylor, and James Zou. "Why adaptively collected data have negative bias and how to correct for it." (2017).