Tools and Infrastructure for Reproducibility in Data-Intensive Applications
September 10, 2019 in Prague
Meaningful advances in science and engineering increasingly depend on data-driven decision making. For these decisions to be valid, it is essential not only to record the process by which results were produced, but also to be able to reproduce the data involved at every step of that process. While we are all used to tracking source code revisions and keeping track of program inputs and outputs, the growing complexity of end-to-end computing pipelines, combined with new big-data and machine learning algorithms, makes it significantly harder to track all of the steps and associated data that went into producing a result. For example, one must keep track of exactly which data were used for a training set versus an evaluation set, what cleaning was performed, what analysis was done on the results to evaluate performance, and what additional experiments were carried out. As the number, size, and complexity of the datasets used in data-intensive applications continue to grow, reproducing the results of such investigations becomes increasingly difficult. While no one deliberately sets out to produce unreproducible results, recent surveys of the literature show that reproducible data-intensive results are the exception rather than the rule.
For these reasons, a symposium on the issues, tools and infrastructure for reproducibility in data-intensive applications is highly relevant to the ParCo community.
In this symposium, we propose to review the current state of the art in reproducibility for data-intensive computing applications.
We will cover three primary topic areas:
- Reproducibility challenges specific to science and engineering activities in which data-intensive computing is a core part of the process.
- Infrastructure, tools and methods that are currently available for reproducible data-intensive applications, and gaps and challenges that need to be addressed.
- How to increase the adoption of methods for reproducible data-intensive applications across the research community.
Important dates
- Submission of extended abstracts/draft papers: 31 July 2019
- Notification of acceptance for presentation at the Symposium: 10 August 2019
- Submission of full papers: 10 September 2019
- Symposium: 10 September 2019
- Notification of acceptance of full papers for publication: 30 September 2019
- Deadline for submission of full papers for proceedings: 31 October 2019
Submission guidelines
- Extended abstracts should be approximately 2 pages long. Full papers may be up to 10 pages, the same limit that applies to all other papers included in the proceedings.
- Contributions should be submitted by e-mail to the Symposium organisers at: sandro [dot] fiore [at] cmcc [dot] it, foster [at] uchicago [dot] edu, carl [at] isi [dot] edu
- Please specify in the subject of your email: [Symposium contribution]
Symposium organisers
- Sandro Fiore, Euro-Mediterranean Center on Climate Change
- Ian Foster, University of Chicago and Argonne National Laboratory
- Carl Kesselman, University of Southern California
Useful Information about ParCo2019
Location and dates: Prague, Czech Republic, 10-13 September 2019
Conference website: https://www.parco.org/
Symposium page: https://www.parco.org/symposia.html