ParCo2019 Symposium

Tools and Infrastructure for Reproducibility in Data-Intensive Applications

September 10, 2019 in Prague

Meaningful advances in science and engineering are increasingly predicated on data-driven decision making. For these decisions to be valid, it is essential that one not only record the process by which results were produced, but also be able to reproduce the data involved at every step in the process. While we are all used to tracking source code revisions and keeping track of program inputs and outputs, the increased complexity of end-to-end computing pipelines, coupled with new big-data and machine learning algorithms, makes it significantly harder to track all of the steps and associated data that went into producing a result. For example, one must keep track of exactly what data was used for a training set vs. an evaluation set, what cleaning was done, what analysis was done on the results to evaluate performance, and what additional experiments were performed. With the ever-increasing number, size, and complexity of the data used in data-intensive applications, reproducing results from these types of investigations becomes increasingly difficult. While no one deliberately sets out to create irreproducible results, recent surveys of the literature show that the ability to reproduce data-intensive results is the exception and not the rule.

For these reasons, a symposium on issues, tools and infrastructure for data-intensive applications is highly germane to the ParCo community.

In this symposium, we propose to review the current state of the art in reproducibility in data-intensive computing applications.

We will cover three primary topic areas:

Reproducibility challenges that are specific to science and engineering activities that have data-intensive computing as a core aspect of the process.
Infrastructure, tools and methods that are currently available for reproducible data-intensive applications, and the gaps and challenges that need to be addressed.
How to increase the adoption of methods for reproducible data-intensive applications across the research community.

Dates:

Submission of extended abstracts/draft papers: 31 July 2019
Notification of acceptance for presentation at the Symposium: 10 August 2019
Submission of full papers: 10 September 2019
Symposium: 10 September 2019
Notification of acceptance of full papers for publication: 30 September 2019
Deadline for submission of full papers for proceedings: 31 October 2019

Submission guidelines:

Extended abstracts should be ca. 2 pages. Full papers are allowed the same number of pages as all other papers included in the proceedings, i.e. 10 pages.
Contributions should be submitted by e-mail to the Symposium organisers at: sandro [dot] fiore [at] cmcc [dot] it, foster [at] uchicago [dot] edu, carl [at] isi [dot] edu
Please specify in the subject of your email: [Symposium contribution]

Organizers:

Sandro Fiore, Euro-Mediterranean Center on Climate Change
Ian Foster, University of Chicago and Argonne National Laboratory
Carl Kesselman, University of Southern California

Useful Information about ParCo2019

Location and dates: Prague, Czech Republic, 10-13 September 2019

Conference website: https://www.parco.org/

Symposium page: https://www.parco.org/symposia.html
