Problems and techniques for Incremental Re-computation:

provenance and beyond

A workshop co-organized with Provenance Week 2018

King's College London, 12th and 13th July, 2018

Organizers: Paolo Missier (Newcastle University), Tanu Malik (DePaul University), Jacek Cala (Newcastle University)

Program with Participants' contributions

Keynotes:

Bio: Frank McSherry received his PhD from the University of Washington, working with Anna Karlin on spectral analysis of data. He then spent twelve years at Microsoft Research's Silicon Valley research center, working on topics ranging from differential privacy to data-parallel computation. He currently works at ETH Zürich's Systems Group on scalable stream processing and related topics.

Why Incremental Re-computation? What's the connection with provenance?

This one-day workshop aims to bring together researchers and practitioners from multiple communities around the general problem of incremental re-computation of knowledge outcomes that are produced using data-intensive and computationally expensive processes (workflows, simulations, training of predictive models). Incremental re-computation means recomputing an outcome in response to changes in the elements that contributed to the original computation, i.e., inputs, reference datasets, tools, libraries, and deployment environment. The need for incremental re-computation, and for optimizing it across multiple computations, is fundamental to current trends in data science, big data, and machine learning, where online learning approaches can sometimes be employed to achieve incremental model re-training.

The provenance community has developed sound and robust approaches for tracking, storing, and querying lineage data. As an increasing number of applications become provenance-enabled, there is an opportunity to explore how collected provenance can be used to improve the re-computation of applications. More broadly, we aim to understand how provenance combines with other complementary techniques into an effective approach to addressing the re-computation problem.

Specific research topics for discussion

Aims:

  • To solicit and discuss case studies from a variety of areas in science and engineering that will benefit from incremental re-computation;
  • To discuss computational approaches that improve and go beyond provenance analytics; and
  • To lay out a research agenda and initiate new collaborations aimed at attracting additional joint funding from the Research Councils (UK, EU, NSF, etc.).

Format:

One-day “Dagstuhl-style” workshop with invited short contributions (an abstract and a short talk).

  • Morning: One keynote talk at the start, then scene-setting presentations from participants.
  • Afternoon: Workshop-style topical discussions and research agenda setting, aimed at producing the backbone of a report.

Organisers: Paolo Missier (Newcastle University, UK), Tanu Malik (DePaul University, USA), Jacek Cala (Newcastle University, UK)

Expected Outcomes:

A joint report (“Dagstuhl-style”) on the state of the art of the selective re-computation problem, applicable techniques, and open challenges.