RECUP: Scalable Metadata and Provenance for Reproducible Hybrid Workflows

We develop methods to enable the reproducibility of performance and scientific results for numerical and data-intensive simulations orchestrated by workflows that run on high-performance systems.

What (meta)data is relevant for reproducibility?
How can we capture, curate, fuse, and index the relevant (meta)data with minimal overhead at scale?
How can we compare two repeated runs in terms of both (meta)data and intermediate results to study different types of reproducibility?
How can we identify the root causes of runs that are not reproducible, from the perspective of both performance and results?
