RECUP: Scalable Metadata and Provenance for Reproducible Hybrid Workflows

We develop methods to enable the reproducibility of performance and scientific results for numerical and data-intensive simulations orchestrated by workflows that run on high-performance systems.

Application performance data and metadata are extracted at runtime to enable a unified, FAIR-enabled metadata format that captures task details (dependencies, execution order, performance metrics, inputs and outputs, etc.). Darshan will be employed for workflow I/O instrumentation and Mochi for high-volume data aggregation. Radical manages resource allocation for HPC applications. Metadata and pointers to data are saved into an RO-Crate profile. Very low overhead checkpointing (VeLOC) captures application execution, with fast lineage comparison based on scalable hashing, as a step towards a reproducibility framework.
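To make the idea of hash-based lineage comparison concrete, the following is a minimal Python sketch and not the project's implementation. It does not use the VeLOC API. The chunk size, the function names (chunk_hashes, diff_chunks, first_divergence), and the representation of a run as an ordered list of (checkpoint name, bytes) pairs are all assumptions made for illustration. The sketch hashes each checkpoint in fixed-size chunks, so two runs can be compared by their digests rather than by their full contents.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size in bytes, chosen for illustration


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Return the SHA-256 digest of each fixed-size chunk of a checkpoint."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def diff_chunks(hashes_a, hashes_b):
    """Return the indices of chunks that differ between two digest lists.

    A chunk present in only one of the lists counts as different.
    """
    n = max(len(hashes_a), len(hashes_b))
    return [
        i for i in range(n)
        if i >= len(hashes_a) or i >= len(hashes_b) or hashes_a[i] != hashes_b[i]
    ]


def first_divergence(run_a, run_b):
    """Return the name of the first checkpoint at which two runs differ.

    Each run is an ordered list of (checkpoint_name, checkpoint_bytes) pairs
    (an assumed representation). Only the common prefix of the two runs is
    compared. Returns None if no difference is found.
    """
    for (name_a, data_a), (name_b, data_b) in zip(run_a, run_b):
        if name_a != name_b or chunk_hashes(data_a) != chunk_hashes(data_b):
            return name_a
    return None
```

With this sketch, first_divergence identifies where two executions of the same workflow start to differ, and diff_chunks narrows the difference within that checkpoint to specific chunks. A scalable version would likely organize the digests hierarchically (for example, in a hash tree) so that identical regions can be skipped without comparing every chunk digest; that refinement is omitted here.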