CODA 2025 keynote
February 25, 2025, Santa Fe, NM
The Role of FAIR in Data-intensive, Reproducible Workflows
The Findable, Accessible, Interoperable, and Reusable (FAIR) principles have been explored, applied, and implemented for over a decade, following the coalescing of ideas and the development of tools around data, software, workflows, and now machine learning (ML). The high-performance computing (HPC) community has started to adopt these principles, but the specificity and complexity of computing architectures, programming models, and modes of execution have limited how broadly FAIR principles and existing tools can be adopted. Reproducibility, a fundamental tenet of the scientific method, benefits when FAIR principles are applied to experiments whose data and software are published with the goal of reproducing results. To improve reproducibility, one must measure and characterize it at scale. In HPC workflows, providing the ability to reproduce results may entail extracting, curating, and reconciling metadata, comparing execution patterns, and measuring result variability. Experiments that include ML models add requirements to make both data and models FAIR; adhering to FAIR is necessary but not sufficient to ensure trust in ML results. This talk will discuss FAIR and its limitations for reproducibility in high-performance and data-intensive computing. We will present some solutions and research directions developed in RECUP, an ASCR-sponsored project, for scalable, reproducible workflows.
Acknowledgements:
Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy National Nuclear Security Administration under contract DE-NA0003525 (SAND2025-00644C).