We start the discussion with the framework of Davidson and Freire [1], which identifies issues of provenance and scientific workflows in computational materials.
Provenance: The provenance (also referred to as the audit trail, lineage, and pedigree) of a data product contains information about the process and data used to derive the data product. It provides important documentation that is key to preserving the data, to determining the data’s quality and authorship, and to reproducing as well as validating the results [1].
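To make the definition concrete, here is a minimal sketch of what a provenance record for one derived data product could look like. It is not taken from Davidson and Freire [1] or from any library: ProvenanceRecord, sha256_of, derive, and the scale example are hypothetical names chosen for illustration, and the sketch assumes the data is JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_of(obj) -> str:
    """Fingerprint a JSON-serializable object (key order does not matter)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ProvenanceRecord:
    """Hypothetical record: which process and data produced a data product."""
    process: str         # name of the step that derived the product
    parameters: dict     # settings the step was run with
    input_hashes: tuple  # fingerprints of the data the step consumed
    output_hash: str     # fingerprint of the derived data product
    author: str          # who ran the step


def derive(process, func, inputs, parameters, author):
    """Run func on inputs and return (output, provenance record)."""
    output = func(*inputs, **parameters)
    record = ProvenanceRecord(
        process=process,
        parameters=dict(parameters),
        input_hashes=tuple(sha256_of(i) for i in inputs),
        output_hash=sha256_of(output),
        author=author,
    )
    return output, record


def scale(values, factor):
    """Illustrative analysis step: multiply every value by a factor."""
    return [v * factor for v in values]


output, record = derive("scale", scale, [[1.0, 2.0]], {"factor": 2.0}, "alice")

# Validation: re-running the same process on the same inputs must
# reproduce the recorded output fingerprint.
rerun_output, rerun_record = derive("scale", scale, [[1.0, 2.0]], {"factor": 2.0}, "bob")
print(rerun_record.output_hash == record.output_hash)
```

The record holds the process, its parameters, and fingerprints of the inputs and output, which is enough to check authorship and to test whether a later run reproduces the same result.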
Workflow Systems for Science: [25, 41, 31, 42, 43, 45, 16, 17, 27, 38]
The case study of pymatgen
Workflows and workflow-based systems have emerged as an alternative to ad-hoc approaches for constructing computational scientific experiments [25, 39, 41, 45, 31]. Workflow systems help scientists conceptualize and manage the analysis process, support scientists by allowing the creation and reuse of analysis tasks, aid in the discovery process by managing the data used and generated at each step, and (more recently) systematically record provenance information for later use. Workflows are rapidly replacing primitive shell scripts as evidenced by the release of Apple’s Mac OS X Automator, Microsoft’s Workflow Foundation, and Yahoo! Pipes.
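The difference from an ad-hoc script can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the design of any system cited above: run_workflow and the three task functions are hypothetical names. It shows reusable tasks chained into a workflow, with the runner recording the data used and generated at each step.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, task) pair; a task maps one data object to another.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run the steps in order, logging each step's input and output."""
    log = []
    for name, task in steps:
        result = task(data)
        # Record the data consumed and produced by this step.
        log.append({"step": name, "input": data, "output": result})
        data = result
    return data, log


# Reusable analysis tasks (illustrative only).
def drop_blanks(lines):
    return [line for line in lines if line.strip()]


def to_floats(lines):
    return [float(line) for line in lines]


def mean(values):
    return sum(values) / len(values)


steps = [("drop_blanks", drop_blanks), ("to_floats", to_floats), ("mean", mean)]
result, log = run_workflow(steps, ["1.0", "", "2.0", "3.0"])

print(result)  # 2.0
for entry in log:
    print(entry["step"], entry["input"], "->", entry["output"])
```

Because each task is an ordinary function, the same tasks can be recombined into other workflows, and the log gives a per-step lineage of the final value.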
Software Tools for Writing Reproducible Papers
[1] Davidson, Susan B., and Juliana Freire. "Provenance and scientific workflows: challenges and opportunities." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.