Uncertainty Quantification and Provenance in Computational Materials

We begin with the framework of Davidson and Freire [1], who survey provenance and scientific workflows in general; we use it to frame the corresponding issues in computational materials. Their survey sets out to:

provide a general overview of scientific workflows,
describe research on provenance for scientific workflows and show in detail how provenance is supported in existing systems,
discuss emerging applications that are enabled by provenance, and
outline open problems and new directions for database-related research.

Provenance: The provenance (also referred to as the audit trail, lineage, or pedigree) of a data product contains information about the process and data used to derive it. This documentation is key to preserving the data, to determining the data's quality and authorship, and to reproducing and validating the results [1].
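To make the definition concrete, here is a minimal sketch of a provenance record for a single data product, assuming that inputs and outputs can be identified by a SHA-256 content hash. The class `ProvenanceRecord`, the helper `digest`, and all field names are hypothetical illustrations, not an API from [1] or from any workflow system. The record captures the process and input data used to derive the product, its authorship, and a hash that lets a later reader check that a stored file is the one the record describes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def digest(data: bytes) -> str:
    """Content hash used (by assumption) to identify a data product."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of how one data product was derived."""
    process: str          # name/version of the code that produced the product
    parameters: dict      # settings that affect the result
    input_digests: tuple  # hashes of every input data product
    output_digest: str    # hash of the derived data product
    author: str           # who ran the process (authorship)
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the record so it can be stored alongside the data."""
        return json.dumps(asdict(self), sort_keys=True)

    def matches(self, output: bytes) -> bool:
        """Check that a stored data product is the one this record describes."""
        return digest(output) == self.output_digest


# Usage with placeholder bytes (not real simulation data).
inp = b"placeholder input structure"
out = b"placeholder computed property"
rec = ProvenanceRecord(
    process="hypothetical_relax_script v0.1",
    parameters={"max_steps": 100},
    input_digests=(digest(inp),),
    output_digest=digest(out),
    author="A. Researcher",
)
print(rec.matches(out))        # True
print(rec.matches(b"edited"))  # False
```

Hashing content rather than file names is one design choice among several; it makes the record robust to files being renamed or moved, at the cost of recomputing hashes for large outputs.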

Workflow Systems for Science (reference numbers as in the bibliography of [1]: 16, 17, 25, 27, 31, 38, 41–43, 45)

Case Study: pymatgen

Workflows and workflow-based systems have emerged as an alternative to ad hoc approaches for constructing computational scientific experiments (see [1], refs. 25, 31, 39, 41, 45). Workflow systems help scientists conceptualize and manage the analysis process, support them by allowing analysis tasks to be created and reused, aid discovery by managing the data used and generated at each step, and (more recently) systematically record provenance information for later use. At the time [1] was written, workflows were rapidly replacing primitive shell scripts, as evidenced by the release of Apple's Mac OS X Automator, Microsoft's Workflow Foundation, and Yahoo! Pipes.
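The three roles described above (reusable tasks, managed intermediate data, recorded provenance) can be illustrated with a toy workflow engine. This is a minimal sketch under stated assumptions: the `Workflow` class, its `task`/`run`/`lineage` methods, and the content-hash IDs are hypothetical and do not reproduce any system surveyed in [1]. Tasks form a dependency graph executed in topological order; every intermediate result is stored under a hash of its value, and each execution appends a log entry naming the task, its inputs, and its output.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # Python 3.9+


def digest(value) -> str:
    # Hash a JSON-serializable value so identical data get identical IDs.
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()[:12]


class Workflow:
    """Hypothetical minimal engine: reusable tasks, a data store, a provenance log."""

    def __init__(self):
        self.tasks = {}  # task name -> (function, list of upstream names)
        self.store = {}  # content hash -> data product
        self.log = []    # one provenance entry per executed task

    def task(self, name, func, inputs=()):
        """Register a reusable analysis step and the names it depends on."""
        self.tasks[name] = (func, list(inputs))

    def run(self, **sources):
        """Execute all tasks in dependency order; return name -> content hash."""
        ids = {}
        for name, value in sources.items():
            ids[name] = digest(value)
            self.store[ids[name]] = value
        graph = {n: set(deps) for n, (_, deps) in self.tasks.items()}
        for name in TopologicalSorter(graph).static_order():
            if name in sources:
                continue  # raw inputs are not produced by any task
            func, deps = self.tasks[name]
            result = func(*[self.store[ids[d]] for d in deps])
            ids[name] = digest(result)
            self.store[ids[name]] = result
            self.log.append({"task": name, "function": func.__name__,
                             "inputs": {d: ids[d] for d in deps},
                             "output": ids[name]})
        return ids

    def lineage(self, product_id):
        """Walk the log backwards to list every task that fed into a product."""
        producers = {entry["output"]: entry for entry in self.log}
        stack, seen = [product_id], []
        while stack:
            entry = producers.get(stack.pop())
            if entry and entry["task"] not in seen:
                seen.append(entry["task"])
                stack.extend(entry["inputs"].values())
        return seen


def cube(a):
    return a ** 3


def density(mass, volume):
    return mass / volume


# Toy numbers chosen only to exercise the engine.
wf = Workflow()
wf.task("volume", cube, inputs=["lattice_a"])
wf.task("density", density, inputs=["cell_mass", "volume"])
ids = wf.run(lattice_a=2.0, cell_mass=24.0)
print(wf.store[ids["density"]])    # 3.0
print(wf.lineage(ids["density"]))  # ['density', 'volume']
```

Keying products by content hash means two tasks that happen to produce identical values share one ID, so `lineage` in this sketch is only reliable when outputs are distinct; a production system would typically give each execution its own identifier and record the hash separately.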

Software Tools for Writing Reproducible Papers

References:

[1] Davidson, Susan B., and Juliana Freire. "Provenance and scientific workflows: challenges and opportunities." Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.