The W3C Provenance Group

Shared Provenance Representations: The W3C Provenance Group



Yolanda Gil

Information Sciences Institute, University of Southern California



December 9, 2010



Scientific results can only be reproduced if there is a detailed record of their provenance, that is, of how the results were obtained. Research on provenance spans not only workflows but also databases, library science, law and policy, and the semantic web. Provenance representations have been proposed from a variety of perspectives. These include the Open Provenance Model[1], the Dublin Core Metadata[2], the Provenance Vocabulary[3], and the Provenance Authoring and Versioning Ontology[4], among others.

While many individual systems are being developed to capture various aspects of provenance, they will have limited impact on science unless there are agreed-upon standard mechanisms for representing and exchanging provenance. Because different systems cover different aspects of provenance, we need a better understanding of the landscape that provenance frameworks must address in the context of science.

In September 2009, the W3C Provenance Group[5] was formed to develop a roadmap for provenance, covering requirements in various web contexts, the current state of the art, and possible recommendations for standardization. The group includes not only scientists but also other members of the web community interested in the provenance of web objects and resources. The group developed more than 30 use cases, many concerned with scientific provenance, and hundreds of detailed requirements to support provenance. The group also created a report on the state of the art that covers work being done across many areas.

Of particular interest may be the use cases concerning reproducibility, and the synthesized “Disease Outbreak” scenario, which involves provenance in scientific research activities across disciplines [Cheney et al 10].

The group also created mappings across existing provenance vocabularies [Sahoo et al 10], using the Open Provenance Model as a reference.

The final report of the group was recently released [Gil et al 10]. The group drafted a proposed charter for a Provenance Interchange Working Group, with a set of 17 core provenance concepts as the basis for a standard representation of provenance and a concrete set of deliverables to be completed within two years.

Many documents, as well as the provenance of the report itself, can be found at the group’s site[6].






[Cheney et al 10] “Requirements for Provenance on the Web.” James Cheney, Yolanda Gil, Paul Groth (Editor), and Simon Miles. Report from the W3C Provenance Incubator Group, first release: April 9, 2010. Available from

[Gil et al 10] “Final Report of the W3C Provenance Incubator Group.” Yolanda Gil, James Cheney, Paul Groth, Olaf Hartig, Simon Miles, Luc Moreau, and Paolo Pinheiro da Silva. Report from the W3C Provenance Incubator Group, first release: November 30, 2010. Available from

[Sahoo et al 10] “Provenance Vocabulary Mappings.” Satya Sahoo, Paul Groth, Olaf Hartig, Simon Miles, Sam Coppens, James Myers, Yolanda Gil, Luc Moreau, Jun Zhao, Michael Panzer, and Daniel Garijo. Report from the W3C Provenance Incubator Group, first release: August 6, 2010. Available from