ProvBench @ BigProv'13


The goal of this first ProvBench event is to start collecting provenance traces that are publicly accessible and can be used for benchmarking experiments in the near future. Benchmarking metrics and criteria for a benchmark suite will be defined in later editions of the series.

Submissions will be shared on the ProvBench website along with some descriptive metrics, and will form the basis of the subsequent steps of the benchmarking exercise (querying, analysis, etc.).

The ProvBench corpus collected in the first edition of the event is available on GitHub.


The first edition of the ProvBench series will be co-located with BigProv'13, the International Workshop on Managing and Querying Provenance Data at Scale.


Submission takes place in two stages:
  1. Submission of an expression of interest
  2. Submission of an extended abstract and provenance traces

Submission of expression of interest (expired)

By November 15, 2012, please email provbench-admin at googlegroups dot com to express your interest and to request access to the GitHub repository (for submitting provenance traces and the extended abstract).

Submission of extended abstract and provenance traces

Authors of all submissions to the BigProv'13 workshop are strongly encouraged to submit their provenance traces to ProvBench. Authors may instead make their traces available in an alternative public online space, but the traces must be openly accessible. Submissions without openly accessible provenance traces will be rejected.

We encourage the use of the W3C PROV model, but authors can use any provenance model of their choice.

The extended abstract should be formatted using the ACM Proceedings format and will be reviewed by the ProvBench organisation and steering committees. Accepted submissions will be included in the official EDBT workshop proceedings, along with the proceedings of BigProv'13. Each submission should be less than 2 pages long and should contain the following sections:
  • Summary of submission: information about the provenance traces generated, covering the following fields (see also the readme.txt attached to this page):
      • Data format (XML, RDF, relational, or others)
      • Data model (PROV or others)
      • Size (e.g. bytes, number of triples), if applicable
      • Tools used for generating provenance
      • Application domain (e.g. biology, the web)
      • Submission group
  • Experience statement: summarising the authors' experience in generating such traces;
  • Application: describing the applications that the authors anticipate will benefit from such traces;
  • Possible provenance queries: listing, in plain English, possible provenance queries that can be executed against the provenance traces;
  • Coverage of PROV starting-point concepts/properties: describing the traces' coverage of the PROV-O starting-point classes (prov:Entity, prov:Activity, prov:Agent) and properties (prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAttributedTo, prov:startedAtTime, prov:used, prov:wasInformedBy, prov:endedAtTime, prov:wasAssociatedWith, prov:actedOnBehalfOf), and highlighting any other PROV terms used.

Each submission will be allocated a team folder in the GitHub repository:

  • Provenance traces can be submitted to the team folder, either as a folder or as a zipped file, along with the extended abstract.
  • If the provenance traces are made openly accessible in another public space, only the extended abstract needs to be submitted to the GitHub repository.

We provide a template readme.txt file (attached to this page) that can be used to specify the license and ownership of the traces, as well as information about the authors, the date on which each provenance trace was produced or submitted, the tool(s) used to generate the traces, and the link to the provenance traces (if they are not in the GitHub repository), among other things.

Target Audience

For the first edition of ProvBench, we would like to open submissions to a broad audience interested in creating a first-ever community-based provenance corpus for benchmarking and for bootstrapping provenance-based research and applications. Anyone is invited to contribute, whether as a content provider or as a technologist. A content provider can be the owner of a website or blog, a scientist, or the provider of a dataset in any structured format (relational, RDF, XML, RDFa, tab-delimited, or others). A technologist can be a developer of a provenance publication application or plug-in, a website administrator, a computer scientist interested in provenance-related research or applications, or simply someone enthusiastic about provenance.


Each submission group can give a 5-10 minute talk at the ProvBench session at BigProv'13. Teams who cannot make the trip can be represented by the organisers. A summary of the submissions will also be presented at the BigProv'13 workshop.

The provenance traces will be advertised on the ProvBench website and through other communication channels.

Organisation Committee

  • Khalid Belhajjame, University of Manchester, UK
  • Jun Zhao, University of Oxford, UK
  • Jose Manuel Gomez-Perez, iSOCO, Spain
  • Satya Sahoo, Case Western Reserve University, USA

Steering Committee

  • Paolo Missier, Newcastle University, UK
  • Bertram Ludaescher, UC Davis, USA

Jun Zhao,
Nov 14, 2012, 7:25 AM
Jun Zhao,
Oct 17, 2012, 1:54 AM