The goal of this first ProvBench event is to start collecting provenance traces that are publicly accessible and can be used for benchmarking experiments in the near future. Benchmarking metrics and criteria for a benchmark suite will be defined in future editions of the series.
Submissions will be shared on the ProvBench web site, along with some descriptive metrics, and will form the basis of the subsequent steps of the benchmarking exercise (querying, analysis, etc.).
The ProvBench corpus collected in the first edition of the event is accessible on GitHub.
The first edition of the ProvBench series will be co-located with BigProv'13, the International Workshop on Managing and Querying Provenance Data at Scale.
Submission takes place in two stages:
- Submission of an expression of interest
- Submission of an extended abstract and provenance traces
Submission of expression of interest (expired)
By November 15, 2012, please email provbench-admin at googlegroups dot com with your expression of interest and a request for access to the GitHub repository (for submitting provenance traces and abstracts).
Submission of extended abstract and provenance traces
Authors of all submissions to the BigProv'13 workshop are strongly encouraged to submit their provenance traces to ProvBench. Authors may instead make their traces available in an alternative public online space, but the traces must be openly accessible. Submissions without openly accessible provenance traces will be rejected.
We encourage the use of the W3C PROV model, but authors can use any provenance model of their choice.
The extended abstract should be formatted using the ACM Proceedings format and will be reviewed by the ProvBench organising and steering committees. Accepted submissions will be included in the official EDBT workshop proceedings, along with the proceedings of BigProv'13. Each submission should be less than 2 pages long and should contain the following sections:
- Summary of submission: information about the generated provenance traces, in the format of the following table (see also the readme.txt attached to this page):
| Field | Value |
|---|---|
| Data format (XML, RDF, relational, or others) | |
| Data model (PROV or others) | |
| Size (e.g. bytes, number of triples) (if applicable) | |
| Tools used for generating provenance | |
| Application domain (such as biology, the web) | |
| Submission group | |
| Contact | |
| License | |
| Others | |
- Experience statement: summarising the authors' experience in generating such traces.
- Application: describing the applications that the authors anticipate will benefit from such traces.
- Possible provenance queries: listing, in English, the provenance queries that could be executed against the provenance information.
- Coverage of PROV starting-point concepts/properties: describing which of the following PROV-O concepts and properties the traces cover, in the format of the following table, and highlighting any PROV terms used that are not listed below.
Each submission will be allocated a team folder in the GitHub repository (https://github.com/provbench/):
- We provide a template readme.txt file (attached to this page) that can be used to specify the license and ownership of the traces, as well as information about the authors, the date on which each provenance trace was produced/submitted, the tool(s) used to generate the traces, and the link to the provenance traces (if they are not in the GitHub repository), among other things.
- Provenance traces can be submitted to the team folder as a folder or a zip file, along with the extended abstract.
- If the provenance traces are made openly accessible in another public space, only the extended abstract needs to be submitted to the GitHub repository.
For the first edition of ProvBench, we would like to open submissions to a broad audience interested in creating a first-ever community-based provenance corpus for benchmarking and for bootstrapping provenance-based research and applications. Anyone is invited to contribute, whether as a content provider or a technologist. A content provider can be the owner of a web site or a blog, a scientist, or the provider of a dataset in any structured format, be it relational, RDF, XML, RDFa, tab-delimited or another. A technologist can be a developer of a provenance publication application or plug-in, a web site administrator, a computer scientist interested in provenance-related research or applications, or simply someone enthusiastic about provenance.
Each submission group can give a 5-10 minute talk at the ProvBench session at BigProv'13. Teams who cannot make the trip can be represented by the organisers. A summary of the submissions will also be presented at the BigProv workshop.
The provenance traces will be advertised on the ProvBench website as well as through other communication channels.
- Khalid Belhajjame, University of Manchester, UK
- Jun Zhao, University of Oxford, UK
- Jose Manuel Gomez-Perez, iSOCO, Spain
- Satya Sahoo, Case Western Reserve University, USA
- Paolo Missier, Newcastle University, UK
- Bertram Ludaescher, UC Davis, USA