Towards Provenance-Rich Publications

Computing has been an enormous accelerator to science, and it has led to an information explosion in many fields. The unprecedented volume of data acquired by sensors, derived from simulations and analysis processes, and shared on the Web opens up new opportunities, but it also creates many challenges for managing these data and making sense of them. In this talk, I discuss the importance of maintaining detailed provenance (also referred to as lineage or pedigree) for digital data. Provenance provides important documentation that is key to preserving data, determining the data's quality and authorship, and understanding, reproducing, and validating results. I will demonstrate VisTrails, a provenance-enabled system that supports data exploration and visualization, and show how it simplifies the process of creating provenance-rich, reproducible publications. I will also discuss challenges and open problems in building infrastructure to support reproducible publications, as well as our early experiences supporting the ACM SIGMOD Repeatability experiment.