Dataset Quality Ontology: An engineering experience.

Date: February 5, 2016

Speaker: Jeremy Debattista

Abstract

Data quality is commonly defined as fitness for use. Data consumers often struggle to assess the quality of the data they want to use, while data publishers often lack the means to identify quality issues in their own data. To make the task easier for both stakeholders, we have developed the Dataset Quality Ontology (daQ) [1]. daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional, statistical observations using the Data Cube Vocabulary. Quality metadata are organised as a self-contained graph, which can be embedded into linked datasets to support quality-based retrieval and ranking. This talk will discuss the design issues behind the daQ vocabulary, how it helped shape the upcoming W3C Data Quality Vocabulary [2], and some quality issues that arise in ontologies and vocabularies.

References

[1] Debattista, J., Lange, C., & Auer, S. (2014). Representing dataset quality metadata using multi-dimensional views. Proceedings of the 10th International Conference on Semantic Systems, 92-99.
[2] W3C Data Quality Vocabulary: https://www.w3.org/TR/vocab-dqv/
