
The UN National Quality Assurance Frameworks (November 2015) state: "While there are several general definitions of quality, one of the most commonly used and succinct definitions is fitness for use or fitness for purpose. ... the concept of quality of statistical information is multi-dimensional and that there is no one single measure of data quality. Examples of the common quality dimensions or components include: relevance; accuracy; reliability; timeliness; punctuality; accessibility; clarity, interpretability; coherence; comparability; credibility; integrity; methodological soundness; and serviceability." The UN site also lists national and international quality references.

UN Statistical Commission

Eurostat's quality page has the 2017 Code of Practice, which includes a section on statistical output that says: "Output quality is measured by the extent to which the statistics are relevant, accurate and reliable, timely, coherent, comparable across regions and countries, and readily accessible by users, i.e. the Principles of Statistical Output."

The code's indicators for "accurate and reliable" include:

Indicator 12.1: Source data, integrated data, intermediate results and statistical outputs are regularly assessed and validated.

Indicator 12.2: Sampling errors and non-sampling errors are measured and systematically documented.

Data Quality Review Toolkit

The DQR toolkit (2017) includes guidelines and tools that lay the basis for a standardized and holistic approach to data quality and that promote the institutionalization of routine data quality assessment in countries. It was developed in cooperation with many organizations, including WHO.

Statistics Canada has a Policy on Informing Users of Data Quality and Methodology (approved March 31, 2000), which lists six characteristics of data that are fit for use: relevance, accuracy, timeliness, accessibility, interpretability and coherence.

Hong Chen, David Hailey, Ning Wang, and Ping Yu. A Review of Data Quality Assessment Methods for Public Health Information Systems. Int J Environ Res Public Health. 2014 May; 11(5): 5170–5207. Among the findings: "Completeness, accuracy, and timeliness were the three most-used attributes among a total of 49 attributes of data quality."
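To make two of these attributes concrete, here is a minimal sketch of how completeness and timeliness might be computed for a small set of reports. The records, the field names and the 7-day reporting window are all hypothetical and are not taken from the review. Accuracy is left out because it needs a trusted reference source to compare against.

```python
from datetime import date

# Hypothetical reports; None marks a missing value.
records = [
    {"facility": "A", "cases": 12, "event_date": date(2024, 1, 3), "report_date": date(2024, 1, 5)},
    {"facility": "B", "cases": None, "event_date": date(2024, 1, 4), "report_date": date(2024, 1, 20)},
    {"facility": "C", "cases": 7, "event_date": date(2024, 1, 6), "report_date": date(2024, 1, 8)},
    {"facility": None, "cases": 3, "event_date": date(2024, 1, 7), "report_date": date(2024, 1, 9)},
]


def completeness(rows, field):
    """Share of rows in which `field` is present (not None)."""
    if not rows:
        return 0.0
    return sum(r.get(field) is not None for r in rows) / len(rows)


def timeliness(rows, max_delay_days=7):
    """Share of rows reported within `max_delay_days` of the event (assumed window)."""
    if not rows:
        return 0.0
    on_time = sum(
        (r["report_date"] - r["event_date"]).days <= max_delay_days for r in rows
    )
    return on_time / len(rows)


print(completeness(records, "cases"))  # 0.75
print(timeliness(records))             # 0.75
```

Both functions return a proportion between 0 and 1, which makes it easy to track the same measure over time or compare it across reporting units.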

Data Quality – Guidance for providers and commissioners

indicators of data quality

7 Sources of Poor Data Quality, by William McKnight, partner, Information Management, Lucidity Consulting Group (2009). An interesting article that reviews where data errors are likely to come from.

IQ International is an association concerned with information and data quality. Click on "knowledge", then on "fundamentals of IQ", for some basic articles about information and data quality.

Validating / checking data

Methodology for Data Validation 1.0 (Handbook), revised edition June 2016

In this handbook, "a definition for data validation is provided, the main purpose of data validation is discussed taking into account the European quality framework, and finally, for the ‘how’ perspective, the key elements necessary for performing data validation, that are validation rules, are illustrated."
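As a rough illustration of what a validation rule can look like in practice, the sketch below expresses each rule as a named test that a record either passes or fails. The rule names, fields and thresholds are hypothetical and are not taken from the handbook.

```python
# Each rule is a (name, predicate) pair; the predicate returns True when a record passes.
# Rule names, fields and thresholds are hypothetical, chosen only for illustration.
RULES = [
    ("age_in_range", lambda r: 0 <= r["age"] <= 120),
    ("sex_is_coded", lambda r: r["sex"] in {"F", "M"}),
    ("minors_not_married", lambda r: r["age"] >= 15 or r["marital_status"] == "single"),
]


def validate(record, rules=RULES):
    """Return the names of all rules that the record fails."""
    return [name for name, passes in rules if not passes(record)]


records = [
    {"id": 1, "age": 34, "sex": "F", "marital_status": "married"},
    {"id": 2, "age": 9, "sex": "M", "marital_status": "married"},
    {"id": 3, "age": 150, "sex": "X", "marital_status": "single"},
]

for rec in records:
    print(rec["id"], validate(rec))
# 1 []
# 2 ['minors_not_married']
# 3 ['age_in_range', 'sex_is_coded']
```

Keeping the rules in a list separate from the checking function means rules can be added, reviewed or documented without touching the code that applies them.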

National Emergency Medical Service for Children Data Analysis Resource Center. This resource has a couple of chapters on validating and cleaning data sets.

ACAPS has a couple of documents about data cleaning.

"Data Cleaning"

listed here


"How to approach a dataset. Part 1: Database design." Includes a section on data cleaning

UNESCO has a data validation page. Much of it is about how to use SPSS to check data; however, the things it says to check for apply to all data.

The World Bank has an evaluation wiki site with one page about data cleaning and what to watch for: duplicates, outliers, illogical values and typos.

Detecting Data Errors: Where are we and what needs to be done? Ziawasch Abedjan et al., Proceedings of the VLDB Endowment, Vol. 9, No. 12, 2016.

This paper outlines types of data errors and software tools for identifying them. The types of errors are:

1. Outliers

2. Duplicates

3. Rule violations

4. Pattern violations
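The four error types above can each be illustrated with a simple check. This is only a sketch using the Python standard library, not the tools evaluated in the paper. The sample rows, the field names, the 5-digit zip pattern and the "end must not precede start" rule are hypothetical. The outlier check is one common choice (a modified z-score based on the median absolute deviation, with the conventional 3.5 cutoff), assumed here for illustration.

```python
import re
from collections import Counter
from statistics import median

# Hypothetical rows, each seeded with one kind of error.
rows = [
    {"id": 1, "zip": "10001", "age": 34, "start": 2001, "end": 2005},
    {"id": 2, "zip": "10002", "age": 36, "start": 2010, "end": 2008},   # end before start
    {"id": 3, "zip": "1000A", "age": 35, "start": 1999, "end": 2003},   # malformed zip
    {"id": 4, "zip": "10004", "age": 33, "start": 2002, "end": 2004},
    {"id": 4, "zip": "10004", "age": 33, "start": 2002, "end": 2004},   # repeated row
    {"id": 5, "zip": "10005", "age": 340, "start": 2003, "end": 2006},  # implausible age
]


def outliers(rows, field, cutoff=3.5):
    """1. Outliers: values far from the median (MAD-based modified z-score)."""
    values = [r[field] for r in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [r["id"] for r in rows if 0.6745 * abs(r[field] - med) / mad > cutoff]


def duplicates(rows):
    """2. Duplicates: ids of rows whose full contents appear more than once."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sorted({dict(key)["id"] for key, n in counts.items() if n > 1})


def rule_violations(rows):
    """3. Rule violations: here, an end year that precedes the start year."""
    return [r["id"] for r in rows if r["end"] < r["start"]]


def pattern_violations(rows, pattern=r"\d{5}"):
    """4. Pattern violations: zip codes that are not exactly five digits."""
    return [r["id"] for r in rows if not re.fullmatch(pattern, r["zip"])]


print(outliers(rows, "age"))     # [5]
print(duplicates(rows))          # [4]
print(rule_violations(rows))     # [2]
print(pattern_violations(rows))  # [3]
```

Exact-match duplicate detection like this misses near-duplicates (for example, the same person entered with a slightly different spelling), which generally need fuzzy matching.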