DataQuality

The UN has the National Quality Assurance Frameworks (November 2015) http://unstats.un.org/unsd/dnss/qualityNQAF/nqaf.aspx which say: "While there are several general definitions of quality, one of the most commonly used and succinct definitions is fitness for use or fitness for purpose. ... the concept of quality of statistical information is multi-dimensional and that there is no one single measure of data quality. Examples of the common quality dimensions or components include: relevance; accuracy; reliability; timeliness; punctuality; accessibility; clarity, interpretability; coherence; comparability; credibility; integrity; methodological soundness; and serviceability." This UN site also lists national and international quality references.

UN Statistical Commission https://unstats.un.org/home/

Eurostat's quality page http://ec.europa.eu/eurostat/web/quality has this 2017 Code of Practice http://ec.europa.eu/eurostat/documents/64157/4392716/Revised_CoP_Nov_2017.pdf which includes a section on statistical output which says "Output quality is measured by the extent to which the statistics are relevant, accurate and reliable, timely, coherent, comparable across regions and countries, and readily accessible by users, i.e. the Principles of Statistical Output."

The Code of Practice's indicators for "accurate and reliable" include:

Indicator 12.1: Source data, integrated data, intermediate results and statistical outputs are regularly assessed and validated.

Indicator 12.2: Sampling errors and non-sampling errors are measured and systematically documented.

Data Quality Review Toolkit https://www.measureevaluation.org/our-work/data-quality/data-quality-review

The DQR toolkit includes guidelines and tools that lay the basis for a standardized and holistic approach to data quality that promotes institutionalization of routine data quality assessment in countries. The toolkit, dated 2017, was developed in cooperation with many organizations, including WHO.

Statistics Canada has the Policy on Informing Users of Data Quality and Methodology (approved March 31, 2000) http://www.statcan.gc.ca/eng/about/policy/info-user which lists the six characteristics of data fit for use: relevance, accuracy, timeliness, accessibility, interpretability and coherence.

Hong Chen, David Hailey, Ning Wang, and Ping Yu. A Review of Data Quality Assessment Methods for Public Health Information Systems. Int J Environ Res Public Health. 2014 May; 11(5): 5170–5207. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053886/ Among the findings is that "Completeness, accuracy, and timeliness were the three most-used attributes among a total of 49 attributes of data quality."
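Two of the attributes named in that finding, completeness and timeliness, are easy to turn into simple ratios. The following is only a minimal sketch, not a method taken from the review: the record layout, the field names (age, district, event, reported) and the 7-day reporting window are all assumptions made for illustration.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for rec in records
        for field in required_fields
        if rec.get(field) not in (None, "")
    )
    return filled / total


def timeliness(records, max_delay_days):
    """Share of records reported within max_delay_days of the event date."""
    if not records:
        return 1.0
    on_time = sum(
        1
        for rec in records
        if (rec["reported"] - rec["event"]).days <= max_delay_days
    )
    return on_time / len(records)


# Hypothetical case reports, invented for this example.
reports = [
    {"age": 34, "district": "North",
     "event": date(2014, 5, 1), "reported": date(2014, 5, 3)},
    {"age": None, "district": "South",
     "event": date(2014, 5, 2), "reported": date(2014, 5, 20)},
    {"age": 51, "district": "",
     "event": date(2014, 5, 4), "reported": date(2014, 5, 6)},
]

completeness_score = completeness(reports, ["age", "district"])  # 4 of 6 values present
timeliness_score = timeliness(reports, max_delay_days=7)         # 2 of 3 reports on time
print(f"completeness={completeness_score:.2f} timeliness={timeliness_score:.2f}")
```

Accuracy is left out on purpose: measuring it needs a trusted reference source to compare against, which a self-contained example cannot supply.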

Data Quality – Guidance for providers and commissioners https://www.england.nhs.uk/publication/data-quality-guidance-for-providers-and-commissioners/ This NHS England guidance covers indicators of data quality.

7 Sources of Poor Data Quality, by William McKnight, partner, Information Management, Lucidity Consulting Group, 2009. https://www.melissadata.com/enews/articles/0611/2.htm An interesting article that reviews where data errors are likely to come from.

IQ International http://iaidq.org/ is an association concerned with information and data quality. Click on "knowledge", then on "fundamentals of IQ", for some basic articles about information and data quality.

Validating / checking data

Methodology for Data Validation 1.0 (Handbook), revised edition June 2016

https://ec.europa.eu/eurostat/cros/search/custom-taxonomy/knowledge-repository-general-innovation-area/handbooks

In this handbook, "a definition for data validation is provided, the main purpose of data validation is discussed taking into account the European quality framework, and finally, for the ‘how’ perspective, the key elements necessary for performing data validation, that are validation rules, are illustrated."
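To make the idea of validation rules concrete, here is a minimal sketch in which each rule is a named true/false check applied to every record. It does not reproduce the handbook's own formalism; the rule names, the field names and the thresholds are hypothetical.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


# Hypothetical rules: a range check, a code-list check and a cross-field check.
RULES = [
    Rule("age_in_range", lambda r: 0 <= r["age"] <= 120),
    Rule("sex_in_codelist", lambda r: r["sex"] in {"F", "M"}),
    Rule("employed_implies_adult", lambda r: not r["employed"] or r["age"] >= 15),
]


def validate(records, rules):
    """Return (record index, rule name) for every rule a record fails."""
    failures = []
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                failures.append((index, rule.name))
    return failures


# Invented survey records for illustration.
records = [
    {"age": 42, "sex": "F", "employed": True},
    {"age": 150, "sex": "M", "employed": False},
    {"age": 9, "sex": "X", "employed": True},
]

failures = validate(records, RULES)
for index, rule_name in failures:
    print(f"record {index} fails {rule_name}")
```

Keeping rules as data rather than as scattered if-statements means the same list can be reported on, versioned and reused across data sets.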

National Emergency Medical Service for Children Data Analysis Resource Center http://www.nedarc.org/tutorials/analyzingData/index.html This resource has a couple of chapters on validating and cleaning data sets.

ACAPS has a couple of documents about data cleaning.

"Data Cleaning" https://www.acaps.org/sites/acaps/files/resources/files/acaps_technical_brief_data_cleaning_april_2016_0.pdf (listed at https://www.acaps.org/library/assessment)

"How to approach a dataset. Part 1: Database design." https://www.acaps.org/sites/acaps/files/resources/files/how_to_approach_a_dataset-part_1_database_design_august_2013.pdf Includes a section on data cleaning.

UNESCO has this data validation page http://www5.unescobkk.org/education/efatraining/module-b3/3-data-validation/ Much of it is about how to use SPSS to check data. However, the things it says to check for apply to all data.

The World Bank has an evaluation wiki site with one page about data cleaning https://dimewiki.worldbank.org/wiki/Data_Cleaning that covers what to watch for: things like duplicates, outliers, illogical values, and typos.

Detecting Data Errors: Where are we and what needs to be done? Ziawasch Abedjan et al., Proceedings of the VLDB Endowment, Vol. 9, No. 12, 2016 http://www.vldb.org/pvldb/vol9/p993-abedjan.pdf

This paper outlines types of data errors and software tools for identifying errors. The types of errors are:

1. Outliers

2. Duplicates

3. Rule violations

4. Pattern violations
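The four error types can be illustrated with one small check each. This is a rough sketch using only the Python standard library, not one of the tools evaluated in the paper. The sample rows, the field names, the 5-digit zip pattern and the "end must not precede start" rule are all invented for the example, and the outlier test (a modified z-score based on the median absolute deviation, with the commonly used 3.5 cutoff) is just one of many possible choices.

```python
import re
from collections import Counter
from statistics import median

# Invented sample rows; every field name here is hypothetical.
rows = [
    {"id": "A001", "zip": "02139", "age": 34, "start": 2001, "end": 2005},
    {"id": "A002", "zip": "2139", "age": 36, "start": 2003, "end": 2004},
    {"id": "A003", "zip": "10001", "age": 35, "start": 2010, "end": 2008},
    {"id": "A004", "zip": "94105", "age": 33, "start": 1999, "end": 2002},
    {"id": "A004", "zip": "94105", "age": 33, "start": 1999, "end": 2002},
    {"id": "A005", "zip": "60601", "age": 350, "start": 2000, "end": 2001},
]


def outliers(values, threshold=3.5):
    """1. Outliers: indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def duplicates(rows):
    """2. Duplicates: indices of rows that appear more than once, field for field."""
    keys = [tuple(sorted(row.items())) for row in rows]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] > 1]


def rule_violations(rows):
    """3. Rule violations: here, an end year that precedes the start year."""
    return [i for i, row in enumerate(rows) if row["end"] < row["start"]]


def pattern_violations(rows, field="zip", pattern=r"\d{5}"):
    """4. Pattern violations: values that do not fully match the expected format."""
    return [i for i, row in enumerate(rows)
            if re.fullmatch(pattern, row[field]) is None]


outlier_idx = outliers([row["age"] for row in rows])
duplicate_idx = duplicates(rows)
rule_idx = rule_violations(rows)
pattern_idx = pattern_violations(rows)
print(outlier_idx, duplicate_idx, rule_idx, pattern_idx)
```

Exact-match duplicate detection like this misses near-duplicates (for example, the same person entered with two spellings), which need fuzzy matching.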