Lesson - Cleaning data

Learning Intention: understand the need to 'clean' data before using it




1) Look at the data sheet your group has been given. Identify the data you would remove. Explain why you would remove these data. Record this in a table:

Record Number    Field            Reason
                 Date of Birth    Anyone born in 1894 would not be alive today

• Remember these data were entered in 2008/2009.

• Do the ages and dates of birth make sense?

• Have any children got feet bigger than their height?

• Are some children too tall for their age?
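If your data sheet is available as a spreadsheet, the checks above could also be run automatically. The following is only a minimal sketch in Python: the field names (`dob`, `age`, `height_cm`, `foot_cm`), the sample records, the entry date and the age and height limits are all assumptions made for illustration, not values taken from the data sheet.

```python
from datetime import date

# Assumed values for illustration only; adjust them to match your data sheet.
ENTRY_DATE = date(2009, 1, 1)   # the data were entered in 2008/2009
MAX_LEARNER_AGE = 19            # assumed oldest plausible learner
MAX_HEIGHT_CM = 210             # assumed crude upper limit for a child's height


def age_on(entry_date, dob):
    """Return the age in whole years on entry_date for someone born on dob."""
    years = entry_date.year - dob.year
    if (entry_date.month, entry_date.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached in the entry year
    return years


def find_problems(record):
    """Return a list of (field, reason) pairs for one record."""
    problems = []
    computed = age_on(ENTRY_DATE, record["dob"])
    if computed < 0 or computed > MAX_LEARNER_AGE:
        problems.append(("Date of Birth", f"implies an age of {computed}"))
    elif abs(computed - record["age"]) > 1:
        # Allow one year of slack because the exact entry date is not known.
        problems.append(("Age", "does not match date of birth"))
    if record["foot_cm"] >= record["height_cm"]:
        problems.append(("Foot length", "foot is not shorter than height"))
    if record["height_cm"] > MAX_HEIGHT_CM:
        problems.append(("Height", "too tall to be plausible"))
    return problems


# Made-up sample records (hypothetical, not from the data sheet).
records = [
    {"number": 1, "dob": date(1894, 5, 2), "age": 14, "height_cm": 160, "foot_cm": 24},
    {"number": 2, "dob": date(1995, 3, 10), "age": 13, "height_cm": 23, "foot_cm": 155},
    {"number": 3, "dob": date(1996, 7, 1), "age": 12, "height_cm": 150, "foot_cm": 22},
]

for r in records:
    for field, reason in find_problems(r):
        print(r["number"], field, reason)
```

Each printed line gives a Record Number, a Field and a Reason, matching the columns of the table in question 1. A flagged record still needs a human decision: the program only points out values that look suspicious.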

2) Suggest changes you would make to the data to make them more accurate. Learners were asked ‘What is your natural hair colour?’ and ‘Which soap location would you prefer to live in?’

• Can you fit the ‘Other Hair Colour’ entries into the ‘Hair Colour’ categories, e.g. could auburn be considered light brown?

• Can you do a similar thing for the ‘Other Soap Locations’?

NB: Each question had a drop-down response menu. The choices for hair colour were ‘dark brown’, ‘light brown’, ‘black’, ‘blonde’, ‘red’ and ‘other’. The choices for soap locations were ‘Coronation Street’, ‘Albert Square’, ‘Summer Bay’, ‘Ramsey Street’, ‘Emmerdale’, ‘other’ and ‘none’.
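Fitting the ‘other’ entries into the menu categories can be thought of as a lookup table. The sketch below shows one possible way to do this in Python for hair colour. The `RECODE` table is hypothetical: the auburn entry follows the suggestion in the bullet above, and the remaining entries are assumed judgement calls that your group may decide differently.

```python
# The menu categories listed in the NB above.
HAIR_CATEGORIES = {"dark brown", "light brown", "black", "blonde", "red", "other"}

# Hypothetical recoding table: free-text 'other' answers -> menu categories.
RECODE = {
    "auburn": "light brown",   # the worksheet's own suggestion
    "ginger": "red",           # assumed judgement call
    "brunette": "dark brown",  # assumed judgement call
}


def clean_hair_colour(choice, other_text=""):
    """Return a menu category, recoding 'other' answers where possible."""
    choice = choice.strip().lower()
    if choice != "other":
        return choice
    # Entries that are not in the table stay as 'other'.
    return RECODE.get(other_text.strip().lower(), "other")


print(clean_hair_colour("other", " Auburn "))
print(clean_hair_colour("Black"))
print(clean_hair_colour("other", "purple"))
```

The same approach could be used for the ‘Other Soap Locations’, with a second table that maps each free-text answer to one of the listed locations.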


1) Refer to Data from Census at School and repeat the same data cleaning (Record/Field/Reason)

2) Refer to Data from Random Sample - 30 Y10 Boys and repeat the same data cleaning (Record/Field/Reason)

3) Why is it important to clean data before we begin to analyse it?

Diagnostic answers: