OUTCOME: Students will be able to filter and sort a data set using a spreadsheet tool, identify and correct invalid values in a data set with the aid of computational tools, and justify the need to clean data before analyzing it with computational tools.
Explore
Have a look at these data quality issues. Assume they are part of a larger data set.
Identify each issue and suggest a fix.
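Common data quality issues, such as missing values, impossible values, values of the wrong type, badly formatted entries and duplicate rows, can also be checked with a few lines of Python. This is a minimal sketch, assuming a small hypothetical table with `name`, `age` and `email` columns and an assumed valid age range of 0 to 120. The email check is deliberately rough.

```python
# A minimal sketch: flag common data quality issues in a small table.
# The rows, column names and valid age range (0 to 120) are hypothetical.
rows = [
    {"name": "Ana", "age": "34", "email": "ana@example.com"},
    {"name": "", "age": "29", "email": "ben@example.com"},       # missing value
    {"name": "Cara", "age": "-5", "email": "cara@example.com"},  # impossible value
    {"name": "Dev", "age": "thirty", "email": "dev@example"},    # wrong type, bad email
    {"name": "Ana", "age": "34", "email": "ana@example.com"},    # duplicate of row 1
]


def find_issues(rows):
    """Return a list of (row_number, column, description) tuples."""
    issues = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Missing values: empty or whitespace-only cells.
        for column, value in row.items():
            if not value.strip():
                issues.append((number, column, "missing value"))

        # Invalid values: age should be a whole number in a plausible range.
        age = row["age"].strip()
        if age:
            try:
                age_value = int(age)
            except ValueError:
                issues.append((number, "age", "not a number"))
            else:
                if not 0 <= age_value <= 120:
                    issues.append((number, "age", "out of range"))

        # Rough format check: an email needs an "@" and a "." in the part after it.
        email = row["email"].strip()
        if email and ("@" not in email or "." not in email.split("@")[-1]):
            issues.append((number, "email", "badly formatted"))

        # Exact duplicates of an earlier row.
        key = tuple(row.values())
        if key in seen:
            issues.append((number, None, "duplicate row"))
        seen.add(key)
    return issues


for issue in find_issues(rows):
    print(issue)
```

The code only points at candidate problems. Each flagged row still needs a human decision: correct the value, mark it as missing, or remove the row.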
Create
•Create a Google Sheet
•Create a data set with at least 10 entries (rows) and at least 5 attributes (columns)
•Deliberately add some data quality issues that will need fixing
•Share your data set with a peer and fix each other's data sets. There is some guidance here and a video here
OR
•Use this data set
•Make a copy and clean the data
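Sorting a column is a quick way to spot problems. Blank cells and extreme values tend to collect at the ends of the sorted column, and a filter can then show only the rows that break a rule. The same idea is shown below in Python as a minimal sketch. The `student` and `score` columns and the valid range of 0 to 100 are assumptions for illustration.

```python
# A minimal sketch of sorting and filtering to surface suspicious values,
# similar to using sort and filter in a spreadsheet.
# The rows, the "score" column and the valid range (0 to 100) are hypothetical.
# Assumes every non-blank score is numeric; float() raises ValueError otherwise.
rows = [
    {"student": "Ana", "score": "72"},
    {"student": "Ben", "score": ""},
    {"student": "Cara", "score": "720"},
    {"student": "Dev", "score": "65"},
]


def score_key(row):
    """Sort key: blank scores first, then the rest in numeric order."""
    value = row["score"].strip()
    # False sorts before True, so blanks (value == "") come first.
    # Spreadsheets may place blanks last instead.
    return (value != "", float(value) if value else 0.0)


# Sort: blanks and the most extreme values collect at the ends.
sorted_rows = sorted(rows, key=score_key)

# Filter: keep only rows that are blank or outside the assumed valid range.
suspicious = [
    row for row in rows
    if not row["score"].strip() or not 0 <= float(row["score"]) <= 100
]

print([row["student"] for row in sorted_rows])
print([row["student"] for row in suspicious])
```

Converting scores to numbers before sorting matters. Sorted as text, "100" would come before "65".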
Mastering
Using Python and this notebook, try to find and fix some errors in this data set.
The data is in a CSV file. The CSV file and the Jupyter notebook need to be in the same folder (you can check this in Finder on a Mac).
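The CSV file and the notebook should share a folder because a bare file name such as `data.csv` is looked up relative to the current working directory. Jupyter normally runs a notebook with its own folder as the working directory. This is a minimal sketch using only Python's standard `csv` module, assuming a hypothetical file called `data.csv`. The fixes shown are examples only: trimming spaces, filling missing cells and dropping blank and duplicate rows. The notebook supplied with this task may take a different approach, for instance with pandas.

```python
import csv
from pathlib import Path

# Hypothetical file name: replace it with the name of the CSV that comes with the notebook.
# A bare file name is looked up in the current working directory.
DATA_FILE = Path("data.csv")


def load_rows(path):
    """Read a CSV file into a list of dictionaries, one per data row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def clean_rows(rows):
    """Example fixes: trim whitespace, fill missing cells with "", drop blank and duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        # Ignore overflow fields (key None) that DictReader creates for over-long rows.
        fixed = {key: (value or "").strip() for key, value in row.items() if key is not None}
        if not any(fixed.values()):
            continue  # skip rows where every cell is empty
        key = tuple(fixed.items())
        if key in seen:
            continue  # skip exact duplicates of an earlier row
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


def save_rows(rows, path):
    """Write cleaned rows to a new CSV file (does nothing if there are no rows)."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


if DATA_FILE.exists():
    rows = clean_rows(load_rows(DATA_FILE))
    save_rows(rows, DATA_FILE.with_name("data_clean.csv"))
    print(f"Saved {len(rows)} cleaned rows")
else:
    print(f"{DATA_FILE} not found in {Path.cwd()}; is it in the same folder as the notebook?")
```

Writing the result to a new file rather than overwriting the original keeps the raw data available, so each fix can be checked against it.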
Watch the video here
Comprehensive Guide - Further reading
See here for further reading