Questions/Assignments:
What the data is about.
What type of benefit you might hope to get from data mining.
What type of data mining (classification, clustering, etc.) you think would be relevant.
Illustrate each with an example. For instance, if you think clustering is relevant, describe what a likely cluster might contain and what its real-world meaning would be.
Name one type of data mining that you think would not be relevant, and describe briefly why not.
Discuss data quality issues. For each attribute:
Are there problems with the data?
What might be an appropriate response to the quality issues?
For at least two attributes, discuss data preprocessing, and give an example of how it would be done and what the outcome would be on a small subset of the data.
What would an appropriate smoothing or generalization technique be?
What is an appropriate normalization or data reduction technique?
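As an illustration of the kind of worked example the preprocessing questions ask for, here is a minimal sketch of two common techniques: smoothing by bin means (equal-frequency binning) and min-max normalization. The attribute values and the function names are hypothetical and not part of the assignment; other techniques may suit your data better.

```python
from statistics import mean

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of
    bin_size items, and replace each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

# Hypothetical small subset of one numeric attribute (made-up values).
sample = [4, 8, 15, 21, 21, 24, 25, 28, 34]

smoothed = smooth_by_bin_means(sample, bin_size=3)
normalized = min_max_normalize(sample)

print(smoothed)    # [9, 9, 9, 22, 22, 22, 29, 29, 29]
print(normalized)
```

Smoothing by bin means reduces noise at the cost of detail, while min-max normalization preserves the relative spacing of values but is sensitive to outliers, so either choice should be justified against the quality issues found in the attribute.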