Due 11:59 pm Sun May 7
In this project stage, you will analyze the integrated and cleaned table to derive insights. The analysis is of your own choosing, but it must involve one of the key techniques we will cover in class: classification, clustering, correlation discovery, anomaly detection, or OLAP-style exploration. I will discuss this further in class.
What to submit
Submit the following on your group's website:
- A CSV file storing Table E, the integrated table produced in project stage 4.
- A PDF file that discusses the following issues:
  - Statistics on Table E: What is the schema of Table E, and how many tuples does it contain? Give at least four sample tuples from Table E.
  - What data analysis task did you want to perform? (Example: we wanted to know whether we could use the remaining attributes to accurately predict the value of the attribute loan_repaid.) For that task, describe in detail the data analysis process you went through.
  - Report any accuracy numbers you obtained (such as precision and recall for your classification scheme).
  - What did you learn or conclude from your data analysis? Were there any problems with the analysis process or with the data?
  - If you had more time, what would you propose to do next?
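For the accuracy numbers requested in the PDF, precision is the fraction of predicted positives that are actually positive, TP / (TP + FP), and recall is the fraction of actual positives that the classifier finds, TP / (TP + FN). The following is a minimal Python sketch, assuming a binary label such as the loan_repaid example with 1 as the positive class. The function name precision_recall and the toy labels are hypothetical illustrations, not provided project code; libraries such as scikit-learn offer equivalent functions (precision_score, recall_score).

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for a single positive class.

    y_true: actual labels (e.g., hypothetical loan_repaid values from Table E)
    y_pred: labels predicted by the classifier for the same tuples
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actually positive but predicted negative.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Report 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels purely for illustration (1 = repaid, 0 = not repaid).
    actual = [1, 1, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 1, 1, 0]
    p, r = precision_recall(actual, predicted)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

In the toy run, the classifier labels five tuples as repaid, three of them correctly (precision 3/5), and it finds three of the four actually repaid tuples (recall 3/4). These numbers should be computed on held-out data rather than on the tuples used for training.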