Frequently Asked Questions
As the class project goes on, I will use this space to post questions I receive that I think may be relevant to the rest of the class.
Question 1. Can I work with other students in the class?
Yes, you can work with up to one other student. However, each of you needs to submit your own report, and both reports have to state the contributions of each person. You can share the same classification model, code, and predictions with your partner, but your reports should be written independently.
Question 2. Can I seek advice from friends or colleagues?
Yes, you can seek advice from anyone, as long as you submit your own report and either you or your optional partner writes the code to visualize the data and build the prediction models.
Question 3. How will I be evaluated for the project?
You can expect full credit if you understand and can explain each step of the data mining process, visualize the data effectively, and build clustering, regression, and classification models from the data (i.e. complete all four objectives). Partial credit will be given as well.
Question 4. Can we submit more than one model?
Yes! You can submit up to 10 models, but you need to explain how and why you built each one.
Question 5. By which metric will the model be evaluated? How good does it have to be to get full credit?
The evaluation metrics are detailed in the write-up and on the Kaggle leaderboards. Briefly, they are the mean squared error (for Part 3) and the negative log likelihood (for Part 4). I haven't set a quantitative bar for how good the model has to be to get full credit. A good report with good visualizations and a sound thought process will get full credit, even if the model does not perform very well.
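As a rough illustration of the two metrics named above, here is a minimal Python sketch. It assumes binary labels (0 or 1) for the classification part and averages over examples. The function names and the clipping constant `eps` are illustrative choices, and the exact definitions used on the Kaggle leaderboards may differ, so check the write-up before relying on these numbers.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def negative_log_likelihood(y_true, p_pred, eps=1e-15):
    """Mean negative log likelihood for binary labels (assumption: 0 or 1).

    p_pred holds the predicted probability of class 1. Probabilities are
    clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Made-up numbers, only to show how the functions are called.
    print(mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
    print(negative_log_likelihood([1, 0, 1], [0.9, 0.2, 0.6]))
```

Lower is better for both metrics. Note that the negative log likelihood penalizes confident wrong predictions heavily, so well-calibrated probabilities matter more than hard 0/1 guesses.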
Question 6. Will you be teaching most of the tools required to do the class project, or will most of it be self-learning? For example, will we be learning how to build our own classifier model and train it with data sets, or is that something for us to figure out on our own in R/Python?
I will be teaching several types of classifiers in July. However, you will have to become familiar enough with R or Python to be able to write the program for one of the types of classifiers. Both R and Python come with packages that can be used to construct support vector machines, decision trees, and much more, so hopefully you won't have to write much code from scratch.
Question 7. If I submit my final project early, can I do a partial submission?
You are allowed to submit part of your final project, but you should state clearly, by email to the Gmail account, which objectives you would like graded. Once an objective has been graded, its grade cannot be changed.