Final Projects

Final projects can be done individually or in groups of up to three. You will analyze one or more data sets of your choice to answer questions that you devise. Once you have selected your data sets and preliminary questions, schedule a 10-minute meeting with me to discuss them.

Requirements

Your data sets must have a combined size of at least 500 MB. They can be as big as you are able to analyze with the computational resources that we have. While you can probably make a good project with just one data set, the best projects will pull data from multiple sources to do more interesting analyses that span subjects.

Your analysis must use at least two machine learning algorithms to try to answer your questions. You must also include at least three visualizations. Larger groups should have more. You should write up a description of your work, roughly 2 pages per student on the team, and turn it in to me. The write-up should give full background: a description of the data sets, the questions, and your motivation for choosing them. It should then describe the process you followed for the analysis and the answers to your questions.

Presentations

During the final period, you will present your projects. Each person gets 10 minutes to talk. This applies in groups as well: every member is expected to speak for roughly that long. The presentation should cover the same basic material as the write-up.