Introduction
The goal of this project is to let you get your hands "dirty" as a budding data scientist and to practice some of the material taught in class. By the end of the project, you will have gained a much better appreciation for working with "data in the wild", a better understanding of what it means to work as a data scientist, a deeper understanding of the class material, hands-on experience with popular data science tools in Python, and a glimpse into some research efforts in data science.
Specifically, in this project you will select a data science problem and collect data for that problem. You will then "wrangle" the data by extracting, cleaning, matching, and integrating it into a single unified data set. Finally, you will analyze that data set to infer insights.
Stages
All dates below are subject to change. All deadlines are at 11:59 pm on the dates listed.
Stage 0: form a team, two weeks, due Fri Jan 27. You will enter your team information on a page that we will provide.
Stage 1: define the data science (DS) problem and collect data, 1.5 weeks, due Wed Feb 8
Stage 2: extract structured information from the raw data, two weeks, due Wed Mar 1
Stage 3: perform entity matching, out Wed Mar 8, due Sun Apr 2 (includes the one-week spring break)
Stage 4: combine the data into a single data set, due Sun Apr 16
Stage 5: analyze the integrated data set, due Sun May 7