Introduction
The goal of this project is to get your hands "dirty" as a budding data scientist and to practice some of the material taught in class. By the end of the project, you will have gained a much better appreciation for working with "data in the wild"; a clearer picture of what it means to work as a data scientist; a deeper understanding of the class material; hands-on experience using and debugging machine learning models; practice with popular data science tools in Python; and a glimpse into current research efforts in data science.
Specifically, in this project you will collect data; "wrangle" it by extracting, cleaning, matching, and integrating it into a single unified data set; and then analyze that data set to infer insights.
Stages
All dates below are subject to change. All deadlines are 11:59 pm on the dates listed.
Stage 0: Form a team, due Sun Feb 11. You will enter your team information on a page that we will provide.
Stage 1: Information extraction from natural text. Out Fri Feb 9, due Sat Mar 3.
Stage 2: Crawling and extracting structured data from Web pages. Out Tue Mar 6, due Fri Mar 23.
Stage 3: Entity matching. Out Sun Apr 1, due Wed Apr 18.
Stage 4: Data integration and analysis. Out Wed Apr 25, due Wed May 9.