Project Description for CS 839 Spring 2019
Introduction
The goal of this project is to get your hands "dirty" as a budding data scientist (and to practice material taught in the class). After finishing the project, you will have gained a much better appreciation for working with "data in the wild", a better understanding of what it means to work as a data scientist, a deeper understanding of the class material and of how to use and debug machine learning models, a chance to work with popular data science tools in Python, and a glimpse into some research efforts in data science.
Specifically, in this project you will collect data, "wrangle" it by extracting, cleaning, matching, and integrating it into a single unified data set, and then analyze that data set to infer insights.
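To make the collect/wrangle/analyze flow concrete, here is a minimal sketch in Python using only the standard library. It is not the method required by the project: the two toy sources, the field names, and the helpers clean_title and integrate are all hypothetical, and matching on an exact cleaned title is a deliberately naive stand-in for real entity matching.

```python
from statistics import mean

# Hypothetical toy data standing in for two separately collected sources.
source_a = [
    {"title": "  The Matrix ", "year": "1999"},
    {"title": "Inception", "year": "2010"},
]
source_b = [
    {"title": "the matrix", "rating": 8.7},
    {"title": "INCEPTION", "rating": 8.8},
    {"title": "Up", "rating": 8.3},
]


def clean_title(title):
    """Clean: collapse whitespace and lowercase so titles are comparable."""
    return " ".join(title.split()).lower()


def integrate(a_rows, b_rows):
    """Match rows on the cleaned title and merge them into unified records.

    Assumption: an exact match on the cleaned title is good enough here.
    Rows from source A with no counterpart in source B are dropped.
    """
    b_index = {clean_title(row["title"]): row for row in b_rows}
    unified = []
    for row in a_rows:
        key = clean_title(row["title"])
        match = b_index.get(key)
        if match is not None:
            unified.append(
                {"title": key, "year": int(row["year"]), "rating": match["rating"]}
            )
    return unified


# Wrangle: clean, match, and integrate into a single unified data set.
unified = integrate(source_a, source_b)

# Analyze: a trivial "insight" computed over the unified data set.
average_rating = mean(record["rating"] for record in unified)

print(unified)
print(round(average_rating, 2))
```

In the actual project each step is far messier: the data comes from natural text and Web pages rather than hand-written lists, and exact-title matching would miss many true matches.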
Stages
All dates below are subject to change. All deadlines are at 11:59 pm on the dates mentioned.
Stage 0: Form team, due Sun Feb 17. You will enter team information into a page that we will provide.
Stage 1: Information extraction from natural text. Out Fri Feb 15, due Sun Mar 10 (a bit more than 3 weeks).
Stage 2: Crawling and extracting structured data from Web pages. Out Fri Mar 15, due Fri Apr 12 (3 weeks, excluding the one-week spring break).
Stage 3: Entity matching. Out Sun Apr 21, due Fri May 10