Milestone 3

Implementation

The model is implemented as a Python script built on pandas, NumPy, and scikit-learn. The codebase relies on several data processing and analysis functions to pull statewide demographic and income statistics, as well as candidate report cards. This input data is fed to a Random Forest model and a Linear Regression model to predict optimal report card statistics for a given region.
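
As a rough illustration of the modelling step, the sketch below fits both model types on a small synthetic table and predicts a value for one region. It assumes the task is a regression over numeric features. The column names (median_income, pct_urban, score), the synthetic data, and the hyperparameters are all hypothetical and are not taken from the project's codebase.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the processed input data; column names are illustrative only.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "median_income": rng.normal(60_000, 15_000, size=200),
    "pct_urban": rng.uniform(0, 1, size=200),
})
# Synthetic target with a known linear relationship plus a little noise.
data["score"] = (
    0.00001 * data["median_income"]
    + 2.0 * data["pct_urban"]
    + rng.normal(0, 0.1, size=200)
)

features = data[["median_income", "pct_urban"]]
target = data["score"]

# Fit one model of each type on the same features and target.
models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "linear_regression": LinearRegression(),
}
for model in models.values():
    model.fit(features, target)

# Predict the target for a single hypothetical region.
region = pd.DataFrame({"median_income": [55_000], "pct_urban": [0.7]})
predictions = {
    name: float(model.predict(region)[0]) for name, model in models.items()
}
```

Keeping both models in one dictionary makes it straightforward to add further model types later without changing the training loop.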
The script runs in one of two modes, 'verbose' and 'non-verbose'. 'Non-verbose' mode is the default: it pulls data through the pipeline and feeds it into the models. After training, the models can be used to predict the outcomes of elections whose results are not yet known.
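
One common way to expose such a mode switch is a command-line flag. The sketch below is an assumption about how this could look, not the script's actual interface: the --verbose flag name, the parse_args and log_level_for helpers, and the idea that verbose mode simply raises the logging level are all hypothetical.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse command-line options; non-verbose is the default (hypothetical interface)."""
    parser = argparse.ArgumentParser(description="Train and run the prediction models.")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="print additional detail while the pipeline runs",
    )
    return parser.parse_args(argv)


def log_level_for(verbose):
    """Map the mode to a logging level: more detail in verbose mode."""
    return logging.DEBUG if verbose else logging.WARNING


# Passing explicit argument lists shows both modes without touching sys.argv.
default_args = parse_args([])
verbose_args = parse_args(["--verbose"])

default_level = log_level_for(default_args.verbose)
verbose_level = log_level_for(verbose_args.verbose)
```

Using store_true means the flag is off unless it is passed, which matches non-verbose being the default mode.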

An example of non-verbose mode.

An example of verbose mode.

Testing

When trained on a reduced data set of 184, the Random Forest model achieved an R^2 value of 0.32, meaning it explains roughly a third of the variance in the target and indicating that more data is required. Future steps include increasing the quantity and variety of data to improve the learned solutions, as well as increasing the number of models used.
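
For reference, R^2 (the coefficient of determination) is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand and compares the result with scikit-learn's r2_score. The r_squared helper and the sample values are hypothetical and unrelated to the project's data.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical values chosen so the arithmetic is easy to follow:
# SS_res = 4, SS_tot = 20, so R^2 = 1 - 4/20 = 0.8.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [4.0, 4.0, 8.0, 8.0]

manual = r_squared(y_true, y_pred)
library = r2_score(y_true, y_pred)
```

A value of 1 means the predictions match the targets exactly, while a value near 0 means the model does little better than always predicting the mean.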

Task Breakdown