Plan

Over the next three weeks, we have many tasks to accomplish. The three biggest are: (1) finding which statistics are most significant in identifying high-quality teams; (2) weighting those statistics by their importance to winning basketball games and building an algorithm that predicts winners in future tournaments; and (3) using the algorithm to compare teams and decide game results, including finding the right amount of randomness to build into it so that the tournament predictor is more realistic.

1. Finding Significant Data

In this task, we plan to use statistical analysis to evaluate how useful particular stats are in determining the outcome of games in prior years' tournaments. For example, we could use logistic regression to determine which stats are most strongly associated with winning in a given round. We would regress the binary outcome of whether a team won its round of 64 game on each stat (Free Throw %, Strength of Record, Adjusted Offensive Rating, Net Rating, etc.) and check whether any of them shows a strong, statistically significant relationship with winning. Repeating this analysis across all rounds and stats, we can pick out which variables matter most in each round, and we will weight those stats more heavily in our algorithm for predicting future tournament outcomes.
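As a rough illustration of the per-stat screening described above, the sketch below fits a one-variable logistic regression for each stat and ranks the stats by the size of the fitted slope. It is only a sketch under stated assumptions: the gradient-descent fit is hand-rolled to keep the example dependency-free (a statistics library would also report p-values, which this does not), each stat is standardized so slopes are comparable, and the function names and the toy numbers are hypothetical, not real tournament data.

```python
import math

def standardize(values):
    """Rescale a stat to mean 0, std 1 so slopes are comparable across stats."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in values]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_single_stat(values, won, learning_rate=0.1, steps=2000):
    """Fit P(win) = sigmoid(b0 + b1 * z) by gradient descent; return (b0, b1)."""
    z = standardize(values)
    n = len(z)
    b0 = b1 = 0.0
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(z, won):
            error = sigmoid(b0 + b1 * x) - y  # predicted minus actual outcome
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1

def rank_stats(stats, won):
    """Return (stat, slope) pairs sorted by absolute slope, largest first."""
    slopes = {name: fit_single_stat(values, won)[1] for name, values in stats.items()}
    return sorted(slopes.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up numbers for illustration only (hypothetical, not real tournament data).
toy_round_of_64 = {
    "net_rating": [25, 20, 18, 15, 10, 8, 5, 2],
    "ft_pct": [70, 75, 68, 74, 72, 69, 76, 71],
}
toy_won = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = won the round of 64 game

for stat, slope in rank_stats(toy_round_of_64, toy_won):
    print(stat, round(slope, 2))
```

A larger absolute slope on a standardized stat suggests a stronger association with winning in that round, but with only a few dozen games per round the estimates will be noisy, so the ranking should be treated as a guide rather than a final answer.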

2. Correctly Weighting Statistics and Creating the Algorithm

Using the results of task (1), we will have a list of the most impactful stats for tournament game outcomes in each round. We can use these findings to weight the statistics appropriately as one way of predicting the winner of a given matchup. Our algorithm will take these weighted statistics as one input, but it will also include other factors: seed vs. seed history from past years; cosine similarity of a team's statistics with those of past champions (a preliminary test of this is in the data section); a direct comparison of the two teams playing each other (most likely by subtracting one team's statistics from the other's), rewarding teams with large advantages in statistically significant stats; and other methods still to be determined. One final input will be a degree of the randomness that is inherent in basketball games, which is discussed at length in task (3).
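Two of the ingredients named above, the weighted difference of team stats and the cosine similarity with past champions, can be sketched as follows. This is a minimal illustration, not the final algorithm: the function names, the stat names, and the weights are hypothetical, and it assumes the stats have already been put on comparable scales (for example, standardized) so that no single stat dominates.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two stat vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def matchup_score(team_a, team_b, weights):
    """Weighted sum of (A minus B) stat differences; positive favors team A."""
    return sum(w * (team_a[stat] - team_b[stat]) for stat, w in weights.items())

# Hypothetical weights and standardized stats, for illustration only.
weights = {"net_rating": 2.0, "ft_pct": 0.5}
team_a = {"net_rating": 1.2, "ft_pct": -0.3}
team_b = {"net_rating": 0.4, "ft_pct": 0.6}

print(matchup_score(team_a, team_b, weights))
print(cosine_similarity([1.2, -0.3], [1.0, 0.1]))
```

Because the score is built from differences, swapping the two teams simply flips its sign, which makes it easy to feed into a later step that converts the score into a win probability.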

3. Using the Algorithm to Determine Tournament Game Outcomes While Including Randomness

While we expect the algorithm to get most outcomes right, there is a degree of randomness in every game. For example, last year Virginia, a 1 seed, lost in the first round, which had never happened before. An algorithm that always picks the statistically stronger team would not have predicted this upset. That is why we want to add randomness to our algorithm: anything can happen. The challenge is deciding how much randomness to include so that the algorithm stays accurate overall while still being able to predict some of the major upsets in the tournament. We are thinking of using a random number generator to introduce this randomness.
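One possible way to control the amount of randomness with a random number generator is sketched below. It assumes each matchup has already been reduced to a single score difference (positive favoring team A), maps that difference to a win probability with a logistic curve, and then draws a random number to pick the winner. The `randomness` parameter and all function names are hypothetical choices for illustration: larger values pull every game toward a coin flip, and smaller values make the favorite win almost every time.

```python
import math
import random

def win_probability(score_diff, randomness=1.0):
    """Map a score difference to P(team A wins); larger randomness pulls toward 0.5."""
    if randomness <= 0:
        raise ValueError("randomness must be positive")
    z = score_diff / randomness
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate_game(score_diff, rng, randomness=1.0):
    """Return True if team A wins this simulated game."""
    return rng.random() < win_probability(score_diff, randomness)

def upset_rate(score_diff, randomness, trials=10000, seed=0):
    """Fraction of simulated games in which the favored team A loses."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    losses = sum(not simulate_game(score_diff, rng, randomness) for _ in range(trials))
    return losses / trials

# Compare how often a clear favorite loses at two randomness settings.
for level in (0.5, 2.0):
    print(level, upset_rate(score_diff=2.0, randomness=level))
```

Under this setup, the question of how much randomness to include becomes a matter of tuning one number, for example by choosing the value whose simulated upset rate best matches the upset rate seen in past tournaments.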