Initial Methods

This page is adapted from our finalized project idea submission.

We plan to weight the various statistics we decide to use, although the specifics of this weighting system will be worked out along with the bulk of the rest of the project. For instance, we could evaluate both Strength of Record (SOR) and a team’s regular season record, and weight each according to how well we believe it predicts tournament success. We plan to weight SOR much more heavily than a team’s record: SOR takes into account the strength of a team's opponents, while a win-loss record only counts whether each game was won or lost, and college basketball schedules vary widely in opponent strength.
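To make the weighting idea concrete, here is a minimal Python sketch (not MATLAB, and not our final system) that rescales each stat to a 0 to 1 range across teams and then takes a weighted sum. The weights (0.8 for SOR, 0.2 for win percentage), the team names, and the sample numbers are hypothetical placeholders, and treating SOR as a national rank where 1 is best is an assumption made only for this example.

```python
# Minimal sketch: combine team stats into one weighted score.
# Weights, team names, and numbers below are hypothetical placeholders.

def min_max(values, higher_is_better=True):
    """Rescale a list of numbers to [0, 1]; flip it when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread: every team gets the midpoint
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def weighted_scores(teams, weights):
    """teams: {name: {stat: raw value}}; weights: {stat: (weight, higher_is_better)}."""
    names = list(teams)
    scores = {name: 0.0 for name in names}
    for stat, (weight, higher_is_better) in weights.items():
        scaled = min_max([teams[n][stat] for n in names], higher_is_better)
        for name, s in zip(names, scaled):
            scores[name] += weight * s
    return scores


# Hypothetical data: SOR expressed as a rank (1 = best), so lower is better.
teams = {
    "Team A": {"sor_rank": 3, "win_pct": 0.80},
    "Team B": {"sor_rank": 40, "win_pct": 0.90},
    "Team C": {"sor_rank": 15, "win_pct": 0.70},
}
weights = {"sor_rank": (0.8, False), "win_pct": (0.2, True)}

if __name__ == "__main__":
    ranked = sorted(weighted_scores(teams, weights).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name}: {score:.3f}")
```

With these made-up numbers, Team B has the best win percentage but the weakest schedule-adjusted record, so it ends up with the lowest score, which is the behavior we want from weighting SOR more heavily.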

We plan to give each team a final “score”. Teams with a higher score will have a higher chance of winning, but because there is randomness in basketball games, the simulation will also include a random component, likely using the MATLAB randi command.
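A sketch of the random component, written in Python with random.randint standing in for MATLAB's randi: each game draws an integer from 1 to 100, and team A wins if the draw is at or below its win percentage. Converting two scores into a win probability by simple proportion (team A's score divided by the sum of both scores) is a hypothetical rule chosen for illustration, and it assumes both scores are non-negative and not both zero.

```python
import random


def win_probability(score_a, score_b):
    """Hypothetical rule: each team's chance is proportional to its (non-negative) score."""
    return score_a / (score_a + score_b)


def simulate_game(team_a, score_a, team_b, score_b, rng=random):
    """Pick a winner by drawing an integer from 1 to 100 (like MATLAB's randi(100)).
    Team A wins when the draw is at or below its win percentage."""
    threshold = round(100 * win_probability(score_a, score_b))
    draw = rng.randint(1, 100)  # inclusive on both ends
    return team_a if draw <= threshold else team_b


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so runs are repeatable
    wins = sum(simulate_game("A", 0.9, "B", 0.3, rng) == "A" for _ in range(10_000))
    print(f"Team A won {wins} of 10000 simulated games")  # roughly 75% expected
```

Over many simulated games, the higher-scoring team wins about as often as its probability says, while upsets still happen.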


From our brainstorming, we came up with the following two methods, which we plan to investigate most closely:

Method 1: Use logistic regression on previous tournaments to see whether any stats (e.g., a team's strength of record) were related to which team won each game, and use any such stat as a factor in determining who would win a given game between two teams.
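Method 1 could look something like the Python sketch below, which fits a logistic regression by plain batch gradient descent. Each row is one past tournament game described by the difference in a stat between the two teams (team 1 minus team 2), and the label is 1 if team 1 won. The single feature and the nine games are made-up placeholders. In MATLAB, glmfit(X, y, 'binomial') from the Statistics and Machine Learning Toolbox fits the same kind of model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the average log-loss.
    X: rows of stat differences (team 1 minus team 2); y: 1 if team 1 won, else 0."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, row):
    """Estimated probability that team 1 beats team 2."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


if __name__ == "__main__":
    # Hypothetical past games: feature = scaled SOR difference, label = 1 if team 1 won.
    X = [[0.6], [0.4], [0.2], [0.1], [-0.1], [-0.3], [-0.5], [0.3], [-0.2]]
    y = [1, 1, 1, 0, 1, 0, 0, 1, 0]
    w, b = fit_logistic(X, y)
    print(f"weight = {w[0]:.2f}, bias = {b:.2f}")
    print(f"P(team 1 wins | SOR edge of 0.25) = {predict(w, b, [0.25]):.2f}")
```

A clearly positive fitted weight means a larger advantage in that stat goes with a higher chance of winning; a weight near zero suggests the stat adds little.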

Method 2: Gather as much data as possible, look back at past tournaments, and use our intuition to weight each data category according to how well we think it predicts game outcomes. For example, we would weight SOR higher than winning percentage.


To evaluate possible methods, we collected information on 45 teams that have either clinched a spot in the tournament or are considered a lock to make it. We recorded stats that we think will be useful in determining the winners of tournament games, including each team's strength of record, free throw percentage, and turnover percentage; in total, we included 10 stats. There are other stats we could potentially consider, for instance each team's seed in the tournament, which we will not know until Sunday night. Seed could be important because, so far, the lowest seed to win the entire tournament has been an 8 seed.

Once we collected all of the data, we put it into a matrix and used the svd function in MATLAB. The plot of the diagonal (the singular values) that we got from the function is below.
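For reference, the same decomposition in Python uses numpy.linalg.svd, which, like MATLAB's s = svd(A), can return just the singular values in decreasing order. The 45 by 10 matrix here is random placeholder data, not our collected stats, and standardizing each column first (so a stat measured in large units does not dominate) is an optional choice assumed for this sketch, not necessarily what the analysis above did.

```python
import numpy as np


def singular_values(data, standardize=True):
    """Return the singular values (the diagonal of S in A = U S V^T), largest first.
    data: teams x stats array. Optionally z-score each column before the SVD."""
    A = np.asarray(data, dtype=float)
    if standardize:
        std = A.std(axis=0)
        std[std == 0] = 1.0  # avoid dividing by zero for a constant column
        A = (A - A.mean(axis=0)) / std
    return np.linalg.svd(A, compute_uv=False)


def energy_fraction(s):
    """Cumulative share of the total squared singular values captured by the first k values."""
    sq = s ** 2
    return np.cumsum(sq) / sq.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(45, 10))  # placeholder: 45 teams x 10 stats, NOT real data
    s = singular_values(data)
    print(np.round(s, 2))
    print(np.round(energy_fraction(s), 2))
```

One common way to read such a plot is to look at how fast the singular values fall off: if the first few account for most of the cumulative fraction, the 10 stats overlap heavily and could be summarized by fewer combined factors.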