Predictors

General Methodology

The general method used for each predictor was similar. The team stats were first normalized, since raw stats such as total points and field goal percentage are on very different scales and would be hard to compare directly. Each stat was then assigned a weight determined by one of the methods described on the methods page. Each team's normalized stats were multiplied by their corresponding weights, and the weighted values were summed to form a team score. The predicted winner of each game was the team with the higher score. This procedure was applied to every game in the bracket to determine the winners and, eventually, the champion. Each method used slight variations of this strategy, but the overall approach was the same. For feature selection and pair feature selection, a team score was calculated round by round through round 3; for later rounds the round 3 score was reused, because there were not enough data points in those rounds and the round-specific scores would be biased. With supervised PCA, the team score was calculated once at the beginning and used for the entire bracket. With Lasso, the team score was calculated round by round, since the lasso regression model we used treats each round as its own optimization problem.
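As a rough illustration of this scoring step, the sketch below normalizes each stat, applies per-stat weights, and picks the winner of a game as the team with the higher weighted score. The stat names, values, and weights here are hypothetical placeholders, not the actual features or weights our predictors learned.

```python
import pandas as pd

# Hypothetical team stats; the real predictors used many more columns.
stats = pd.DataFrame(
    {"points_per_game": [78.2, 71.5], "fg_pct": [0.47, 0.44], "rebounds": [38.1, 35.6]},
    index=["Team A", "Team B"],
)

# Normalize each stat (z-score) so total points and percentages are comparable.
normalized = (stats - stats.mean()) / stats.std()

# Placeholder weights; in practice these came from feature selection,
# supervised PCA, Lasso, etc.
weights = pd.Series({"points_per_game": 0.5, "fg_pct": 0.3, "rebounds": 0.2})

# Team score = sum of weighted normalized stats; the higher score wins the game.
scores = normalized.mul(weights, axis=1).sum(axis=1)
print(scores)
print("Predicted winner:", scores.idxmax())
```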

Scoring Methods

For scoring, we used the ESPN scoring scale to score all brackets. In this system, each round is worth 320 points in total, and with 6 rounds the maximum possible score is 1920. The average score this year was between 640 and 720 points. The first round has 32 games, so each correct prediction adds 10 points to your bracket score. The second round has 16 games worth 20 points each, the third round has 8 games worth 40 points each, the fourth round has 4 games worth 80 points each, the fifth round has 2 games worth 160 points each, and the sixth and final round has one game worth 320 points.
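A minimal sketch of this scoring scheme is below; the per-round correct-pick counts in the example are made up for illustration.

```python
# ESPN scoring: each round is worth 320 points in total, split evenly
# across its games (10, 20, 40, 80, 160, 320 points per game).
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]
POINTS_PER_GAME = [320 // g for g in GAMES_PER_ROUND]  # [10, 20, 40, 80, 160, 320]

def espn_score(correct_per_round):
    """Total bracket score given the number of correct picks in each round."""
    return sum(c * p for c, p in zip(correct_per_round, POINTS_PER_GAME))

# Hypothetical bracket: 24, 10, 5, 3, 1, 1 correct picks by round.
print(espn_score([24, 10, 5, 3, 1, 1]))  # 240 + 200 + 200 + 240 + 160 + 320 = 1360
```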

Feature Selection Method

Below is the bracket that feature selection predicted. Its ESPN score is 780.

Pair Feature Selection Method

Below is the bracket predicted by pair feature selection. Its ESPN score is 750.

Supervised PCA Method

Below is the bracket predicted by supervised PCA. Its ESPN score is 740.

Binomial Logistic Regression

Below is the bracket predicted by binomial logistic regression. Its ESPN score is 340.

Lasso Method

Below is the bracket predicted by Lasso. Its ESPN score is 1240.

All Methods Together

Below is the bracket predicted by all the methods combined. Its ESPN score is 1190.

Methods Summary

As shown by the scores for each bracket, Lasso produced the best bracket and predicted the champion correctly. Before the tournament, we all made brackets to see if our predictors could score better than we could; our scores, along with those of our different methods, are listed below. In general, DSP was better at predicting March Madness than we were.

Overall, the combined predictor did significantly worse than Lasso: the Lasso predictor scored 1240, while the combined predictor scored 760. Also, because Lasso is trained on previous years' data, Lasso alone did better than the combined predictor on previous tournaments as well; for instance, the Lasso predictor scored 1520 for 2018.


  • Lasso - 1240
  • Josh's Bracket - 920
  • Feature Selection - 780
  • Geoff's Bracket - 780
  • Combined Bracket - 760
  • Supervised PCA - 740
  • Sam's Bracket - 640
  • Spencer's Bracket - 590
  • Binomial Logistic Regression - 340