The first model I created was very simple. It looks at a single matchup between two teams. To predict the total score, spread, and winner of the game, the model needs four variables: the home team's average points scored and allowed per game at home, and the away team's average points scored and allowed per game on the road.
For example, the New England Patriots average 24.7 points scored per game on the road and give up an average of 17.3 points per game on the road.
The Denver Broncos average 22.5 points scored per game at home and give up an average of 18.2 points per game at home.
To determine the predicted score for each team when the Patriots play at the Broncos, you average the points scored by each team with the points allowed by the opposing team.
The Patriots score 24.7 points per game on the road and the Broncos give up 18.2 points per game at home.
Add 24.7 and 18.2, then divide by two: (24.7 + 18.2) / 2 = 21.45.
Using this model, the Patriots are predicted to score 21.45 points, which rounds to 21.
The Broncos score 22.5 points per game at home and the Patriots give up 17.3 points per game on the road.
The average is (22.5 + 17.3) / 2 = 19.9 points.
Rounded, the model predicts that the Broncos will score 20 points.
This model predicts:
Score: Patriots 21, Broncos 20
Spread: -1 (home score minus away score, so the Patriots are favored by 1)
Moneyline: New England Patriots
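The arithmetic above can be sketched in a few lines of base R. This is only an illustration: the helper `predict_points()` and the variable names are my own for this example, and the spread is taken as home score minus away score, on the assumption that the -1 above follows that convention.

```r
# Averages quoted in the text above
patriots_scored_away  <- 24.7  # Patriots points scored per game on the road
patriots_allowed_away <- 17.3  # Patriots points allowed per game on the road
broncos_scored_home   <- 22.5  # Broncos points scored per game at home
broncos_allowed_home  <- 18.2  # Broncos points allowed per game at home

# Hypothetical helper: average a team's scoring with the opponent's points allowed
predict_points <- function(pts_scored, opp_pts_allowed) {
  (pts_scored + opp_pts_allowed) / 2
}

away_pred <- predict_points(patriots_scored_away, broncos_allowed_home)  # 21.45
home_pred <- predict_points(broncos_scored_home, patriots_allowed_away)  # 19.9

# Round to whole points before deriving the total, spread, and winner
away_pts <- round(away_pred)  # 21
home_pts <- round(home_pred)  # 20

pred_total  <- home_pts + away_pts  # 41
pred_spread <- home_pts - away_pts  # -1 (assumed convention: home minus away)

pred_winner <- if (pred_spread > 0) {
  "Denver Broncos"
} else if (pred_spread < 0) {
  "New England Patriots"
} else {
  "Tie"
}
```

Keeping the unrounded values (`away_pred`, `home_pred`) alongside the rounded ones makes it easy to see how much of the predicted margin comes from rounding.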
After collecting all the data from the 2021-22 season on points scored and points allowed for both home and away games, I used this formula to see how this model would perform against the actual outcome of games that year.
In RStudio, I wrote a formula that creates two new variables, home_perf and away_perf, which are the predicted number of points each team will score. With these two variables, I wrote another formula that takes the difference between them, which gives the predicted spread for the matchup.
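library(dplyr)  # provides %>% and mutate()
s21 <- s21 %>%
  mutate(home_perf = (home_pts + away_pts_allowed) / 2,  # predicted home score
         away_perf = (away_pts + home_pts_allowed) / 2,  # predicted away score
         pred_spread = home_perf - away_perf)            # predicted margin, home minus away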
To compare the performance of this model with the results of the games, I used the cor() command in R to measure how strongly my predictions correlate with the actual outcome of each game, and the ggplot() command to create scatter plots.
Above, I compared the actual total points scored in each game with the total my model predicted for that game. The graph on the left shows the relationship between the actual total and the total line that most sportsbooks set. In the graph of my predicted totals, the points cluster more tightly than in the graph of the sportsbook totals, which suggests a stronger relationship. The correlation coefficient, r, measures the strength of the linear relationship between two variables.
Correlation of my predicted totals vs. actual totals:
r = 0.4383169
Correlation of sportsbook total lines vs. actual totals:
r = 0.2520148
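Correlations like the ones above could be computed along the following lines. This is a rough sketch, not the original analysis: the `games` data frame holds six made-up games rather than the 2021-22 data, and the column names `home_score`, `away_score`, `pred_total`, and `actual_total` are assumptions for illustration. The plot is only built if the ggplot2 package happens to be installed.

```r
# Toy data only: six made-up games, NOT the 2021-22 results
games <- data.frame(
  home_pts         = c(27.1, 21.4, 24.0, 18.9, 30.2, 23.0),  # home team's avg scored at home
  home_pts_allowed = c(20.3, 24.8, 19.5, 26.1, 17.7, 21.6),  # home team's avg allowed at home
  away_pts         = c(19.8, 25.6, 22.3, 23.4, 20.1, 21.9),  # away team's avg scored on the road
  away_pts_allowed = c(23.9, 21.0, 25.2, 19.4, 27.3, 22.8),  # away team's avg allowed on the road
  home_score       = c(31, 17, 24, 13, 34, 20),              # hypothetical final scores
  away_score       = c(20, 27, 21, 24, 17, 23)
)

# Same averaging formulas as the model, written in base R
games$home_perf    <- (games$home_pts + games$away_pts_allowed) / 2
games$away_perf    <- (games$away_pts + games$home_pts_allowed) / 2
games$pred_total   <- games$home_perf + games$away_perf
games$actual_total <- games$home_score + games$away_score

# Pearson correlation between predictions and outcomes
r_total <- cor(games$pred_total, games$actual_total)
r_home  <- cor(games$home_perf, games$home_score)
r_away  <- cor(games$away_perf, games$away_score)

# Optional scatter plot of predicted vs. actual totals
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(games, ggplot2::aes(x = pred_total, y = actual_total)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Predicted total", y = "Actual total")
  # print(p) would draw the plot
}
```

With only six invented rows, the resulting r values say nothing about the model itself; the point is the shape of the calculation.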
Here, we can see that my model's predictions correlate more strongly with the actual totals of NFL games than the sportsbook lines do. Even so, an r of 0.438 indicates only a moderate relationship: squared, it corresponds to roughly 19% of the variation in actual totals.
Here, I compared the actual scores of both the home team and the away team with what my model predicted they would score. The correlation is stronger for both than it is for the total, but still not great.
Correlation of actual home score vs. predicted home score:
r = 0.536512
Correlation of actual away score vs. predicted away score:
r = 0.4904563
This model did well considering that it is one of the simplest ways to predict an NFL game, given how few variables it uses. To create a better model, I needed to make it more specific and account for more factors.