While I was organizing my code base and getting everything put into reproducible environments, I came across my old Monte Carlo Season Simulator. I made this back in 2019 when I was still living in Denver and was very early into my time of diving into both football and data science. This project was actually the first one that made me think there had to be something more powerful than excel to do data work in because I was really pushing the limits of what excel on my computer could handle. I will touch on the big picture steps and any other cool things that I come across while looking through it again.
This excel sheet is SLOW. I have it doing 2,000 calculations for each game of the season for the selected team with monte carlo simulations of rough Points per Game against Points Allowed per Game. Each time you open a cell, it will rerun the whole simulation. I had to turn off the auto calculate within excel because of this slowdown.
It includes a heck of a lot of vlookups and index(match()) throughout this code. Keeping everything straight and (somewhat) organized was clearly challenging for me back then.
First we take in a bunch of weighted input variables based on personal feelings or loosely on the previous years data to get a rough sense of team strength.
I had pulled in a specific schedule from a website and just hard-coded it in to pull from with regards to vlookups.
Then we get to the Points per Game and Points Allowed per Game projections. I took the previous year's values and then did some calculations on them to create a projection for the following year.
Once you select a team, it pulls in all of their opponents and whether it is home or away, as well as after bye, and calculates a set of game scores based on their projected PPG and PAPG 2,000 times. Each game decides a winner and taking the average across all 2,000 games gives a predicted win %.
Once it computes for all 17 weeks, you get the final projections for the seasons at the end including buckets based on 7-, 8, or 9+ wins for the season.
In conclusion, while this analysis has PLENTY of flaws looking back on it now, this was very fun to do and was one of the most important steps for me to push past excel and get onto the path of eventually becoming a data scientist.