For the 2020 Carolina Data Challenge, our team analyzed the NFL and College datasets from 2000 to 2013. Our project showcases a variety of visualizations of statistics from each NFL division, including changes in NFL and College football over time by team, passing versus rushing over time, and significant statistics and linear modeling. We analyzed the NFL and College football teams separately because the two function differently.
Our main goal was to reshape the datasets so that we could visualize how specific playing styles changed over the years. We preprocessed the data mainly with pandas and R, fit regression models in R, and built most of the visualizations in Tableau.
To show how the NFL and College teams changed over time, we used a simple Python script to combine all of the .csv files into one large .csv file in which each team's data is separated by season. We then used Tableau to create a continuous graph of each team's performance from the 2000-2001 season to the 2013-2014 season. For the NFL, we grouped the teams by regional division, both because comparing a team against its own division makes its performance easier to read and because we did not want to crowd the graphs. For the College teams, we instead created a dropdown that lets viewers look at an individual team's performance year to year.
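As a rough illustration of the combining step, here is a minimal pandas sketch. It is not our original script: the file names (nfl_2000.csv and so on), the team and wins columns, and the function name combine_season_files are all hypothetical, and the sketch assumes the season's starting year appears in each file name.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def combine_season_files(folder, pattern="*.csv"):
    """Stack every per-season CSV in `folder` into one long DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: the season's starting year is part of the file name.
        match = re.search(r"(19|20)\d{2}", path.stem)
        if match is None:
            continue  # skip files without a recognizable year
        frame = pd.read_csv(path)
        frame["season"] = int(match.group(0))  # tag every row with its season
        frames.append(frame)
    if not frames:
        raise ValueError(f"no season files found in {folder}")
    combined = pd.concat(frames, ignore_index=True)
    # One row per team per season, ordered so each team's seasons are adjacent.
    return combined.sort_values(["team", "season"]).reset_index(drop=True)


# Tiny demonstration with placeholder teams and numbers (not real results).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"team": ["Team A", "Team B"], "wins": [7, 4]}).to_csv(
        Path(tmp) / "nfl_2000.csv", index=False
    )
    pd.DataFrame({"team": ["Team A", "Team B"], "wins": [9, 6]}).to_csv(
        Path(tmp) / "nfl_2001.csv", index=False
    )
    combined = combine_season_files(tmp)

print(combined)
```

A long table like this, with one row per team per season, is the shape Tableau can plot directly as a line per team, and combined.to_csv("all_seasons.csv", index=False) would write it out as the single file.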
All in all, one of our most significant conclusions was that successful teams did not favor a passing offense over a running offense, or the reverse, but used a play scheme in which the two complemented each other.
Using machine learning and regression modeling, we also drew conclusions about which factors matter most for winning games, so that coaches can be advised to build their game around the measures that matter.
To show passing versus rushing, the significant statistics, and the linear models, we used Python scripts to analyze the data and derive new findings, then fit regression models in R, and then displayed the results in Tableau. These trends were league-wide, and we wanted to show how a team's success depends on some factors more than on others.
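The regression step itself was done in R. Purely as an illustration of the idea, the sketch below fits an ordinary least squares model in Python with NumPy instead. The predictors (pass_yards, rush_yards), the coefficients, and the data are synthetic and chosen only for the example; they are not taken from our datasets or results.

```python
import numpy as np


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic season totals (assumption: invented for illustration only).
rng = np.random.default_rng(0)
pass_yards = rng.normal(3500, 400, size=200)
rush_yards = rng.normal(1800, 250, size=200)
wins = 0.002 * pass_yards + 0.003 * rush_yards - 4.0 + rng.normal(0, 0.5, size=200)

coef, r2 = fit_linear_model(np.column_stack([pass_yards, rush_yards]), wins)
print("intercept, pass, rush:", coef)
print("R^2:", r2)
```

Comparing the fitted coefficients and their significance across predictors is what lets a model like this say which measures matter for winning. The sketch leaves out the significance tests that R's model summary reports.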
Analyzing the regional trends, comparing passing with rushing, and identifying which statistics matter most for winning are all important to understanding how the game works in both the NFL and College football. It is important to note from the outset how different the two games are: they depend on completely different factors and features of the game. However, there is also a general trend in how games are won. It takes not just a strong offense, a strong defense, a strong passing game, or a strong rushing game, but a balanced mix of all of them. The offense should be versatile and should not lean on either rushing or passing alone to score points. Successful teams seem to have a play scheme that respects both methods. That being said, some measures did not significantly affect the outcome of games in the NFL (for College football, all measures were significant). For example, coaches always emphasize that time of possession is important, but other factors account for a team's success more than time of possession does; it just so happens that when teams execute well on those factors, time of possession increases. In other words, time of possession has a very high Variance Inflation Factor (VIF) because it is highly correlated with the other predictors. So, for this example, we can help team performance in the future by emphasizing to teams that efficient play is what matters.
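The collinearity measure mentioned above, the variance inflation factor (VIF), can be computed by regressing each predictor on all the others and taking 1 / (1 - R^2). The sketch below is a minimal NumPy version under stated assumptions: the data is synthetic, the four "efficiency" columns are hypothetical stand-ins for other predictors, and time_of_possession is deliberately constructed from them so that the effect is visible. It is not our R analysis.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on all remaining columns plus an intercept.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic example (assumption: invented data, hypothetical predictors).
rng = np.random.default_rng(0)
efficiency = rng.normal(size=(500, 4))  # four independent efficiency measures
# Time of possession rises when the efficiency measures are good, plus noise.
time_of_possession = 0.5 * efficiency.sum(axis=1) + rng.normal(0, 0.3, size=500)

predictors = np.column_stack([efficiency, time_of_possession])
vifs = variance_inflation_factors(predictors)
print(vifs)  # the last entry (time of possession) is the largest
```

A predictor with a high VIF adds little information beyond what the other predictors already carry, which is why time of possession can look important on its own yet drop out once the efficiency measures are in the model.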