When analyzing statistics, it helps to collect both a large number of variables and a significant amount of data to gain a comprehensive understanding. More data and more variables generally allow more accurate models and better predictions, but the two need to be balanced: with too many variables relative to the number of games observed, a model can end up fitting noise rather than the underlying reality. In short, I need more factors to analyze what affects the winners of NFL games. Below, I have a spreadsheet in which I have collected more than 70 additional variables that I believe can have an effect on an NFL game.
I used RStudio, a development environment for the R programming language, to create new data that I couldn't just "look up" on the internet. Very specific variables like 'yards per point scored at home' are not something that NFL.com or Pro Football Reference offers. I had to take the data they do offer and create new variables with the mutate() function from R's dplyr package, which adds columns to a specified data frame. Each new variable is computed by performing operations on variables already present in the data set.
For example, here are some of the lines of code I used to do this:
library(dplyr)  # provides mutate() and the %>% pipe
s21 <- s21 %>%
  mutate(
    home_yds_per_pt = home_yds / home_pts,
    away_yds_per_pt = away_yds / away_pts,
    home_yds_allowed_per_pt = home_yds_allowed / home_pts,
    away_yds_allowed_per_pt = away_yds_allowed / away_pts
  )
These are only a few of the variables I created using R to understand which variables are most significant in determining the winner of an NFL game.
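To show how mutate() builds a per-point variable from existing columns, here is a minimal sketch on made-up numbers. It assumes the dplyr package is installed. The data frame toy and all of its values are hypothetical and are not taken from the s21 data. The if_else() guard is also an addition for illustration: dividing by zero points in a shutout would give Inf in R, so the sketch records NA in that case instead.

```r
library(dplyr)

# Hypothetical toy data: three made-up games, not real NFL results.
toy <- data.frame(
  home_yds = c(350, 420, 280),
  home_pts = c(24, 35, 0),    # third game: the home team is shut out
  away_yds = c(310, 390, 400),
  away_pts = c(17, 28, 21)
)

toy <- toy %>%
  mutate(
    # Yards gained per point scored; NA when the team scored no points,
    # since yards / 0 would otherwise produce Inf.
    home_yds_per_pt = if_else(home_pts > 0, home_yds / home_pts, NA_real_),
    away_yds_per_pt = if_else(away_pts > 0, away_yds / away_pts, NA_real_)
  )

print(toy)
```

In the second made-up game, for instance, the home value is 420 / 35 = 12 yards per point. Whether a shutout should be treated as missing or handled some other way is a modeling choice, so the NA here is only one possible convention.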