Who is the best quarterback at the moment? Should that be decided based on passing yards? Passing touchdowns? Game-winning drives? Some combo of the above? But which combo? In this exercise, you will learn how to manipulate data a bit, in particular subsetting and creating new columns.
One important skill you will need to learn is how to subset data. This means writing a few lines of code to take just the small portion of the data that you need.
Import the dataset called "qb2023.csv", which has the main passing stats for all the quarterbacks from the 2023 NFL season. We will use this to try to crown the highest-performing quarterback.
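If you would rather import from a script than click through menus, here is a rough sketch of how read.csv() works. To make it runnable on its own, it first writes a tiny made-up file to a temporary location; the name toy_path and the two rows are placeholders, not real stats. For the real exercise you would probably just run qb2023 <- read.csv("qb2023.csv"), assuming the file is in your working directory.

```r
# Toy stand-in for qb2023.csv: these values are invented for illustration only.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Player,Pos,Att,Yds,TD",
             "Player A,QB,500,4000,30",
             "Player B,WR,1,20,0"),
           toy_path)

# read.csv() turns the file into a data frame: one row per line, one column per field.
toy <- read.csv(toy_path)

str(toy)    # shows the column names and what type each column is
nrow(toy)   # how many rows were read in
```

Running str() right after an import is a quick way to check that the numbers came in as numbers and the names came in as text.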
Open up a new script file. Let's start by testing skills from the first exercise.
# In two lines, write some code that will find the median number of pass attempts for a quarterback and draw a histogram of the number of pass attempts for all the quarterbacks.
Do you notice something funny? Who are these quarterbacks with fewer than 10 passing attempts? Well, anytime a player throws a pass, even if they are a running back or receiver, they show up in this dataset. Let's filter out everyone but the QBs.
QB <- subset(qb2023, qb2023$Pos == "QB")
Breaking that all down...
QB <- This just stores the result of what we do in a variable called QB (this could be any variable name, like quarterbacks, or q, or even Barbara if you want; that's a pretty name)
subset( This is the function, like mean(), or IQR(), that we will use to tell the computer what to do
qb2023, The first parameter is the thing that you want to subset, so the whole dataset here
qb2023$Pos == "QB") The second parameter is the truth statement that we want to use to tell the computer which entries we want. Here, we tell it to look at the qb2023 dataset, go to the $Pos column, and take only the rows for which it is true that the entry there is "QB".
Notice a few things:
To test if something is equal to something else, you use a double equal sign.
If it's a word or letter (called a string in R), use ' ' or " " around it.
We need to match whatever capitalization is in the dataset... computers are dumb!!!!
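To see those three points in action, here is a small sketch on a made-up data frame. The name toy, the name qbs_only, and the players in it are hypothetical; only the Pos column name comes from the real dataset.

```r
# Made-up rows for illustration only.
toy <- data.frame(Player = c("Player A", "Player B", "Player C"),
                  Pos    = c("QB", "WR", "QB"))

# The truth statement on its own gives one TRUE/FALSE per row.
toy$Pos == "QB"   # TRUE FALSE TRUE

# Lowercase does not match: every row comes back FALSE.
toy$Pos == "qb"   # FALSE FALSE FALSE

# subset() keeps only the rows where the truth statement is TRUE.
qbs_only <- subset(toy, toy$Pos == "QB")
qbs_only
```

Running the truth statement by itself first is a handy way to check what subset() is about to keep.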
Run QB to make sure it worked. You can just do this in the console. Also get a histogram of the QB attempts to see that it's a little better.
QB
hist(QB$Att)
There are still QBs in there that didn't play that much. Can you make a new subset of this data that has only the QBs that had more than 20 pass attempts? You should be able to use the same subset command as above, but instead of the "==" you will need a ">" or a ">=". Also remember that numbers don't require quotation marks. Once you are done with that, you are ready to move on.
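If you are not sure which of ">" and ">=" you want, this sketch shows the difference on a made-up data frame. The Games column and its values are hypothetical and are not part of qb2023; the point is only how the two comparisons treat a value that sits exactly on the cutoff.

```r
# Made-up rows for illustration only.
toy <- data.frame(Player = c("Player A", "Player B", "Player C"),
                  Games  = c(17, 3, 10))

# Strictly greater than 10: the player with exactly 10 is left out.
toy$Games > 10    # TRUE FALSE FALSE

# Greater than or equal to 10: the player with exactly 10 is kept.
toy$Games >= 10   # TRUE FALSE TRUE

# No quotation marks around the 10, because it is a number.
regulars <- subset(toy, toy$Games >= 10)
regulars
```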
Now, it is time to create some new columns. This is called feature creation, and it is how we are going to program in advanced stats. First, we are going to make a new column that adds up the total number of passing yards and 10 times the number of touchdowns. The first step is to initialize the column by putting NAs in a brand new column name for our new advanced stat. Note that you may need to fill in your own dataset name if you didn't call it QB (use your subsetted one from above).
QB$yardsPlusTDs <- NA
Then you can do math with other columns to create a new column.
QB$yardsPlusTDs <- QB$Yds + 10*QB$TD
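The math happens row by row: each player's yards get added to 10 times that same player's touchdowns. Here is a small sketch on a made-up data frame so you can check the arithmetic by hand. The name toy and the numbers are hypothetical; the sketch also skips the NA step and creates the column directly with the assignment, which R allows.

```r
# Made-up rows for illustration only.
toy <- data.frame(Player = c("Player A", "Player B"),
                  Yds    = c(4000, 3000),
                  TD     = c(30, 20))

# One result per row: 4000 + 10*30 and 3000 + 10*20.
toy$yardsPlusTDs <- toy$Yds + 10 * toy$TD

toy$yardsPlusTDs   # 4300 3200
```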
Check out the dataset. Find QB in your environment and click on the little white grid to the right of it, or run View(QB). Scroll over to the last column and see your work!!!
Now, it would be great to just see the top performers. Don't worry too much about how this works, but this will print off the name and advanced stat for the top 10 players. You may have to put your own column and dataset names in here.
head(QB[order(-QB$yardsPlusTDs),c("Player","yardsPlusTDs")],10)
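If you are curious how that line works, here is the same idea split into steps on a made-up data frame. The names toy, ranking, and sorted and the values are hypothetical.

```r
# Made-up rows for illustration only.
toy <- data.frame(Player       = c("Player A", "Player B", "Player C"),
                  yardsPlusTDs = c(3200, 4300, 2500))

# order() returns row numbers; the minus sign puts the largest value first.
ranking <- order(-toy$yardsPlusTDs)
ranking   # 2 1 3

# Square brackets take [rows, columns]: rows in ranked order, two named columns.
sorted <- toy[ranking, c("Player", "yardsPlusTDs")]

# head() keeps only the first few rows, here the top 2.
head(sorted, 2)
```

The one-liner above does exactly these steps, just nested inside each other instead of stored in separate variables.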
Let's also see how your new stat compares with ESPN's QBR:
plot(QB$yardsPlusTDs ~ QB$QBR)
Come up with your own quarterback rating. Compare your rating to the ESPN rating. Who does the ESPN rating value more than you do? And vice versa? What do you think the strengths and weaknesses of each rating are, especially compared to the "eye test" of who you think is a good quarterback?