Currently, these findings are based on Week 1 of the 2021 season and the entire 2021 playoffs:
28 Games worth of Data
638 Possessions
1677 First Down Calls
I will continue to add to this data set and update the findings below as this only represents 5% of the number of games for a given season.
Below is a chart that shows the expected number of points scored given that the team with the ball has a first down at a yard line that is x yards away from the end zone. As you can see, at this point in the data collection there have been 35 observations of a first down at midfield, with an expected value of 3.34 points.
Here are some visualizations of the data. We can see that the distance to score on a first down (how far a team is from the opponent's end zone) appears to be linearly related to the expected points scored on the drive. The R-squared value of 0.725 indicates a strong fit, and both the intercept and the coefficient of the regression line are statistically significant. For every yard closer to the opponent's end zone, the offensive team's expected points rise by 0.04. Said another way, a first down 10 yards closer to the opponent's goal line is worth nearly half a point more in expected points.
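As a rough illustration of how a fit like this can be computed, here is a minimal least-squares sketch in plain Python. The function name fit_line and the toy numbers are assumptions made up for the example; they are not the data set or the code behind the charts above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Made-up placeholder values (NOT the observed data): yards from the
# opponent's end zone and average points scored on drives from that spot.
toy_yards = [10, 30, 50, 70, 90]
toy_points = [5.0, 4.4, 3.3, 2.7, 1.9]

slope, intercept, r_squared = fit_line(toy_yards, toy_points)
print(f"slope={slope:.3f} points per yard, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

A negative slope here means expected points fall as the distance to the end zone grows, which is the same relationship as "0.04 points gained per yard closer" read in the other direction.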
The data from the previous question (how many more points are expected for 10 yards gained) sparks another question: how much more likely is a team to score a touchdown given that it gains at least one first down on a drive? The data shows that a touchdown occurred 31% of the time (136/438) when a team earned at least one first down. In general, a touchdown occurred on 24% (156/638) of all drives. That is a difference of 7 percentage points in favor of drives with at least one first down. How can this be interpreted in a meaningful way? Assume that a team has 12 offensive series in a game. If it earned a first down on every series, it would be expected to score 0.31*12*6=22.32 points on touchdowns, versus 0.24*12*6=17.28 points at the overall rate. This five-point difference (on touchdowns scored) feels more meaningful. Offensive coordinators could consider putting detailed thought into the opening sequence of a drive to ensure at least one first down is gained.
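The arithmetic above can be checked with a short sketch. The 12 series per game and 6 points per touchdown are the assumptions stated in the text; the function name is just for illustration.

```python
POINTS_PER_TD = 6       # touchdown only, no extra point (as in the text)
SERIES_PER_GAME = 12    # assumed number of offensive series per game


def expected_td_points(td_rate, series=SERIES_PER_GAME):
    """Expected touchdown points per game for a given per-drive touchdown rate."""
    return td_rate * series * POINTS_PER_TD


# Rounded rates, as used in the text
with_first_down = expected_td_points(0.31)
all_drives = expected_td_points(0.24)
print(round(with_first_down, 2), round(all_drives, 2),
      round(with_first_down - all_drives, 2))

# The same comparison using the unrounded counts from the data
exact_gap = expected_td_points(136 / 438) - expected_td_points(156 / 638)
print(round(exact_gap, 2))
```

Using the unrounded counts gives a slightly smaller gap than the rounded percentages do, so the "five points" figure is best read as approximate.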
Similar to the first down chart above, here is a chart of the expected points based on starting field position. Such a chart is common in most football teams' special teams meetings to emphasize the importance of the kicking teams in relation to points surrendered on a drive. The data thus far is limited, so conclusions cannot be drawn just yet. However, we can look at the expected number of points for a drive that starts at the offensive team's own 25 yard line (the most frequent place to start a drive). After 195 repetitions, the expected points scored is 1.95. Perhaps of note for now is that a drive starting only 5 yards further back (37 reps thus far) expects 0.97 points. This is nearly one point less, and it makes one consider the effect of moving the touchback after a kickoff from the 20 yard line to the 25 yard line in 2018.
Shown below are a histogram of starting field position (80 represents starting from the offense's own 20 yard line for example) and a bar chart of how series end.
This data is actually from the entire 2021 season as I was able to find a .csv file here containing every snap of the season (Thank you Darren Willman and Dominic Samang).
3rd and 1: 517 out of 730 (70.8%)
3rd and 2: 327 out of 552 (59.2%)
After some work, I wrote this code to get all 3rd Downs done together.
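For readers who want to try something similar, here is a minimal sketch of one way to tally every 3rd-down distance in a single pass. It is not the original code. The field names down, to_go, and yards_gained are hypothetical and would need to be mapped to the columns of the actual .csv file, and the sketch ignores penalties, turnovers, and touchdowns scored short of the sticks.

```python
from collections import defaultdict


def third_down_rates(plays):
    """Return {yards_to_go: (conversions, attempts, rate)} for 3rd-down plays.

    Each play is a dict with hypothetical keys "down", "to_go", "yards_gained".
    A play counts as a conversion when the yards gained reach the yards to go.
    """
    counts = defaultdict(lambda: [0, 0])  # to_go -> [conversions, attempts]
    for play in plays:
        if play["down"] != 3:
            continue
        tally = counts[play["to_go"]]
        tally[1] += 1
        if play["yards_gained"] >= play["to_go"]:
            tally[0] += 1
    return {
        to_go: (conv, att, conv / att)
        for to_go, (conv, att) in sorted(counts.items())
    }


# Tiny made-up sample, only to show the shape of the input and output
sample_plays = [
    {"down": 3, "to_go": 1, "yards_gained": 2},
    {"down": 3, "to_go": 1, "yards_gained": 0},
    {"down": 3, "to_go": 2, "yards_gained": 5},
    {"down": 1, "to_go": 10, "yards_gained": 4},
]

for to_go, (conv, att, rate) in third_down_rates(sample_plays).items():
    print(f"3rd and {to_go}: {conv} out of {att} ({rate:.1%})")
```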
With the new data set I now have a lot more data to play with (approximately 34,600 plays), so I wanted to do the first down value project again. I had to do quite a bit of data cleanup on the original data file, as there was no variable for "points earned on the drive", as well as some other complications. Still, the code seems to have produced some reasonable results. The chart is shown below as a Google Sheet since the data frame has 376 rows. This data was also grouped by "yards to go".
The data conform less well to the linear regression, most likely because I split "Yards To Go" out at each yard line. Below I run it again using only a first down at each yard line. We see a much nicer linear fit to the data, backed up by an R-squared value of 0.946 with a statistically significant intercept and coefficient.
These comparisons are interesting to look at since many "state value" estimations are based on the probability of converting a fourth down at a given yard line, which is tough to do with limited data. In some cases 3rd down data is used, which differs somewhat from actual 4th down data.
On the right side is a table of 4th down conversion attempts from the 2013 NFL season. Interestingly, but not surprisingly, we see quite an increase in 4th down attempts, likely due to the increased awareness of data analytics and the benefits of going for it on 4th down in certain situations.
The expectations shown in the table above do not account for the points given up by turning the ball over on a missed field goal, only the expected points for attempting the field goal. Below is a histogram of made versus missed field goals by distance, as well as a box plot of the same data. (Note: the 99 yard line is the opponent's 1 yard line, the 80 yard line is the opponent's 20 yard line, etc.)
Based on the work of the 4th Down Bot, here is a scatter plot of the net value of a field goal attempt using the historical data from the 2021 season. The 4th Down Bot used data from many more seasons, so this is a scaled-down version. The calculations are done by first determining the probability of a made field goal from the given yard line and multiplying this by (3 - the value of a team taking over after a kickoff). Since the average starting field position after a kickoff is basically the 25 yard line, this value is 1.74 (based on the mean of the data) or 1.7776 (based on the linear regression computed above). We must then subtract the value of the opponent having a first down if we miss the field goal (7 yards farther back from our kick, or the 20 yard line if we are inside the 20 yard line) times the probability of a missed field goal from this distance. Here is an example using a 57-yard field goal attempt (a kick from the opponent's 40 yard line), which had a 40% success rate in 2021:
0.4*(3-1.776) - 0.6*(3.05) = -1.34 points.
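The calculation described above (make probability times the field goal's net gain, minus miss probability times the value of the opponent's resulting first down) can be written as a small function. This is a sketch of that description, not the 4th Down Bot's code. The parameter names are made up for illustration, and 3.05 is read here as the value of the opponent's first down after a miss.

```python
def field_goal_net_value(p_make, kickoff_value, miss_value, fg_points=3.0):
    """Net expected value of attempting a field goal.

    p_make        -- probability the kick is good from this distance
    kickoff_value -- expected points for the opponent after the ensuing kickoff
    miss_value    -- expected points for the opponent taking over after a miss
    """
    made_part = p_make * (fg_points - kickoff_value)
    missed_part = (1.0 - p_make) * miss_value
    return made_part - missed_part


# The 57-yard example from the text: 40% make rate, kickoff value 1.776,
# and 3.05 taken as the opponent's first-down value after a miss.
example = field_goal_net_value(p_make=0.4, kickoff_value=1.776, miss_value=3.05)
print(round(example, 2))

# An "automatic" kick (p_make = 1.0) is worth 3 minus the kickoff value.
automatic = field_goal_net_value(p_make=1.0, kickoff_value=1.776, miss_value=3.05)
print(round(automatic, 2))
```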
This looks like something I can run a logistic regression on once I learn that package better. What I see here is a net worth of 1.2 points for field goals that are "automatic" and near zero net worth for field goals attempted outside the 25 yard line.
I was able to merge the net values of going for it on 4th down (using 4th down conversion probabilities) with the field goal net values. I then used the linear regression shown above and assumed a 45-yard net punt to determine net values for punting. The final "4th Down Decision Chart" is shown below. Since there is limited data for longer field goals in a single season, I may try to update this with a logistic regression for field goals, as demonstrated in the book Mathletics. An example that stands out is the decision of "Kick a Field Goal" at the 56 yard line (opponent's 44). This is a 61-yard field goal, which historically has a low chance of success but may show a high success rate for this particular season. I'd also like to see about making the chart more readable.
Below is an updated version using a logistic regression to determine the probability of making a field goal from a certain distance. In this chart, field goals are chosen on most 4th downs once you reach the opponent's 40 yard line.
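To show what a logistic regression for field goal probability looks like, here is a minimal plain-Python sketch that fits P(make) as a function of kick distance with Newton's method. It does not use the package or the data behind the chart. The function names, the centering and scaling constants, and the toy kick counts are all assumptions made up for the example.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(distances, made, center=45.0, scale=10.0, iterations=25):
    """Fit P(make) = sigmoid(b0 + b1 * z), with z = (distance - center) / scale.

    Uses Newton's method on the log-likelihood; returns a predict(distance)
    function. center and scale only rescale distance for numerical stability.
    """
    zs = [(d - center) / scale for d in distances]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, y in zip(zs, made):
            p = sigmoid(b0 + b1 * z)
            w = p * (1.0 - p)
            g0 += y - p            # gradient terms
            g1 += (y - p) * z
            h00 += w               # information (negative Hessian) terms
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        if det == 0.0:             # degenerate data, stop updating
            break
        # Newton step: solve the 2x2 system for the parameter update
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    def predict(distance):
        return sigmoid(b0 + b1 * (distance - center) / scale)

    return predict


# Made-up toy data (NOT 2021 results): (kick distance, makes, misses)
toy_kicks = [(25, 9, 1), (35, 8, 2), (45, 7, 3), (55, 4, 6), (62, 1, 4)]
distances, made = [], []
for dist, makes, misses in toy_kicks:
    distances += [dist] * (makes + misses)
    made += [1] * makes + [0] * misses

predict = fit_logistic(distances, made)
for d in (25, 45, 60):
    print(d, round(predict(d), 2))
```

The appeal of this approach for a single season of data is that the fitted curve smooths over distances with only a handful of attempts, so one lucky long kick does not dominate the estimate at that distance.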
I remember watching a Brian Billick football clinic from around 1994 or 1995 where he discusses game-planning. One of the more profound topics in his lecture for me as a 20-something-year-old coach was how you could fairly accurately determine how many times a certain situation comes up during an average game and therefore streamline your game-planning. For example, if 3rd and 3 only occurs twice a game, then you don't need 10 different plays for that situation, only 3 or 4 at the most. Below is a chart that still needs some cleaning but shows rough numbers of plays per down and distance. I would like to remove red zone snaps and then group these by typical divisions like "Third-and-Medium", etc.
These are the numbers for 3rd Down.
This was not done in Python, but with Google Sheets pivot tables (more practice). This data is from the first 4 weeks of the 2021 football season, based on Pro Football Focus data. I decided to pull this because I was starting to do a "Scouting Report" on the Baltimore Ravens defense and had a hard time believing that they were giving up 7.4 yards per play versus 11 personnel between the 20s. Sure enough, they were, and that was not too far off from the league average.