American Football coaching powered by
Visual Learning
Sumanth Subramanya Rao (ssr2) Eeshwar Gurushankar Prasad (egurusha) Sai Vishwas Padigi (spadigi)
Website link : https://sites.google.com/view/football-analysis-yolo/
In American football, coaches can greatly benefit from the power of technology to aid their analysis of games. Only one forward pass is allowed per play, so it is crucial to make the right move. We help address this problem by developing an automated tool that suggests the best pass a player can make so that the team makes maximum progress toward the opposing end zone. Our idea is to map the players' positions onto a 2D plane and find the path of least obstruction along which the ball can be passed.
There has been extensive research on camera calibration in other sports such as soccer and basketball, including efforts to identify players and their teams and to track them and the ball during play. However, these methods rely on ball tracking as the primary basis for their solutions, and the ball is almost always occluded in American football. Hence, we build our solution by identifying the roles of the players rather than tracking the ball.
Some prior works that helped shape our approach include:
Sports Camera Calibration via Synthetic Data (Jianhui Chen and James J. Little)
In this work [1], the authors perform edge detection on the image to identify which part of the soccer field the players are located in and project the player locations to the predefined field template.
The authors use a two-GAN model to detect field markings and a siamese network to learn the features of these edge images.
A Comprehensive Review of Computer Vision in Sports (Banoth Thulasya Naik, Mohammad Farukh Hashmi et al.)
In this work [2], the authors provide an overview of the available sports datasets, the various techniques used for player detection and action recognition, and the different approaches to generating a bird's eye view of the field from an image taken at any camera angle.
Detecting events and key actors in multi-person videos (Vignesh Ramanathan, Li Fei-Fei et al.)
In this paper [3], the authors use an RNN to track the players in the basketball videos and another RNN for event detection/classification. The authors successfully show that the attention mechanism is able to consistently localize the relevant players.
A real game snippet displaying the line of scrimmage and a forward pass being made by the quarterback
Our model's output:
the projected passes to the potential receivers
the predicted best pass
We provide passing-strategy analysis for coaches on American football data. Our method takes a snapshot of the game and analyzes the players, their teams, their relative positions, and their roles. Using all this information, we project the snapshot onto a 2D grid to find the best possible receivers who can move forward with the ball, after which we identify the best pass at that instant based on tackler positions. For this downstream task of identifying the receiver for the best forward pass, we first identify the quarterback and build on that, as it is the quarterback who makes the key decision of whom to pass the ball to or whether to run forward with the ball.
Our idea makes intuitive sense because, in American football, the forward pass is crucial in creating touchdown opportunities. And since all players are positioned statically at the start of each play, it is viable to analyze the player positions and obtain the best possible receivers who can take the ball forward. Since we want to analyze a snapshot of the entire field, a 2D bird's eye view is a natural choice for our analysis.
The dataset we used for this project is the Football Player Detection Image Dataset from Roboflow. It contains training, validation, and test images of football plays with bounding-box annotations for the players, labeled by their role/position.
These positions for the offensive team include:
Center
QB (quarterback)
Skill
The other offensive positions such as Running back, Fullback, Tight end, H-back, and Wide receiver have all been labeled as Skill
The positions for the defensive team include:
db (defensive back)
lb (linebacker)
We take a multi-step approach with snapshot images of game videos as our data.
The major steps in our approach are:
Player and position detection
Team identification
Perspective transformation to 2D
Receiver and tackler identification
Pass identification
The following sequence of images shows the intermediate outputs generated by each of the above steps for four sample input images, going from top to bottom.
Note: We have chosen these sample images such that the offensive team is on the left in the first two images and they attack to the right during the play. However, in the last two images, the offensive team is on the right and the progression of the play is to the left.
Input image 1
Input image 2
Input image 3
Input image 4
As the first step, we detect players using a fine-tuned YOLOv5 [4] model. This gives us the players' positions and their roles (whether a player is a quarterback, linebacker, defensive back, etc.). Once we have the bounding boxes, we identify the quarterback among the detected players, since it is the quarterback who decides whom to pass the ball to or whether to run forward with the ball.
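A minimal sketch of this detection step using the YOLOv5 hub API; the weight file name football_players.pt and the exact label string are illustrative assumptions, not our actual artifacts:

```python
import torch

# Load the fine-tuned detector (hypothetical weight file name).
model = torch.hub.load("ultralytics/yolov5", "custom", path="football_players.pt")

results = model("snapshot.jpg")        # run inference on one game snapshot
df = results.pandas().xyxy[0]          # columns: xmin, ymin, xmax, ymax, confidence, class, name
quarterback = df[df["name"] == "QB"]   # pick out the quarterback among the detected roles
```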
After the players have been identified, we assign them to teams so that we can identify the passing options. We do this using the color of the players' uniforms: we use k-means to create 3 major clusters, one for each team and one for the background. In the images below, the offensive team is represented in blue, the defensive team in red, and the quarterback in yellow.
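The sketch below shows one plausible reading of this clustering step: pool the pixels of all player crops, fit k-means with k = 3 (two teams plus background), and assign each player to the dominant non-background cluster in their crop. The exact features and assignment rule we used may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_teams(crops, k=3):
    """crops: list of HxWx3 uint8 player crops cut out of the detected boxes.
    Clusters all crop pixels into k=3 colors (two teams + background) and
    assigns each player to the non-background cluster dominant in their crop."""
    pixels = np.concatenate([c.reshape(-1, 3) for c in crops]).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

    # The background (grass) is assumed to be the largest cluster overall.
    background = np.argmax(np.bincount(km.labels_))

    teams = []
    for c in crops:
        labels = km.predict(c.reshape(-1, 3).astype(np.float32))
        counts = np.bincount(labels, minlength=k)
        counts[background] = 0                   # ignore grass pixels
        teams.append(int(np.argmax(counts)))     # remaining dominant cluster = team id
    return teams
```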
Since camera angles change across frames, we transform the view to a standardized frame of reference to obtain an approximation of the top view. The first step is to rotate the image so that the markings on the field are aligned vertically.
We then apply a perspective transformation that warps the captured camera view into an approximation of the top view of the field, i.e., the farther portions of the field are stretched more than the closer ones. The resulting image should have the vertical field markings parallel to each other and separated by a uniform distance. [7]
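A rough sketch of this rectification with OpenCV: estimate the dominant yard-line angle with a Hough transform, rotate, then warp with a homography. The four source points below are placeholders; in practice they come from the detected field markings.

```python
import cv2
import numpy as np

img = cv2.imread("snapshot.jpg")

# Step 1: estimate the dominant angle of the field markings (assumes at
# least one line is found) and rotate the image so they run vertically.
edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=10)
angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1)) for x1, y1, x2, y2 in lines[:, 0]]
tilt = np.median(angles) - 90.0                    # deviation from vertical
h, w = img.shape[:2]
R = cv2.getRotationMatrix2D((w / 2, h / 2), tilt, 1.0)
rotated = cv2.warpAffine(img, R, (w, h))

# Step 2: warp the rotated view toward a top view. The source points are
# placeholders for the detected marking intersections.
src = np.float32([[300, 200], [980, 200], [1200, 700], [100, 700]])
dst = np.float32([[0, 0], [800, 0], [800, 600], [0, 600]])
H = cv2.getPerspectiveTransform(src, dst)
top_view = cv2.warpPerspective(rotated, H, (800, 600))
```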
Using the previously obtained positions of the players, we map them onto a 2D grid for visualization. Following the same convention as above, the blue points represent the offensive players, the red points the opponents, and the yellow point the offensive quarterback.
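Continuing the sketch above, each player can be projected through the same homography H; we take the bottom-center of the bounding box as the player's foot position. The boxes here are illustrative values, not real detections.

```python
# Illustrative (x1, y1, x2, y2) boxes from the detector; H is the homography above.
boxes = [(420, 310, 450, 380), (610, 300, 640, 365)]
feet = np.float32([[(x1 + x2) / 2.0, float(y2)] for x1, y1, x2, y2 in boxes])
grid_positions = cv2.perspectiveTransform(feet.reshape(-1, 1, 2), H).reshape(-1, 2)
```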
Next, we identify potential receivers for the pass, eliminating options that are obstructed by opposing players as well as receivers who have defenders too close to them. We filter the possible receivers based on their distances and the tacklers surrounding them.
The diagram below shows the three closest opposing players for each potential receiver. We identify the most viable tacklers for each receiver based on their distances from the receiver.
Using a heuristic that combines how far a receiver has advanced forward with the distances of the opposing players near the receiver, we identify the "best pass" that can be made at that instant in the game. A sketch of such a scoring function follows.
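The sketch below shows the shape of such a heuristic on the 2D grid; the separation threshold, the number of tacklers considered, and the weights are illustrative, not the exact values we tuned.

```python
import numpy as np

def best_pass(qb, teammates, opponents, attack_dir=+1,
              min_separation=3.0, n_tacklers=3, w_advance=1.0, w_space=0.5):
    """Score each potential receiver on the 2D grid and return the best one.
    qb, teammates, opponents: (x, y) points in grid coordinates; attack_dir
    is +1 if the offense moves toward increasing x, -1 otherwise."""
    teammates = np.asarray(teammates, dtype=float)
    opponents = np.asarray(opponents, dtype=float)

    # Distance from every teammate to every opponent.
    d = np.linalg.norm(teammates[:, None, :] - opponents[None, :, :], axis=2)
    nearest = np.sort(d, axis=1)[:, :n_tacklers]   # each receiver's closest tacklers

    # Drop receivers with a defender closer than the separation threshold.
    open_mask = nearest[:, 0] > min_separation

    # Heuristic: reward forward progress and space from the nearest tacklers.
    advance = attack_dir * (teammates[:, 0] - qb[0])
    score = w_advance * advance + w_space * nearest.mean(axis=1)
    score[~open_mask] = -np.inf

    return int(np.argmax(score))                   # index of the suggested receiver
```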
The images below show the same identified pass overlaid on the input images for easy visualization.
In this section, we present the results of our player and position identification model.
Initially, we trained a single model to detect all the players and identify their respective positions. The predictions for the quarterback were poor due to the imbalanced nature of the classes, which stems from the fact that there is only one quarterback per image. Hence, we trained a second model to detect only the quarterback. Once we fine-tuned this single-class detector, the mAP (0.5:0.95) score improved dramatically from 0.0359 to 0.804, as shown in the table below.
In this section, we summarize the results and compare them against real-world game scenarios. We pick snapshots from game videos, pass them as inputs to our model, and obtain predictions for the best pass based on our heuristics. We then compare these predictions with the actual game outcomes.
Game Scenario: Here, we see that the quarterback makes a right forward pass, and the receiver is able to successfully push forward.
Model Output: Our model also predicts the same receiver as the best player to pass to and hence we validate that our approach yields results that are successful in real-world games.
The lines indicate the potential players who can receive a forward pass from the quarterback
The lines from each one of the potential receivers indicate their potential tacklers
The line indicates the best pass the quarterback can make based on the tackler positions
Game Scenario: Here, we see that the quarterback makes a right forward pass, and the receiver is tackled and fails to move forward.
Model Output: Our model predicts the topmost player as the best receiver instead of the bottom players. Hence, we can see that our model is able to avoid passes into congested areas and provide better analytics to improve the game.
The lines indicate the potential players who can receive a forward pass from the quarterback
The lines from each one of the potential receivers indicate their potential tacklers
The line indicates the best pass the quarterback can make based on the tackler positions
Game Scenario: Here, we see that the quarterback makes a forward pass, and the receiver is tackled and fails to move forward.
Model Output: Our model also predicts the top player as the best receiver in this scenario, but in the game this pass led to a tackle. Our model can be improved in such scenarios to better analyze the relative positions of teammates and opponents.
The lines indicate the potential players who can receive a forward pass from the quarterback
The lines from each one of the potential receivers indicate their potential tacklers
The line indicates the best pass the quarterback can make based on the tackler positions
In this case, we can see that our model fails to identify and generate the bounding boxes for the players that are very close to each other, which is usually the case at the line of scrimmage.
For this image, our model was unable to identify the quarterback and hence we could not perform any analysis on this image.
Here, the model fails to identify the team colors. We expected our model to fail when the contrast between the two teams' uniform colors is low; here, both teams' uniforms have a similar color combination that is hard to differentiate even with the naked eye.
This can be made more robust by clustering only the top half of the jersey or by using more advanced methods that do not rely on uniform color contrast for team identification.
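For instance, the clustering sketch from the team-identification step could be restricted to the upper half of each box, where the jersey dominates; this assumes the assign_teams helper and crops list from that earlier sketch.

```python
def upper_half(crop):
    """Keep only the top half of a player crop, where the jersey dominates."""
    return crop[: crop.shape[0] // 2]

# Reuse the assign_teams sketch from the team-identification step.
teams = assign_teams([upper_half(c) for c in crops])
```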
In this project, we have built an end-to-end application that takes in a snapshot of a football game at any point, produces an analysis of the possible passes, and suggests the optimal pass to make. We believe that the various stages of the output (the potential pass receivers, their likely tacklers, and the final suggested pass) can be used by coaching staff to improve their gameplay and to devise new strategies for future games. To build this application, our overarching goal was to obtain a 2-dimensional top view of the field in order to analyze the relative player positions and find the paths of least obstruction. To this end, we developed multiple components: player position identification, team identification, perspective transformation to a 2D view, potential pass receiver identification, obstruction identification, and optimal pass identification.
As shown above, our application is able to identify good passes in many cases. However, it does have some failure cases for which the application needs to be improved. In the next section, we discuss some potential methods which can be used to improve the accuracy of our application to make it more usable.
In this project, we have explored the domain of sports analysis on American football data. Our model works on snapshot images of games and provides passing-strategy analysis. We use deep learning methods to identify the players, their positions, and their roles. Using this information, we map the positions onto a 2D grid for a bird's eye view, which provides a better perspective. We rely primarily on classical CV methods such as affine transformations, Hough line detection [5], and perspective transformations to achieve the final 2D mapping, but this part could also be replaced by a deep learning approach: a deep generative model could be leveraged to directly output the 2D transformation [6].
Similarly, our model currently takes images as input; however, it can be extended to work with videos for better tracking and more granular analysis over longer stretches of the game, which can be useful for coaching. Another area of improvement is the pass analysis, where the model could provide richer metrics such as a confidence score for each pass. The model could also incorporate more fine-grained analysis of whether making a pass is the right choice at all, or whether the quarterback should instead run forward in certain scenarios.
Our code is available on Github at https://github.com/spadigicmu/american-football-analysis-yolo
Chen, J., & Little, J. "Sports Camera Calibration via Synthetic Data" . 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2497-2504. (2019)
Naik, Banoth Thulasya, Mohammad Farukh Hashmi and Neeraj Dhanraj Bokde. “A Comprehensive Review of Computer Vision in Sports: Open Issues, Future Trends and Research Directions.” ArXiv abs/2203.02281 (2022): n. pag.
Ramanathan, Vignesh, Jonathan Huang, Sami Abu-El-Haija, Alexander N. Gorban, Kevin P. Murphy and Li Fei-Fei. “Detecting Events and Key Actors in Multi-person Videos.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016): 3043-3053.
Redmon, Joseph, Santosh Kumar Divvala, Ross B. Girshick and Ali Farhadi. “You Only Look Once: Unified, Real-Time Object Detection.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016): 779-788.
Richard O. Duda and Peter E. Hart. 1972. Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 15, 1 (Jan. 1972), 11–15. https://doi.org/10.1145/361237.361242
Yan, Xinchen, Jimei Yang, Ersin Yumer, Yijie Guo and Honglak Lee. “Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision.” NIPS (2016).
Gong, Shi, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou and Xiang Bai. “GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation.” ArXiv abs/2204.07733 (2022): n. pag.