Many scouts, amateur and professional, post their reports on NFL draft prospects online. This project synthesizes some of that publicly available text to better understand a player's skills and the potential of their NFL career.
Years of following college football and the NFL have taught me that I do not have the eye for projecting which players will translate best to the professional level. Luckily, some scouts share their notes on these prospects. With topic modeling and sentiment analysis, we can get an idea of the consensus opinion on a draft prospect's strengths and weaknesses.
Tuning the topic modeling was the most difficult step in this process for two reasons: 1) scouting reports use a lot of football-specific jargon (for example, "wiggle" or "home-run speed"), and 2) many of the topics are closely related to one another (for example, long/top speed versus burst). To help refine the topics, smaller cluster sizes were used in the BERTopic model (documentation available here). The top words of each topic were then reviewed for related ideas, and related topics were merged.
Each sentence of a player's text was run through the topic model and then through sentiment analysis to measure how often each topic appeared and how it was rated. The result was two new features for each main positional skill: the "count" (the percentage of the player's sentences related to that skill) and the "score" (the average sentiment of those related sentences).
These features were then combined with other player information, including their NFL draft pick and their combine results, if they participated. The resulting dataset was modeled using logistic regression to predict a player's chance of ever reaching "Pro Bowl" or "1st Team All-Pro" level. To model a player's projected career length as a starter, a component-wise gradient boosting survival model was used. The idea is that the population of NFL players is subject to the hazard of their sport, but certain playing styles might lend themselves to a longer career.
Below is an embedded link to a live dashboard displaying the results of this project.
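
To make the "count" and "score" features described above concrete, here is a minimal sketch of the aggregation step. It assumes each sentence has already been given a skill label by the topic model and a sentiment value by a sentiment model; the function name `skill_features`, the skill labels, the sentiment range of -1 to 1, and the sample values are all hypothetical and are not taken from the project's code.

```python
from collections import defaultdict


def skill_features(labeled_sentences):
    """Aggregate per-sentence (skill, sentiment) pairs into per-skill features.

    labeled_sentences: list of (skill, sentiment) tuples, one per sentence.
    skill is a label, or None when the sentence matched no skill.
    sentiment is a float (assumed here to lie between -1 and 1).
    """
    total = len(labeled_sentences)

    # Group the sentiment values by skill, skipping unmatched sentences.
    by_skill = defaultdict(list)
    for skill, sentiment in labeled_sentences:
        if skill is not None:
            by_skill[skill].append(sentiment)

    features = {}
    for skill, sentiments in by_skill.items():
        features[skill] = {
            # "count": percentage of all of the player's sentences on this skill
            "count": 100.0 * len(sentiments) / total,
            # "score": average sentiment of the sentences on this skill
            "score": sum(sentiments) / len(sentiments),
        }
    return features


# Made-up labels for the four sentences of one running back's report.
report = [
    ("burst", 0.75),
    ("burst", 0.25),
    ("top_speed", -0.5),
    (None, 0.1),
]
features = skill_features(report)
```

In this made-up report, half of the sentences concern burst and their sentiment averages out positive, while the single top-speed sentence is negative. Sentences with no matching skill still count toward the denominator, which is one possible reading of "the % of the sentences related to this skill".
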
Please allow a few seconds for the dashboard below to load.
Currently, running backs from 2012-2025 are available. This is an ongoing project, and the next position to be modeled and made available will be wide receivers, followed by cornerbacks. Feel free to reach out at adabieq@umich.edu for more information or to share feedback on this project.