Develop models to predict wide receiver and running back success at the NFL level from their college production profiles.
For this analysis I used PFF data from the 2014 through 2024 college seasons, as well as athleticism data from the combine. I also used NFL data from PFF to generate the target variable.
Athletic testing information: This includes the players' measurements and their testing numbers from the combine. Missing values were imputed using KNN on the players' college production profiles, so that each imputed value reflects the body types typically associated with that profile. Additionally, the athleticism score from NGS (Next Gen Stats) was used in the WR model.
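As a rough illustration of the imputation step above, here is a minimal sketch that fills a missing combine number from the players with the most similar production. The column names, the toy values, `k`, and the helper `impute_from_production` are all hypothetical, and using scikit-learn's `KNeighborsRegressor` is an assumption about how the KNN step could be done, not a record of the actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler


def impute_from_production(df, production_cols, athletic_cols, k=5):
    """Fill missing athletic values using the k most similar production profiles."""
    out = df.copy()
    # Standardize production so no single metric dominates the distance.
    X = StandardScaler().fit_transform(out[production_cols])
    for col in athletic_cols:
        missing = out[col].isna().to_numpy()
        if not missing.any() or missing.all():
            continue  # nothing to fill, or nothing to learn from
        n_neighbors = min(k, int((~missing).sum()))
        knn = KNeighborsRegressor(n_neighbors=n_neighbors)
        knn.fit(X[~missing], out.loc[~missing, col])
        out.loc[missing, col] = knn.predict(X[missing])
    return out


# Toy data (made up): two players are missing a 40-yard dash time.
players = pd.DataFrame({
    "yards_per_route_run": [2.9, 2.8, 1.4, 1.5, 2.7, 1.3],
    "routes_run": [400, 420, 250, 240, 410, 260],
    "forty": [4.38, 4.40, 4.60, np.nan, np.nan, 4.58],
})
imputed = impute_from_production(
    players, ["yards_per_route_run", "routes_run"], ["forty"], k=2
)
print(imputed)
```

Each missing time is replaced by the average of the two players whose production looks most alike, so a high-efficiency profile inherits a faster time than a low-efficiency one.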
Volume and Efficiency metrics: These changed depending on the model, but I used a mixture of volume metrics (total attempts, total routes run) and efficiency metrics (yards per route run, explosive rush rate, etc.). I also generated features for each player's best and worst seasons to try to capture breakout seasons that a raw mean might hide.
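The best-season and worst-season features described above can be sketched with a pandas group-by. The column names, the toy numbers, and the extra `yprr_breakout_gap` feature are hypothetical, included only to show the idea.

```python
import pandas as pd

# Toy season-level data (made up): one row per player per college season.
seasons = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B"],
    "season": [2021, 2022, 2023, 2022, 2023],
    "yards_per_route_run": [1.2, 1.5, 3.1, 2.0, 2.1],
})

# Collapse to one row per player: career mean plus best and worst season.
profile = seasons.groupby("player")["yards_per_route_run"].agg(
    yprr_mean="mean",
    yprr_best="max",
    yprr_worst="min",
)
# How far the best season sits above the career average.
profile["yprr_breakout_gap"] = profile["yprr_best"] - profile["yprr_mean"]
print(profile)
```

Player A's career mean looks ordinary, but the best-season column and the gap both flag the late breakout that the mean alone would smooth over.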
Scouting Report Data from Dane Brugler: For the WR model I incorporated sentiment analysis of the scouting reports from Dane Brugler's famous "Beast" draft guides. I took the strengths and weaknesses sections, processed them using NLP, and ran them through a custom sentiment analysis for different aspects like playmaking, athleticism, and off-field. This added another dimension to the model, boosted its predictive power, and removed some of the outlier misses.
For a more thorough breakdown of the features used, please check out my Streamlit app, which has a collapsible definitions menu: https://nfl-gm-analysis.streamlit.app/
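To make the aspect-level idea concrete, here is a deliberately simple sketch. It uses a plain keyword count as a stand-in for the custom sentiment analysis, which is not reproduced here. The keyword lists, the function `aspect_scores`, and the example report text are all invented for illustration and are not taken from the draft guides.

```python
import re

# Hypothetical keyword lists per aspect; a real lexicon would be much larger.
ASPECT_KEYWORDS = {
    "playmaking": {"explosive", "elusive", "yac", "contested"},
    "athleticism": {"speed", "burst", "quick", "stiff", "agility"},
    "off_field": {"captain", "leader", "suspended", "arrest", "character"},
}


def count_mentions(text):
    """Count how many tokens in the text hit each aspect's keyword list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        aspect: sum(token in keywords for token in tokens)
        for aspect, keywords in ASPECT_KEYWORDS.items()
    }


def aspect_scores(strengths, weaknesses):
    """Score each aspect as (mentions in strengths) minus (mentions in weaknesses)."""
    pos = count_mentions(strengths)
    neg = count_mentions(weaknesses)
    return {aspect: pos[aspect] - neg[aspect] for aspect in ASPECT_KEYWORDS}


# Invented report text, only to exercise the function.
scores = aspect_scores(
    strengths="Explosive after the catch with elite speed and burst. Team captain.",
    weaknesses="Stiff hips. Suspended one game.",
)
print(scores)
```

The key design point is that the section a phrase appears in supplies the sign, so a mention under weaknesses pulls that aspect's score down regardless of the word itself.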
I used an XGBoost model trained on a composite target variable: a weighted average of volume and efficiency metrics at the NFL level over a player's first four seasons. I also passed in a sample weight, scaled by applying the logit function to the target variable, because I am more interested in predicting the top-end players than in correctly predicting lower-value players. Finally, I placed monotonic constraints on the features so that the model could learn that a feature matters without finding pockets where, for example, a slower 40-yard dash time correlates with better players, and then rewarding slower 40 times.
I ran a grid search for hyperparameter tuning with RMSE as the evaluation metric and settled on these parameters:
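The weighting and constraint ideas above can be sketched as follows. Several pieces are assumptions: that the target lies between 0 and 1, the clipping value `eps`, the shift that keeps every weight positive, and the feature names. The exact scaling used for the real models is not specified here, so treat this as one plausible reading.

```python
import numpy as np


def logit_sample_weights(y, eps=0.01):
    """Weights that grow with the target, emphasizing top-end players.

    Assumes y is in [0, 1]. Values are clipped away from 0 and 1 so the
    logit stays finite, then shifted so the smallest weight equals 1.
    """
    y = np.clip(np.asarray(y, dtype=float), eps, 1.0 - eps)
    logit = np.log(y / (1.0 - y))
    return logit - logit.min() + 1.0


# Hypothetical feature order and expected direction for each feature:
# +1 = higher is better, -1 = lower is better, 0 = unconstrained.
feature_names = ["yards_per_route_run", "forty", "weight"]
direction = {"yards_per_route_run": 1, "forty": -1}

# XGBoost accepts monotone constraints as a string such as "(1,-1,0)",
# with one entry per feature in column order.
monotone = "(" + ",".join(str(direction.get(n, 0)) for n in feature_names) + ")"
params = {"monotone_constraints": monotone}

weights = logit_sample_weights([0.1, 0.5, 0.9])
print(monotone, weights)

# With xgboost installed, these would be passed along the lines of:
#   xgboost.XGBRegressor(**params).fit(X, y, sample_weight=weights)
```

With the constraint on `forty` set to -1, the fitted trees cannot assign a higher prediction to a slower time when everything else is held equal.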
WR: [learning_rate: 0.05, max_depth: 5, num_estimators: 100, col_subsample: 1.0, subsample: 0.8]
RB: [learning_rate: 0.05, max_depth: 5, num_estimators: 100, col_subsample: 0.8, subsample: 0.8]
I evaluated the models on a few criteria, including their RMSE and R^2 values, which are listed for the test set below. I also plotted the correlation between predictions and actual values to gauge whether the model over- or under-predicts on average. I compared these results against the validation set (train 60%, validation 20%, test 20%) to ensure the models generalized properly and weren't overfitting. I also tried leave-one-season-out cross-validation, which gave no real boost in model performance.
WR: [R^2: 0.350, rmse: 0.182, mae: 0.140]
RB: [R^2: 0.220, rmse: 0.187, mae: 0.145]
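For reference, the 60/20/20 split, the three reported metrics, and the leave-one-season-out folds can be sketched like this. The sample size, the season labels, and the random seed are made up, and the helper `regression_report` is a hypothetical name. The use of scikit-learn's `train_test_split` and `LeaveOneGroupOut` is an assumption about tooling.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, train_test_split


def regression_report(y_true, y_pred):
    """Return R^2, RMSE, and MAE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "mae": float(np.mean(np.abs(resid))),
    }


# 60/20/20 split on 100 hypothetical players: hold out 20% for test,
# then take 25% of the remaining 80% as validation.
idx = np.arange(100)
train_val, test = train_test_split(idx, test_size=0.20, random_state=0)
train, val = train_test_split(train_val, test_size=0.25, random_state=0)

# Leave-one-season-out: each fold holds out one full season of players.
season_labels = np.repeat(np.arange(2014, 2024), 10)  # made-up labels
folds = list(LeaveOneGroupOut().split(idx, groups=season_labels))

report = regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(len(train), len(val), len(test), len(folds), report)
```

Comparing the report on the validation rows with the report on the test rows is what flags overfitting: a large gap between the two suggests the model has memorized the training players.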
From these core models I wanted to create a distribution of outcomes based on how closely a player's combine performance matched their "true" athleticism. To accomplish this I stepped the testing results up and down and re-ran the model with the same production to generate a range of predicted outcomes. From this I could build a distribution and highlight players who could have higher ceilings in the NFL if their on-field athleticism is better than their listed scores. Additionally, I can identify players with higher floors, who would have similar projections even with slightly worse on-field athleticism.
I also output the predictions from the model and the target variable for all the players in the sample and discussed some of the patterns in the consistent over- and under-predictions between the models.
Finally, I generated a number of SHAP plots to aid in the explainability of the model. These took the form of bar plots, beeswarm plots, scatter plots, and waterfall plots.
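The step-up, step-down procedure above might look something like the sketch below. The step sizes, the column names, and the helper `athleticism_scenarios` are assumptions, and `toy_predict` is a made-up stand-in for the trained model's predict function so that the example runs on its own.

```python
import numpy as np
import pandas as pd


def athleticism_scenarios(predict, player, athletic_cols, lower_is_better, steps):
    """Re-score one player with testing numbers shifted by each step.

    A positive step means "more athletic": times get faster, jumps get higher.
    Production columns are left untouched.
    """
    rows = []
    for step in steps:
        scenario = player.copy()
        for col in athletic_cols:
            sign = -1.0 if col in lower_is_better else 1.0
            scenario[col] = player[col] * (1.0 + sign * step)
        rows.append(scenario)
    frame = pd.DataFrame(rows).reset_index(drop=True)
    preds = np.asarray(predict(frame), dtype=float)
    frame["athletic_step"] = list(steps)
    frame["prediction"] = preds
    return frame


def toy_predict(df):
    """Stand-in for model.predict (made-up coefficients)."""
    return (
        0.10 * df["yards_per_route_run"]
        - 0.50 * (df["forty"] - 4.50)
        + 0.004 * (df["vertical"] - 35.0)
    )


player = pd.Series({"yards_per_route_run": 2.4, "forty": 4.45, "vertical": 36.0})
scenarios = athleticism_scenarios(
    toy_predict,
    player,
    athletic_cols=["forty", "vertical"],
    lower_is_better={"forty"},
    steps=(-0.04, -0.02, 0.0, 0.02, 0.04),
)
floor, ceiling = scenarios["prediction"].min(), scenarios["prediction"].max()
print(scenarios[["athletic_step", "prediction"]], floor, ceiling)
```

A narrow gap between floor and ceiling marks a player whose projection rests mostly on production, while a wide gap marks one whose outlook depends heavily on how the testing numbers translate to the field.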
Overall I feel really good about these models, particularly the WR model. I have felt confident in my WR rankings for a while by using cutoffs and thresholds for these production numbers, but having a model and feeling confident in its projections is much nicer. The over-predictions tend to be players who produced well at the college level, often at a Group of 5 school, and then couldn't step up their game as much in the NFL. The under-predictions were a bit more frustrating, but I would rather lean towards false negatives than false positives in this case. A decent number of those eventual hits had higher max predictions, highlighting their upside if they could play slightly more athletically than they tested.
The RB model is also pretty good and I feel good about its predictions moving forward, though it isn't as strong as the WR model. I had more notable misses in both the over- and under-prediction groups, and there wasn't as clear a trend for some of the bigger names. You could apply the same logic as above to some of the under-predicted players with higher upside, but I feel the bigger issue is simply one of sample size and opportunity. Most teams play with only 1 running back on the field at a time, so there are simply fewer players with significant snaps at the NFL level compared to the 3 wide receivers on the field for most plays. Additionally, if you are drafted to a team with a good running back you likely won't see the field to earn snaps, because you would have to displace a better player than you would when competing with a team's 3rd wide receiver. Given those two facts, the output of the model is pretty good, all things considered. I think it does a good job of highlighting talent and upside, though outcomes depend more on situation for running backs than for receivers.