Develop a model to predict which General Manager is most likely to take a specific player based on their pre-draft information.
For this model I used data from the 2003 through 2024 drafts, along with labels for each team and the GM associated with that team. The GM labels came from researching team pages and Pro Football Reference (PFR), and the draft pick information came from nflfastR.
Athletic testing information: This includes the players' measurements and their testing numbers from the combine. There is also a column for estimated forty-yard dash time, so that even players without combine data still have some athletic information. Missing values were imputed with the median.
NGS scores: These are the athleticism, production, and final scores from Next Gen Stats for the prospects. The scores come from XGBoost models trained on whether the player was likely to become a starter or Pro Bowler, and I was part of the most recent training of these models.
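The median imputation mentioned for the athletic testing features could look roughly like the sketch below. It uses NumPy only, and the column values are made up for illustration; the actual pipeline may well use pandas or scikit-learn instead.

```python
import numpy as np

def impute_median(column):
    """Replace NaN entries with the median of the observed (non-missing) values."""
    values = np.asarray(column, dtype=float)
    median = np.nanmedian(values)  # median computed while ignoring NaN entries
    return np.where(np.isnan(values), median, values)

# Hypothetical forty-yard dash times; NaN marks a player with no recorded time.
forty_times = [4.45, np.nan, 4.60, 4.52, np.nan]
imputed = impute_median(forty_times)
print(imputed)
```

One design note: the median is less sensitive than the mean to a few extreme testing numbers, which is a common reason to prefer it for this kind of fill.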
Other Features
Consensus Big Board information: The consensus big board rank, converted into an imputed grade based on scraped NFL scouting grades.
Underclassman label: Whether the player declared for the draft as an underclassman. This is as close a proxy for age as I could easily find.
Conference label: Whether the player was in school at the Power 5 or Group of 5 level in their last year. My conversion table is from a few years ago, hence the Power 5 rather than Power 4 labels, but since the data goes back further and I didn't want to create a conversion table for each season, this was good enough for me.
I used an XGBoost classifier trained to predict the General Manager label. The data was split into 60% train, 20% validation, and 20% test, stratified by GM to ensure that each GM has data in both train and test. I also included sample weights to force the model to care more about recent drafts and GMs, using an exponential function centered on 2021 with an alpha of 0.5, i.e. weight = exp(0.5 * (year - 2021)) (2024 = 4.48, 2021 = 1.0, and 2003 = 0.00012).
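As a rough illustration of the recency weighting and the stratified 60/20/20 split described above, here is a minimal standard-library sketch. The function names (recency_weight, stratified_split) and the placeholder GM labels are my own for illustration and are not taken from the project code; a real pipeline would more likely rely on a library splitter.

```python
import math
import random
from collections import defaultdict

def recency_weight(draft_year, center=2021, alpha=0.5):
    """Exponential sample weight: 1.0 at the center year, larger for later drafts."""
    return math.exp(alpha * (draft_year - center))

def stratified_split(labels, fractions=(0.6, 0.2), seed=0):
    """Split row indices into train/validation/test separately within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test

# Weights for the three years quoted in the text.
weights = {year: recency_weight(year) for year in (2003, 2021, 2024)}
for year, weight in weights.items():
    print(year, f"{weight:.5f}")

# Placeholder labels: 10 picks for one GM and 20 for another.
gm_labels = ["GM A"] * 10 + ["GM B"] * 20
train_idx, val_idx, test_idx = stratified_split(gm_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Note that with this simple rounding scheme a GM with only a handful of picks can end up with an empty validation or test slice, so very small classes would need special handling.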
I ran a grid search for hyperparameter tuning with average_precision_score as the eval metric and settled on 'colsample_bytree': 1.0, 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 200, 'subsample': 0.8.
The feature importance plot for the top 10 features is shown below. I feel good that the model didn't latch onto any individual feature heavily enough for it to dominate the predictions.
Model Performance: log_loss: 4.2819, top_1_accuracy: 0.0279, top_5_accuracy: 0.1540. The model didn't perform particularly well, though I feel it was more accurate on recent data, which was the point of the exercise. I also don't feel that it was overfit, as the test log loss and top-5 accuracy were actually better than the validation figures (log_loss: 4.3468, top_1_accuracy: 0.0425, top_5_accuracy: 0.1222), although top-1 accuracy was lower on the test set.
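For readers less familiar with these metrics, the sketch below shows one way multiclass log loss and top-k accuracy can be computed from a matrix of predicted class probabilities, plus the values a uniform guess would score for comparison. The probabilities and the four-class setup are toy values chosen for illustration, not output from the actual model.

```python
import math
import numpy as np

def multiclass_log_loss(probs, y_true, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    rows = np.arange(len(y_true))
    return float(-np.mean(np.log(probs[rows, y_true])))

def top_k_accuracy(probs, y_true, k):
    """Share of rows whose true class is among the k highest-probability classes."""
    probs = np.asarray(probs, dtype=float)
    top_k = np.argsort(-probs, axis=1)[:, :k]
    hits = (top_k == np.asarray(y_true)[:, None]).any(axis=1)
    return float(hits.mean())

def uniform_baseline(n_classes, k):
    """Log loss and top-k accuracy of guessing every class with equal probability."""
    return math.log(n_classes), k / n_classes

# Toy example: 3 players, 4 hypothetical GM classes (each row sums to 1).
probs = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.15, 0.50, 0.10],
])
y_true = np.array([1, 2, 0])  # index of the GM who actually made each pick

loss = multiclass_log_loss(probs, y_true)
top1 = top_k_accuracy(probs, y_true, k=1)
top2 = top_k_accuracy(probs, y_true, k=2)
base_loss, base_top2 = uniform_baseline(n_classes=4, k=2)
print(loss, top1, top2, base_loss, base_top2)
```

Comparing a model's log loss with ln(number of classes) and its top-k accuracy with k divided by the number of classes gives a quick sense of how far it sits above random guessing.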
I generated a handful of outputs from this model, including the obvious predictions on the upcoming class as shown below. I also generated tables for all players in the sample with their top 5 most likely GMs alongside the actual GM that drafted each player, plus a table flagging, for each player, whether a given GM appeared in their top 5 predictions at all, for quick sorting and filtering.
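A top-5 table like the one described above can be assembled directly from the predicted probabilities. The sketch below is one possible shape for it in plain Python; the player names, GM names, and probabilities are placeholders, and the column names are my own rather than those used in the app.

```python
def top_k_table(players, gm_names, probs, actual_gms, k=5):
    """Build one row per player with their k most likely GMs and a hit flag."""
    rows = []
    for player, player_probs, actual in zip(players, probs, actual_gms):
        # Rank GMs for this player from most to least likely.
        ranked = sorted(zip(gm_names, player_probs), key=lambda pair: pair[1], reverse=True)
        top = [name for name, _ in ranked[:k]]
        rows.append({
            "player": player,
            "top_k_gms": top,
            "actual_gm": actual,
            "actual_in_top_k": actual in top,
        })
    return rows

# Placeholder data: 2 players, 4 hypothetical GMs, k=2 to keep the output short.
gm_names = ["GM A", "GM B", "GM C", "GM D"]
players = ["Player 1", "Player 2"]
probs = [
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
actual_gms = ["GM C", "GM D"]

table = top_k_table(players, gm_names, probs, actual_gms, k=2)
for row in table:
    print(row)
```

Filtering these rows by whether a particular GM appears in top_k_gms gives the per-GM view mentioned in the text.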
Additionally, I created a few SHAP plots to aid in model explainability and to better understand how a particular GM's predictions were created.
I also wanted to see each General Manager's likely player types and how they compare to other GMs, so I created a plot of the first two principal components (which together explain 70% of the total variance) and a table showing each GM's average features across all their picks.
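The principal component view can be reproduced in outline with a singular value decomposition of the per-GM average feature table. The sketch below assumes the features are standardized first, which is a common choice but not something confirmed here, and the feature values are invented for illustration.

```python
import numpy as np

def pca_2d(features):
    """Project rows onto the first two principal components.

    Returns the 2D coordinates and the share of variance each component explains.
    """
    X = np.asarray(features, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column (assumption)
    _, singular_values, vt = np.linalg.svd(X, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    coords = X @ vt[:2].T  # coordinates along PC1 and PC2
    return coords, explained[:2]

# Hypothetical per-GM averages: height (in), weight (lb), forty time (s).
gm_averages = np.array([
    [74.1, 225.0, 4.62],
    [73.2, 214.5, 4.55],
    [74.8, 232.0, 4.70],
    [72.9, 210.0, 4.51],
    [73.8, 221.0, 4.66],
])

coords, explained = pca_2d(gm_averages)
print(coords)
print(explained, explained.sum())
```

The sum of the two explained-variance shares is the quantity reported as a percentage of total variance for a two-component plot.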
All of these outputs are available in a Streamlit app that I created to show off these tables and plots, which can be found here: https://nfl-gm-analysis.streamlit.app
Overall, I would say that an analysis like this is very challenging, since the draft involves all kinds of biases and difficulties with player availability, but I feel good about this model. Its evaluation metrics are not particularly strong, with a pretty poor roc_auc overall, but it is more accurate on more recent draft picks, which was the purpose of this exercise. Player body types have also been changing over time, so older GMs are on average drafting bigger players than newer GMs.
It is very interesting to see the clear body type preferences that certain GMs have. Buffalo's Brandon Beane, for example, has a clear preference for taller, bigger wide receivers, while the Ravens' Eric DeCosta shows the opposite, with a lot of smaller, lighter wide receivers predicted to be taken by him.