The Non-Technical Explanation
Acceptance of more advanced statistics in hockey has grown in the past decade, leading to the application of these methods to leagues outside of top professional sport. Proprietary models are applied to leagues and players without a clear understanding of what information actually goes into these models or what impact that information has. There are also a number of potential issues with applying a model to a league when the model has been trained on data from outside that league. Since I work in women’s college hockey, this issue seems pretty important. Specifically, I encountered an expected goals model applied to women’s college hockey that not only appeared inaccurate but seemed completely implausible (imagine the model saying there should be 2.5+ more goals per game than the historical league average). In response, I tried to counter this the best way I knew how: by collecting data and building my own model on appropriate data.
For those less familiar, expected goals models take a bunch of information about a shot attempt and assign a probability of that shot going in. For example, if a 2-on-1 rush chance featuring a backdoor pass had a 30% chance of going in, it would be assigned 0.3 expected goals (xG). I collected information from 14,400 unblocked shot attempts (around 200 games) from the 2023-2024 and 2024-2025 USport hockey seasons using a customized version of An Nguyen’s shot plotter web app. All the variables that I collected for each shot can be found in the table below. The collection, testing, and creation of this model took somewhere around 600 hours, because it turns out collecting a number of characteristics about every shot takes a while. The average number of unblocked shot attempts per game was just under 75, and in total, 841 non-empty-net goals were scored and tracked.
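To make the idea of xG concrete, here is a minimal sketch of how per-shot probabilities add up to a team total. The analysis for this project was done in R; the Python below is for illustration only, and the shot descriptions and probabilities are made up rather than taken from the model.

```python
# Hypothetical shots for one team in one game: (description, xG).
# These probabilities are invented for illustration, not model output.
shots = [
    ("2-on-1 rush, backdoor pass", 0.30),
    ("point shot, no screen", 0.02),
    ("slot shot off a dangerous pass", 0.15),
    ("sharp-angle shot", 0.03),
]


def total_xg(shots):
    """Sum the per-shot goal probabilities to get team expected goals."""
    return sum(xg for _, xg in shots)


print(f"Team xG: {total_xg(shots):.2f}")
```

Four shots here add up to half an expected goal, and most of that comes from the single rush chance.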
Keeping this more non-technical but wanting to be transparent, I tested two different types of predictive models: a gradient boosted model and an elastic net model, both using 10-fold cross validation. The two models performed very similarly (both with AUCs around 0.815). The model I’ve been using and discussing is the elastic net. For those reading this who aren’t here for stats jargon, the models do a fairly good job of assigning overall value to each shot.
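For anyone curious what an AUC of around 0.815 is measuring: AUC can be read as the probability that a randomly chosen goal was given a higher xG than a randomly chosen non-goal. The sketch below computes it directly from that definition on toy data. It is illustrative Python (not the R code used for the model), and the labels and scores are invented.

```python
def auc(labels, scores):
    """Probability that a randomly chosen goal gets a higher score than a
    randomly chosen non-goal (ties count as half a win)."""
    goals = [s for y, s in zip(labels, scores) if y == 1]
    misses = [s for y, s in zip(labels, scores) if y == 0]
    if not goals or not misses:
        raise ValueError("need at least one goal and one non-goal")
    wins = 0.0
    for g in goals:
        for m in misses:
            if g > m:
                wins += 1.0
            elif g == m:
                wins += 0.5
    return wins / (len(goals) * len(misses))


# Toy data: 1 = goal, 0 = no goal; scores are invented xG values.
labels = [1, 0, 0, 1, 0, 0]
scores = [0.30, 0.05, 0.10, 0.08, 0.02, 0.12]
print(auc(labels, scores))  # 0.75: goals outrank non-goals in 6 of 8 pairs
```

A value of 0.5 would mean the model ranks shots no better than a coin flip, and 1.0 would mean every goal was rated above every non-goal.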
Results and what this really means
This model was ultimately designed to more appropriately evaluate team-level performance in Canadian collegiate women’s hockey. Shortly, there will be a Shiny App released that will allow you to apply this version of the model to shots you choose to track. This will essentially let you re-create www.moneypuck.com’s “Deserve to Win O’Meter”: each shot is simulated using its goal probability, over and over, to see how often one team would beat the other if the game were replayed under similar conditions. It also shows the probability of each team scoring different numbers of goals in a game. I’ve included an example below of the type of data visualization that comes from tracking a game using the model.
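The replay idea can be sketched as a small Monte Carlo simulation. This is a minimal illustration under stated assumptions, not the Shiny App or MoneyPuck’s actual method: it is written in Python rather than R, it treats every shot as an independent coin flip weighted by its xG, it reports regulation ties separately instead of resolving them, and the function name and xG values are hypothetical.

```python
import random


def simulate_game(team_a_xg, team_b_xg, n_sims=10_000, seed=0):
    """Replay a game n_sims times, flipping a weighted coin for every shot.

    Assumes shots are independent of one another (a simplification).
    Returns the share of replays won by each team and the share tied.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    a_wins = b_wins = ties = 0
    for _ in range(n_sims):
        a_goals = sum(rng.random() < p for p in team_a_xg)
        b_goals = sum(rng.random() < p for p in team_b_xg)
        if a_goals > b_goals:
            a_wins += 1
        elif b_goals > a_goals:
            b_wins += 1
        else:
            ties += 1
    return {
        "a_win": a_wins / n_sims,
        "b_win": b_wins / n_sims,
        "tie": ties / n_sims,
    }


# Invented per-shot xG values for two teams in one game.
team_a = [0.30, 0.12, 0.08, 0.05, 0.05, 0.03]
team_b = [0.10, 0.07, 0.04, 0.02]
print(simulate_game(team_a, team_b))
```

How ties are handled (split evenly, simulated overtime, and so on) is a design choice this sketch leaves open.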
I’ve removed which teams are involved in the game, but using the model allows us to see offensive creation over the course of a game and the probability of each team scoring a given number of goals. It also opens the conversation about score effects (where a trailing team creates a lot of pressure late in a game, as the blue team did here when down 2-1 late). Even with the score effects in this game, you can see that the red team had a greater chance of scoring 0 goals (37.7%) than the blue team (27.3%). The model also helps us understand which types of chances are simply better than others at this level of hockey, and which chances are really, truly not good at all. The exact model coefficients aren’t being shared just yet (soon), but I will say the model does have some notable differences from professional men’s hockey.
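Goal-count probabilities like the ones quoted above can also be worked out exactly rather than simulated, if each shot is treated as independent. The following is a sketch under that assumption, written in Python for illustration (the original analysis was in R); the function name and xG values are hypothetical and do not reproduce the game shown.

```python
def goal_distribution(shot_xg):
    """Exact probability of scoring 0, 1, 2, ... goals from a list of
    per-shot goal probabilities, assuming shots are independent."""
    dist = [1.0]  # before any shots are taken, P(0 goals) = 1
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot misses
            new[goals + 1] += prob * p     # this shot scores
        dist = new
    return dist


# Invented xG values for one team's shots in a game.
team_xg = [0.30, 0.12, 0.08, 0.05, 0.05, 0.03]
for goals, prob in enumerate(goal_distribution(team_xg)):
    print(f"P({goals} goals) = {prob:.3f}")
```

The chance of being shut out is just the product of every shot missing, which is why a pile of low-value shots can still leave a team with a sizeable chance of scoring nothing.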
Model Strengths
- The model accounts for context information that a lot of public professional models don’t have access to, such as screens, the type of rush chance, and pre-shot movement created by “dangerous” pass attempts. This is important for tactical considerations, and also having more information is always nice.
- What I find to be the most interesting use case of the model is not which shots create the highest expected goals (it’s the ones closest to the net, like every model ever), but rather just how little value some types of shots have. I care most about how data can inform tactics and coaching decisions, and I do think this model accomplishes that well.
Some limitations
- The original purpose of this project was to build on WAR On Ice’s shot value system, but it quickly morphed into an expected goals model project. As such, I wish I had tracked the coordinate location of the pass rather than just the type of pass. As a result, the model accounts for pre-shot movement that impacts the goalie, but loses some information about pass distances and other pass types.
- The sample size is objectively still really small. Collecting this information ended up being a little slower than I was hoping, but the plan is to continue building on this as we go.
Future Directions
- I plan to first test this model against women’s NCAA data, and then ultimately incorporate that data into a more expanded version of the model. While the top-end ability of NCAA players is going to be higher than in USport, I believe there’s a lot more crossover than people may realize or appreciate. Given that D1 and USport hockey have very similar goals scored per game, I think the WHIMS model will do okay. If it doesn’t, I’ll just throw in a variable that says whether the shot came from the NCAA or USport, and we’ll see if there’s enough crossover information to further improve model performance.
- I have more shots to collect, a lot more shots! It’s going to take a lot of time, so if people are curious or interested in collecting some unblocked shot data, I’m more than happy to share my shot plotter setup. Model updates will probably happen every 5,000-7,000 new shots collected moving forward.
- I mentioned it earlier, but hopefully a web app will be made publicly available (when I can spend some time on it), because I think there’s a lot we can learn from women’s collegiate hockey to better support youth girls’ hockey development, and I also think having this type of extra information is fun for college hockey fans (read: fun for me).
- Lastly, I will be tracking and publishing the data from every game in the USport women’s national championship (March 20-23), so look out for that if you’ve made it this far.
Shots were tracked using https://shot-plotter.netlify.app/
All analysis and visualizations were done in R.