Below is my submission to the Utah HC Summer Analytics Challenge. Some information in the Dataset and Methods sections was removed from the actual submission to protect my model. The rest is word for word the submission sent to Utah.
The results mentioned in the paper can be found here.
Forecasting NHL Contracts
Insights from Predictive Modeling
1. Introduction
NHL teams often get criticized publicly by fans of the sport for the contracts they hand out. This isn't to say that the fans are right, or that the teams are in the wrong, but maybe there is a right answer? Fans aren't in NHL front offices for a reason, and the landscape of the NHL would look a lot different if they were. But what if there were a way to accurately predict the contract a player would be looking for? Teams looking to acquire and extend players at the deadline would know at least a general range of salaries a player could be after. This would also help front offices when planning for the off-season and the expected amount of money they'll have left to spend.

However, even with the new wave of performance analytics, teams continue to hand out contracts that look "questionable" and "surprising" when compared against those analytics. There are many possible reasons for this, including but not limited to the following, which vary by team: the size and strength of the analytics department, the analytics being used, the influence analytics has in the organization, desperation for team improvement to increase job security, and the personal income tax of the city. Some General Managers may even make gut decisions that don't necessarily have anything backing them up. Since some of these factors are unmeasurable, they are also unpredictable, which means they can't be accounted for in machine learning.

My goal with this paper was to use machine learning to accurately project whether a player would receive a minimum contract, to project the length of the contract, and to project the cap hit percentage a player would receive. This project was in progress before the announcement of this challenge, but the analysis is unique to this paper. I have been doing contract projections since February 2024, and during that time I have refined my models many times, including changing the algorithms themselves, before arriving at the current edition from June.
2. Dataset
The dataset I used was merged from a few different sources. I sourced tracking statistics from MoneyPuck and contract information from CapFriendly and PuckPedia.
This section of the paper has been shortened to protect sensitive information.
3. Methods
Predictions on contract length and cap hit percentage were made with the Cubist algorithm. The prediction of whether a player would receive a minimum contract was made using the Random Forest algorithm.
This section of the paper has been shortened to protect sensitive information.
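Since the actual features and tuning are withheld, here is only a rough illustration of the classification half of the pipeline: a scikit-learn random forest predicting a binary minimum-contract label. The features and labels below are entirely hypothetical stand-ins, not the ones used in the real model.

```python
# Illustrative sketch only: a random forest classifying whether a player signs
# a minimum contract. Features and labels are synthetic placeholders, NOT the
# features used in the actual model described in this paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.integers(18, 40, n),   # age (hypothetical feature)
    rng.integers(0, 83, n),    # games played (hypothetical feature)
    rng.uniform(0, 100, n),    # points (hypothetical feature)
])
# Toy label: players under a points threshold sign minimum contracts.
y = (X[:, 2] < 10).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"holdout accuracy: {clf.score(X_test, y_test):.2f}")
```

Cubist, used for the two regression targets, is an R-ecosystem rule-based model and is not sketched here.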
4. Results
I measured the performance of my model in several ways, as each method of evaluation has different strengths and weaknesses. The measures include absolute error, actual error, and percentage difference; I also divided the absolute error and actual error results into groupings by signing status, age, and team. As of August 13th, 2024, my cap hit projections are off by an average of $346,995.11, or 20.12%, for every NHL player to have signed an NHL contract. These results are extremely competitive with the likes of AFP Analytics and Evolving Hockey, who have many years of experience doing this. I break down the strengths and weaknesses of each error measure below.
Absolute Error
Absolute error is measured by taking the absolute value of the difference between two measurements. For the purposes of this paper, this is the most convenient and straightforward choice, as it weighs over-projections and under-projections equally.
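The measure above can be sketched in a few lines; the contract lengths here are made-up toy numbers, not projections from the model.

```python
# Mean absolute error: average absolute difference between projected and
# actual values. The example values are invented for illustration.
def mean_absolute_error(projected, actual):
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

projected_length = [3, 1, 5, 2]   # projected contract lengths in years (toy data)
actual_length    = [4, 1, 4, 3]   # actual contract lengths in years (toy data)
print(mean_absolute_error(projected_length, actual_length))  # 0.75
```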
Length
The overall length error for my projections is 0.80. This can be interpreted as my projections being off on length by 0.8 years on average for each contract signed. This number isn't very competitive in this space, as others get as low as 0.5 years. It was most accurate on the contract length of goalies, who had an error of 0.70, while forwards came in at 0.77 and defensemen at 0.89.
Cap Hit
The overall cap hit error, as mentioned above, was $346,995.11. This can be interpreted as my projections being off on cap hit by $346,995.11 on average for each contract signed. This number is extremely competitive in this space, as others get as low as $350,000. It was most accurate on the cap hit of forwards, who had an error of $310,811.16, while goalies came in at $364,239.93 and defensemen at $400,974.80. This way of measuring contracts works well as an overview but doesn't account for the difference between being off by $350,000 on a $1,000,000 contract and being off by $350,000 on a $10,000,000 contract.
Actual Error
Actual error is measured by taking the signed difference between two measurements. In this paper, I subtracted the actual value from my projection, for players signed through free agency only. Measuring this on length didn't prove to be too useful, but it was included in the spreadsheet nonetheless. This calculation was mainly used on cap hit when evaluating my projections' performance by team. By summing the actual error by team, I am able to identify whether any teams are getting significant discounts or overpaying.
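The per-team roll-up described above can be sketched as follows; the teams and dollar figures are invented for illustration, not results from the model.

```python
# "Actual error" roll-up: signed error (projection - actual) summed per signing
# team. A positive total means players signed for less than projected (a
# discount); a negative total means the team overpaid relative to projection.
# All figures below are illustrative only.
from collections import defaultdict

signings = [
    ("CAR", 5_000_000, 4_600_000),   # (team, projected cap hit, actual cap hit)
    ("CAR", 2_000_000, 1_800_000),
    ("CHI", 1_000_000, 2_900_000),
]

error_by_team = defaultdict(int)
for team, projected, actual in signings:
    error_by_team[team] += projected - actual

print(dict(error_by_team))  # {'CAR': 600000, 'CHI': -1900000}
```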
The teams that my model has getting the biggest discount is the Carolina Hurricanes, followed by the Florida Panthers, Colorado Avalanche, Edmonton Oilers and St Louis Blues. The teams that my model has overpaying the most overall is the Chicago Blackhawks, followed by the Buffalo Sabres, Calgary Flames, Utah HC/Arizona Coyotes and New York Islanders.
Absolute Percentage Difference
The absolute percentage difference measures how far off the prediction is from the actual value as a percentage of that value, as opposed to the absolute error. As an average across every signed contract, this is a better measure of error than absolute error because it accounts for the size of the error relative to the size of the contract. As mentioned earlier, there is a large difference between being off by $350,000 on a $1,000,000 contract and being off by $350,000 on a $10,000,000 contract. This measure takes that into account.
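A short sketch of this measure, with made-up contract values, shows why the same dollar miss scores very differently on small and large contracts:

```python
# Mean absolute percentage difference: absolute error scaled by the actual
# contract value, averaged over contracts. Figures are illustrative only.
def mean_abs_pct_diff(projected, actual):
    return sum(abs(p - a) / a for p, a in zip(projected, actual)) / len(projected) * 100

# The same $350,000 miss on a $1M deal vs a $10M deal:
print(f"{abs(1_350_000 - 1_000_000) / 1_000_000:.1%}")     # 35.0%
print(f"{abs(10_350_000 - 10_000_000) / 10_000_000:.1%}")  # 3.5%
```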
The overall mean absolute percentage difference on cap hit is 20.12%, which can be understood as the predictions being off by 20.12% on average for every contract. I believe this to be a good number, as it roughly equates to a $200,000 error per $1,000,000 of actual contract value.
5. Conclusions
As mentioned previously, I summed the actual cap hit error by team, and from those results I was able to draw some conclusions. As a reminder, I was the furthest over on Carolina, Florida, Colorado, Edmonton and St Louis, and the furthest under on Chicago, Buffalo, Calgary, Utah/Arizona and the New York Islanders. Excluding the Islanders, these results say one thing: players will take a discount to join a winning team. Historically these results make sense; winning the Stanley Cup is the ultimate goal, and a free agent's best chance of doing that is joining the team they think will give them the best chance to win. Some players may still end up on this route by signing a one-year deal with a higher salary on a bad team and ultimately getting traded to a contender before the end of the season (see John Klingberg, 2022-23). This year, it could also explain Alec Martinez signing a one-year contract with Chicago for almost $3 million more than projected. Contracts like these can be explained by teams enticing players with a larger salary in place of a chance to win, which leads to these teams generally overpaying to secure a player's services. There are players who opt for this route, as they may not value winning as highly as financial security. Knowing the team a player signs for could have a big impact on expected contract value, especially factoring in the personal income tax of the state/province, the signing team's cap space and the signing team's winning percentage. When armed with all this knowledge, some contracts and outcomes become predictable. For example, Jason Spezza joining the team he grew up cheering for was not a surprise, and neither was the league minimum contract he signed to join them. However, the model has no way of knowing that Jason Spezza was a Leafs fan as a kid. It is because of this flaw that these numbers are to serve as a guide and not the law.
Another thing that was very evident was the idea that teams with low or no personal state income tax have an advantage over teams with higher personal income tax rates. To measure this, I used the total actual error for each team as described above. The results do support the idea, as Florida, Dallas, Tampa Bay and Carolina were 4 of the top 10 teams that got the biggest discounts on the contracts they signed. Three Canadian teams also made the top 10: Edmonton, Toronto and Winnipeg. Toronto and Winnipeg have two of the worst five tax rates in the league, and both being top 10 on this list does not support the idea that favourable tax rates make a difference. Vancouver, which also has a top-5 worst tax rate, just misses the cut at 11th on the list. However, the top half of this list has an average tax rate that is 8% better than the league average, and the bottom half has an average tax rate 4% worse than the league average. Within the top 19 teams that got the biggest contract discounts, every single team was in the top half of the league in either winning percentage from the season prior or tax rate. From these results, it is safe to say that both personal income tax rate and a winning organization play important roles in the value of the contracts that get signed in free agency.