The image above compares each team's predicted wins, based on goals scored and allowed, with its actual wins, using regular-season games only.
The first three columns show the team, how many goals it scored, and how many goals were scored against it. The fourth column is the number of games it was expected to win according to the Pythagorean Win Expectation formula. The fifth column is the number of games it actually won during the regular season. Finally, the sixth column is the difference between predicted and actual wins.
The Formula:
The Pythagorean Expectation formula was originally developed for baseball:
Win % = runs scored^2 / (runs scored^2 + runs allowed^2)
When applying this metric to other sports, the exponent of 2 needs to be adjusted for each sport. For field hockey, Chris Fry calculated it to be 1.05 back in 2019. I recommend checking out his article on finding the exponent.
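The formula can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet behind the table: the function names are hypothetical, the team totals are made up, and the difference column is assumed to be actual minus predicted (so a negative value means a team won fewer games than expected). Multiplying the expected winning percentage by games played gives a predicted win total.

```python
def pythagorean_win_pct(goals_for: float, goals_against: float, exponent: float = 1.05) -> float:
    """Expected winning percentage: GF^k / (GF^k + GA^k)."""
    if goals_for == 0 and goals_against == 0:
        raise ValueError("need at least one goal scored or allowed")
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def expected_wins(goals_for: float, goals_against: float, games_played: int,
                  exponent: float = 1.05) -> float:
    """Predicted win total: expected winning percentage times games played."""
    return games_played * pythagorean_win_pct(goals_for, goals_against, exponent)


# Hypothetical team: 40 goals for, 30 against, 18 games played, 11 actual wins.
predicted = expected_wins(40, 30, 18)
difference = 11 - predicted  # assumed sign convention: actual minus predicted
print(f"predicted wins: {predicted:.2f}, difference: {difference:+.2f}")

# Passing exponent=2 gives the original baseball version of the formula.
print(f"baseball-style exponent: {pythagorean_win_pct(40, 30, exponent=2):.3f}")
```

Keeping the exponent as a parameter makes it easy to compare Fry's 1.05 against the baseball value of 2 on the same data.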
The Accuracy:
This season, about 33% of teams did better than expected, 21% did exactly as expected, and 43% did worse than expected.
Overall, 59% of the predictions were within one game of the actual win total, and 90% were within two games.
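One way to reproduce these accuracy figures is sketched below. It assumes that "as expected" means the predicted win total, rounded half-up to the nearest whole game, equals the actual win total, and that the within-one and within-two checks use the unrounded prediction. The original analysis may have used different rules, and the sample pairs are hypothetical.

```python
import math


def nearest_whole(x: float) -> int:
    # Round half up; Python's built-in round() rounds halves to the nearest even number.
    return math.floor(x + 0.5)


def accuracy_summary(teams):
    """teams: list of (predicted_wins, actual_wins) pairs; returns shares of teams."""
    n = len(teams)
    better = sum(1 for pred, actual in teams if actual > nearest_whole(pred))
    as_expected = sum(1 for pred, actual in teams if actual == nearest_whole(pred))
    worse = n - better - as_expected
    within_1 = sum(1 for pred, actual in teams if abs(actual - pred) <= 1)
    within_2 = sum(1 for pred, actual in teams if abs(actual - pred) <= 2)
    return {
        "better": better / n,
        "as_expected": as_expected / n,
        "worse": worse / n,
        "within_1": within_1 / n,
        "within_2": within_2 / n,
    }


# Hypothetical (predicted, actual) pairs, not real results.
sample = [(10.35, 11), (8.2, 8), (12.6, 10), (5.5, 7)]
for key, share in accuracy_summary(sample).items():
    print(f"{key}: {share:.0%}")
```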
As seen in this graph, the relationship between predicted and actual win percentages this season was nearly linear.
The resulting r-squared is 90%, meaning that the Pythagorean win prediction accounts for 90% of the variation in observed wins.
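For a simple linear fit of one variable on another, r-squared equals the square of the Pearson correlation coefficient. The sketch below computes it directly from that definition; the input lists are hypothetical, not this season's data. In Python 3.10 and later, `statistics.correlation` returns the same Pearson r.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical predicted vs. actual win percentages, not this season's data.
predicted_pct = [0.72, 0.55, 0.41, 0.30, 0.63]
actual_pct = [0.65, 0.60, 0.40, 0.25, 0.70]
print(f"r-squared: {r_squared(predicted_pct, actual_pct):.2f}")
```

The same function can be applied to goal differential versus games won.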
Standout Team: Liberty
Liberty lost three more games than expected, which some may find surprising given their Big East championship and their NCAA tournament berth.
Liberty had a goal differential of +41 this season, most of which came from three games: a 9-0 win against Kent State, an 11-0 win against Queens, and a 10-0 win against Georgetown. Those three games alone added +30 to their differential, leaving +11 across the rest of their schedule, and they pushed Liberty's prediction very high.
Against teams of similar strength, their games were typically decided by one or two goals, which is normal for field hockey.
They fell short of their prediction because lopsided wins over weaker opponents inflated their goal differential.
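To illustrate how a few lopsided wins can inflate a prediction, the sketch below uses a made-up team (not Liberty's actual totals) whose season includes three shutouts with the same scores as Liberty's. It compares the formula applied to the full season against an alternative that counts the blowouts as three ordinary wins and applies the formula only to the remaining games. That alternative is an illustrative assumption, not a method from this analysis.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=1.05):
    """Expected winning percentage: GF^k / (GF^k + GA^k)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical season totals: 20 games, 50 goals for, 20 against.
GAMES, GF, GA = 20, 50, 20
# Three shutout wins with the same scores as Liberty's (9-0, 11-0, 10-0).
blowouts = [(9, 0), (11, 0), (10, 0)]

full_season = GAMES * pythagorean_win_pct(GF, GA)

# Alternative: treat the blowouts as three plain wins and apply the
# formula only to the goals and games that remain.
rest_gf = GF - sum(f for f, _ in blowouts)
rest_ga = GA - sum(a for _, a in blowouts)
rest_games = GAMES - len(blowouts)
without_blowouts = len(blowouts) + rest_games * pythagorean_win_pct(rest_gf, rest_ga)

print(f"full season: {full_season:.2f} expected wins")
print(f"blowouts counted as plain wins: {without_blowouts:.2f} expected wins")
```

With these made-up totals, the full-season prediction comes out roughly three wins higher than the alternative, because the 30-0 margin in three games is spread across the whole season.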
Other Notes:
The relationship between goals scored and wins is not as linear as the relationship between goal differential and wins. This is because goal differential, like the formula, accounts for both goals for and goals against.
Of the teams that did better than predicted, only six had a negative goal differential and one had a differential of exactly zero. Of the teams that did exactly as expected, eight had positive differentials, nine had negative differentials, and one had a differential of zero. Of the teams that did worse than expected, twenty-two had negative differentials, eleven had positive differentials, and three had differentials of zero. Goal differential and games won have an r-squared of 86%.
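The breakdown above is a cross-tabulation of outcome category against the sign of each team's goal differential. A minimal sketch with `collections.Counter` follows. The teams are hypothetical, and "as expected" again assumes the predicted win total rounded half-up equals the actual win total.

```python
import math
from collections import Counter


def differential_sign(goals_for: int, goals_against: int) -> str:
    diff = goals_for - goals_against
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "zero"


def outcome(predicted_wins: float, actual_wins: int) -> str:
    # Assumption: compare actual wins with the prediction rounded half-up.
    expected = math.floor(predicted_wins + 0.5)
    if actual_wins > expected:
        return "better"
    if actual_wins < expected:
        return "worse"
    return "as expected"


def cross_tab(teams):
    """teams: iterable of (goals_for, goals_against, predicted_wins, actual_wins)."""
    return Counter(
        (outcome(pred, actual), differential_sign(gf, ga))
        for gf, ga, pred, actual in teams
    )


# Hypothetical teams, not real results.
sample = [
    (40, 30, 10.35, 11),
    (25, 28, 8.2, 8),
    (30, 30, 9.0, 8),
    (22, 35, 6.4, 5),
]
table = cross_tab(sample)
for (result, sign), count in sorted(table.items()):
    print(f"{result:>11} | {sign:>8} | {count}")
```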
Closing Thoughts
The exponent of 1.05 was fairly accurate in predicting the win percentage of each team this season.
The Pythagorean win percentage is a fun way to estimate how many games a team should win each year. However, it relies on only two inputs, goals scored and goals allowed, which do not show the full scope of a team. When high-ranked teams play lower-ranked teams, the large goal differentials can inflate their predictions, since the formula does not take the difficulty of each opponent into account. A calculation that accounts for the difficulty of each game would most likely produce a better prediction model.