Baseball has always balanced tradition and stats, and machine learning is taking it into an entirely new realm. It began with Moneyball, which used stats to find value in overlooked players; now we have gone beyond even that, with algorithms that let a team forecast how a player will perform before he even steps up to the plate. I'm a longtime fan who has spent what must be hundreds of nights poring over box scores and has taken classes in data analysis, statistics, and probability to build my own forecasting models, and I enjoy the games even more now because of the technology that is redefining the game I love. It's no longer just about stats; it's about forecasting the future.
The Moneyball phenomenon showed that data could beat gut instinct, and, as Forbes has noted, the trend has since spilled out of sports and into the business world. Back in 2002, the Oakland A's used statistics to identify players whose skills traditional measures undervalued. Today, machine learning is taking that idea to its full potential. With tracking systems like Statcast now generating seven terabytes of data every game, machine learning algorithms analyze measurements such as spin rate and sprint speed. As an MIT Sloan explainer points out, this goes well beyond crunching numbers: the models keep learning and adjusting with every pitch and swing. Teams now have a Magic 8 Ball, and it runs on computer code.
So how does this all work? Machine learning sifts through mountains of data (historical statistics, weather conditions, even a player's sleep patterns) to find patterns no human could spot alone. A Medium post walks through how one developer used logistic regression to predict MLB game results, reportedly with high accuracy. At the player level, the questions get granular: Will a pitcher's fastball fade in August? Will a hitter maintain his launch angle?
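The Medium post's exact features and code aren't reproduced here, so what follows is only a minimal sketch of the general technique, logistic regression fit by batch gradient descent in plain Python. It assumes two hypothetical features per game, both computed as home minus away: the teams' run-differential-per-game edge and the starting pitchers' ERA gap. The ten games are invented toy data, not real MLB results.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability that the home team wins, given feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights w and bias b by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/d(logit) for one game
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: [run-diff-per-game edge, starter ERA gap], home minus away.
# Label 1 = home team won. Invented for illustration, not real games.
X = [[1.2, -0.8], [0.9, -0.3], [0.4, 0.2], [0.1, 0.5], [-0.2, 0.1],
     [-0.6, 0.4], [-1.0, 0.9], [0.3, -0.5], [-0.4, -0.2], [0.6, 0.7]]
y = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

w, b = train_logistic(X, y)
print(predict_proba(w, b, [1.0, -0.5]))   # home team better on both features
print(predict_proba(w, b, [-1.0, 0.5]))   # road team better on both features
```

The output is a probability rather than a yes/no pick, which is what makes logistic regression a natural fit for game forecasting: a 0.6 prediction should come true about 60% of the time if the model is well calibrated.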
Here’s what ML delivers:
Pattern Recognition: Spots a batter’s kryptonite, like sliders low and away.
Real-Time Updates: Refines forecasts with fresh data, like post-injury recovery stats.
Scalability: Processes terabytes of info faster than any scout’s notebook.
Depth: Combines traditional stats with biometrics for a 360-degree view.
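None of the sources above spell out how forecasts get refreshed, so as one standard illustration (a swapped-in technique, not any team's actual model) here is a Beta-Binomial update. A hitter's per-at-bat hit rate is treated as a Beta distribution, and each new batch of at-bats shifts it. The prior counts and the post-injury stretch are hypothetical numbers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaRate:
    """Beta(alpha, beta) belief about a per-trial success rate, e.g. hits per at-bat."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected rate under the current belief
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, trials: int) -> "BetaRate":
        # Conjugate update: observed successes add to alpha, failures add to beta
        if not 0 <= successes <= trials:
            raise ValueError("successes must be between 0 and trials")
        return BetaRate(self.alpha + successes, self.beta + (trials - successes))

# Hypothetical prior: worth 300 at-bats of .250 hitting (75 hits, 225 outs).
prior = BetaRate(alpha=75, beta=225)
# Hypothetical fresh data: 8 hits in the first 20 at-bats back from injury.
posterior = prior.update(successes=8, trials=20)
print(f"{prior.mean():.3f} -> {posterior.mean():.3f}")  # 0.250 -> 0.259
```

A raw 8-for-20 (.400) stretch moves the estimate only modestly. The prior's weight, here an assumed 300 at-bats, controls how quickly fresh data overrides history, which is the same trade-off any "real-time" forecast has to make.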
The impact is undeniable. According to a SABR article, teams like the Dodgers and Rays are using machine learning to draft smarter, manage workloads, and build better lineups. Statcast data feeds models that predict pitch outcomes or project a prospect's ceiling years down the road. The Medium example also points to broader applications, such as in-game win probability that shifts as the game unfolds.
There is a catch, though: a model is only as good as the data feeding it. If injury records or innings logs are incomplete, the team-provided data becomes unreliable, and so do the model's outputs. The author of the Medium model flags another risk, overfitting: a model that fits the quirks and noise of past results so closely that it predicts new games poorly. Proponents of the human element argue that gut instinct is worth as much as a model's projections, if not more. Some coaches will always trust that instinct over a "black box" that outputs probabilities, and that tension runs through the whole debate.
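To make overfitting concrete, here is a minimal sketch (not the Medium author's code) comparing two deliberately simple models on invented games: a "memorizer" that recalls each matchup's past result, and a baseline that predicts the same home-win rate for every game. Games are split by date, since shuffling would let future games leak into training, and scored with the Brier score, the mean squared error of the predicted win probability (lower is better). Matchup labels like "A-B" are placeholders, and all outcomes are made up.

```python
from collections import defaultdict

def chronological_split(games, train_fraction=0.75):
    """Split games (assumed already sorted by date) into earlier and later parts."""
    cut = int(len(games) * train_fraction)
    return games[:cut], games[cut:]

def brier_score(probs, outcomes):
    """Mean squared error between predicted win probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

def fit_memorizer(train):
    """Overfit on purpose: predict each matchup's average training outcome."""
    totals = defaultdict(lambda: [0, 0])  # matchup -> [home wins, games]
    for matchup, home_won in train:
        totals[matchup][0] += home_won
        totals[matchup][1] += 1
    base = sum(o for _, o in train) / len(train)
    table = {m: wins / n for m, (wins, n) in totals.items()}
    return lambda matchup: table.get(matchup, base)  # unseen matchup -> base rate

def fit_base_rate(train):
    """Simple baseline: every game gets the training home-win rate."""
    base = sum(o for _, o in train) / len(train)
    return lambda matchup: base

def evaluate(model, games):
    return brier_score([model(m) for m, _ in games], [o for _, o in games])

# Invented games in date order: (matchup, 1 if the home team won).
games = [("A-B", 1), ("C-D", 0), ("E-F", 1), ("G-H", 1), ("I-J", 0),
         ("K-L", 1), ("M-N", 0), ("O-P", 1), ("Q-R", 0),
         ("A-B", 0), ("C-D", 1), ("E-F", 1)]
train, test = chronological_split(games)

for name, fit in [("memorizer", fit_memorizer), ("base rate", fit_base_rate)]:
    model = fit(train)
    print(name, round(evaluate(model, train), 3), round(evaluate(model, test), 3))
# memorizer 0.0 0.667
# base rate 0.247 0.235
```

The memorizer is perfect on the games it has already seen and far worse than the baseline on later games. That gap between training and held-out performance, not the training score itself, is the warning sign to watch for.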
What's on the horizon? Wearables that track heart rate, biomechanics, and more will feed ML richer datasets that could predict injuries or peak seasons with uncanny accuracy. The SABR article envisions a 2030 in which AI has completely redefined scouting. Picture a front office that knows whether a high school prospect can make the majors before he takes his first professional at-bat. The technology is cool, but for me it is also about a career goal: I want to be the analyst who combines data and baseball intuition to find the next MVP. Machine learning will not replace the human element; it will enhance it by bringing science to the heart and soul of baseball.