I built a custom run expectancy matrix and calculated wOBA weights from the past four years of NECBL data. The process adapts the methodology Robert Frey outlined in his work on Division I baseball and works around gaps in the available TrackMan data. Because base-out states are not tracked, I simulated them using advancement percentages sourced from FanGraphs. The result is a set of league-specific run values for each offensive event, which forms the foundation for the advanced performance metrics and player evaluations.
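As a rough illustration of how a run expectancy matrix turns into event run values, the Python sketch below scores one event as the change in run expectancy plus any runs scored on the play. The matrix entries, the state encoding, and the `run_value` helper are hypothetical placeholders, not the NECBL values or the original code.

```python
# Hypothetical run expectancy values keyed by (outs, runners); NOT the NECBL matrix.
# Runners are a 3-character string for 1B/2B/3B, e.g. "1__" = runner on first only.
RUN_EXPECTANCY = {
    (0, "___"): 0.55,
    (0, "1__"): 0.95,
    (1, "___"): 0.30,
    (1, "1__"): 0.58,
}

def run_value(before, after, runs_scored):
    """Run value of one event: RE(after) - RE(before) + runs scored on the play."""
    outs_after, _ = after
    # Once the third out is recorded the inning is over, so nothing more is expected.
    re_after = 0.0 if outs_after >= 3 else RUN_EXPECTANCY[after]
    return re_after - RUN_EXPECTANCY[before] + runs_scored

# A leadoff single and a leadoff out, both starting from bases empty with 0 outs.
single_value = run_value((0, "___"), (0, "1__"), 0)
out_value = run_value((0, "___"), (1, "___"), 0)
```

Averaging these run values over every occurrence of an event type gives that event's linear weight. wOBA weights are typically those averages re-centered so that an out is worth zero and then rescaled.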
Historical Data (Training and Testing): 2021–2024 NECBL TrackMan data
Current Season Data (Application): 2025 Bristol Blues TrackMan data
Split: 80% training / 20% testing for model evaluation
Cleaning Steps:
Removed events irrelevant to the target metric (e.g., stolen bases for OBP, walks for SLG)
Corrected misspellings and inconsistent naming
Filtered to only the final pitch of each plate appearance
Encoded categorical variables (e.g., stadium) and aligned factor levels between training and test data
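The cleaning and split steps above can be sketched in plain Python. The field names (`game_id`, `pa_id`, `pitch_no`, `stadium`), the seed, and the `-1` code for unseen stadiums are assumptions made for illustration; the original pipeline may handle these differently.

```python
import random

def final_pitches(pitches):
    """Keep only the last pitch of each plate appearance."""
    last = {}
    for p in pitches:
        key = (p["game_id"], p["pa_id"])
        if key not in last or p["pitch_no"] > last[key]["pitch_no"]:
            last[key] = p
    return list(last.values())

def train_test_split(rows, test_frac=0.2, seed=42):
    """Shuffle reproducibly, then hold out test_frac of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def stadium_levels(train_rows):
    """Category levels are defined by the training data only."""
    return sorted({r["stadium"] for r in train_rows})

def encode_stadium(row, levels):
    """Integer code from the training levels; -1 marks a stadium unseen in training."""
    return levels.index(row["stadium"]) if row["stadium"] in levels else -1

# Ten made-up plate appearances of three pitches each.
pitches = [
    {"game_id": 1, "pa_id": pa, "pitch_no": n, "stadium": "Park A"}
    for pa in range(10)
    for n in range(1, 4)
]
pa_rows = final_pitches(pitches)
train, test = train_test_split(pa_rows)
levels = stadium_levels(train)
```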
Model Inputs (xOBP): Exit velocity, launch angle, batted ball direction, distance, stadium
Algorithm: Random Forest with 10-fold cross-validation (mtry tuned over 2, 4, 6, and 8); final predictions come from an ensemble of 5 models (300 trees each) for stability
Special Handling:
Walks & HBPs assigned OnBaseFlag = 1
Strikeouts assigned OnBaseFlag = 0
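A minimal sketch of how this handling could combine with the ensemble: outcomes without a batted ball get a fixed value, and batted balls get the on-base probability averaged across the ensemble members. The event labels and the `xobp_value` helper are hypothetical, and simple averaging is an assumption about how the five models are combined.

```python
from statistics import mean

def xobp_value(event, ensemble_probs=None):
    """Expected on-base value for a single plate appearance."""
    if event in ("Walk", "HitByPitch"):
        return 1.0  # OnBaseFlag = 1: batter reached without a batted ball
    if event == "Strikeout":
        return 0.0  # OnBaseFlag = 0
    # Batted ball: average the predicted on-base probability over the ensemble.
    return mean(ensemble_probs)

# One batted ball scored by five hypothetical ensemble members.
batted_ball = xobp_value("InPlay", [0.2, 0.3, 0.4, 0.3, 0.3])
```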
Performance (Test Set):
Confusion Matrix Accuracy: 90.7%
Comparison RMSE: 0.0603 | MAE: 0.0471 | R²: 0.7465
Brier Score: 0.0718
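For reference, the metrics listed above have standard definitions, sketched below in Python. How the "Comparison" figures were aggregated (for example, per player) is not stated here, so the functions simply take two equal-length lists of predicted and actual values.

```python
from math import sqrt

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def rmse(pred, actual):
    """Root mean squared error."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def r_squared(pred, actual):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A lower Brier score means better-calibrated probabilities; a model that always predicts 0.5 for a binary outcome scores 0.25.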
Model Inputs (xSLG): Same as xOBP
Special Handling:
TB encoding: 1B = 1, 2B = 2, 3B = 3, HR = 4
Walks/HBPs excluded; strikeouts assigned TB = 0
Performance (Test Set):
Accuracy: 80.1%
Comparison RMSE: 0.1292 | MAE: 0.0977 | R²: 0.6352
Multiclass Brier Score: 0.2919
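Because total bases is a five-class target, the Brier score reported for this model is presumably the multiclass form, which sums squared errors over all classes and ranges from 0 to 2. The sketch below shows that form along with an expected-total-bases calculation from class probabilities; both helpers are illustrative assumptions rather than the original code.

```python
TB_CLASSES = [0, 1, 2, 3, 4]  # out or strikeout, 1B, 2B, 3B, HR

def multiclass_brier(prob_rows, true_tb):
    """Mean over samples of the summed squared error across all TB classes."""
    total = 0.0
    for probs, tb in zip(prob_rows, true_tb):
        total += sum(
            (p - (1.0 if c == tb else 0.0)) ** 2
            for c, p in zip(TB_CLASSES, probs)
        )
    return total / len(true_tb)

def expected_total_bases(probs):
    """Probability-weighted total bases for one batted ball."""
    return sum(c * p for c, p in zip(TB_CLASSES, probs))

# A batted ball the model sees as 50% out, 25% single, 25% double.
x_tb = expected_total_bases([0.5, 0.25, 0.25, 0.0, 0.0])
```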
Calculated as xOPS = xOBP + xSLG
Gives a holistic, expected measure of offensive contribution.
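A small sketch of rolling per-plate-appearance values up to a hitter's expected slash line. It assumes xOBP is averaged over all plate appearances and xSLG over at-bats only (walks and HBPs carry no total-bases value, matching the exclusion above); the record layout is hypothetical and the sketch ignores details such as sacrifice flies.

```python
def expected_slash(pas):
    """Return (xOBP, xSLG, xOPS) from per-PA expected values."""
    xobp = sum(pa["x_on_base"] for pa in pas) / len(pas)
    # Walks/HBPs are excluded from the slugging denominator.
    at_bats = [pa for pa in pas if pa["x_total_bases"] is not None]
    xslg = sum(pa["x_total_bases"] for pa in at_bats) / len(at_bats)
    return xobp, xslg, xobp + xslg

# Four made-up plate appearances: a walk, a strikeout, and two batted balls.
pas = [
    {"x_on_base": 1.0, "x_total_bases": None},
    {"x_on_base": 0.0, "x_total_bases": 0.0},
    {"x_on_base": 0.5, "x_total_bases": 1.0},
    {"x_on_base": 0.5, "x_total_bases": 1.0},
]
xobp, xslg, xops = expected_slash(pas)
```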
Model (xwOBA): Same RF ensemble framework as xOBP/xSLG, but the target is wOBA value per PA.
Special Handling:
Strikeouts = 0
Walks & HBPs assigned weighted values from wOBA weights
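The target assignment can be sketched as a lookup against the wOBA weights. The weights below are hypothetical placeholders, not the league-specific NECBL weights, and the event labels are assumed. The second helper shows one plausible way to turn class probabilities for a batted ball into an expected wOBA value.

```python
# Hypothetical wOBA weights for illustration only; NOT the NECBL-derived values.
WOBA_WEIGHTS = {
    "Walk": 0.70,
    "HitByPitch": 0.72,
    "Single": 0.90,
    "Double": 1.25,
    "Triple": 1.60,
    "HomeRun": 2.00,
}

def woba_target(event):
    """wOBA value of one plate appearance; strikeouts and other outs are worth 0."""
    return WOBA_WEIGHTS.get(event, 0.0)

def expected_woba(class_probs):
    """Probability-weighted wOBA value for a batted ball, given {event: probability}."""
    return sum(p * woba_target(event) for event, p in class_probs.items())

# A batted ball judged to be an out half the time and a single half the time.
x_woba = expected_woba({"Out": 0.5, "Single": 0.5})
```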
Performance (Test Set):
Accuracy: 80.1%
RMSE: 0.3719 | MAE: 0.2624 | R²: 0.5415
Comparison RMSE: 0.0737 | MAE: 0.0560 | R²: 0.6560
Brier Score: 0.2908
Uses the xwOBA model, restricted to at-bats that end in contact.
To provide both players and coaches with a clear, data-driven look at individual offensive performance, I built an automated weekly hitter report. This report is emailed to every hitter and the coaching staff, ensuring consistent feedback while removing the need for manual stat compilation.
Each report is player-specific and incorporates both traditional stats and advanced batted ball analytics, paired with visualizations to make insights immediately actionable.
Core Report Features
Hot and Cold Zones
Heatmaps for hit types, outs, and strikeouts, showing results by pitch location.
Helps hitters identify productive zones to attack and zones to avoid chasing.
Exact Pitch Location Charts
Scatterplots of the pitch location for each hit type, out, and strikeout.
Uses color-coding to differentiate outcomes and highlight patterns.
Hit Location Mapping
Spray charts for each hit type and outs, illustrating batted ball distribution across the field.
Allows quick recognition of pull/opposite field tendencies.
Weekly Statistical Summary
Traditional stats: AVG, OBP, SLG, OPS.
Contact quality: AVG Exit Velo, Max Exit Velo, and Exit Velo by hit type.
Plate discipline metrics: Swing%, Whiff%, and Chase%.
Splits & Situational Breakdowns
Handedness splits vs. LHP and RHP.
Count splits (0-strike, 1-strike, 2-strike situations).
Swing metrics by count: Swing%, Whiff%, and Chase% within each count state.
Exit Velocity Analysis
Average EV for each hit type (1B, 2B, 3B, HR).
Weekly max EV highlight.
Swing/Whiff/Chase Trends
Overall season-to-date trendline for plate discipline.
Weekly breakdown to compare recent form vs. season average.
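The plate discipline metrics in the report follow common definitions: Swing% is swings per pitch seen, Whiff% is swinging strikes per swing, and Chase% is swings per pitch outside the zone. The sketch below assumes TrackMan-style columns (`PitchCall`, `PlateLocSide`, `PlateLocHeight`, in feet) and a fixed rectangular strike zone; the zone bounds and the set of labels counted as swings are assumptions, not values taken from the report.

```python
SWING_CALLS = {"StrikeSwinging", "FoulBall", "InPlay"}  # assumed PitchCall labels

def in_zone(side, height, half_width=0.83, bottom=1.5, top=3.5):
    """Fixed rectangular zone in feet; the bounds are illustrative assumptions."""
    return abs(side) <= half_width and bottom <= height <= top

def pct(numerator, denominator):
    """Percentage that returns 0.0 instead of failing on an empty denominator."""
    return 100.0 * numerator / denominator if denominator else 0.0

def plate_discipline(pitches):
    """Return Swing%, Whiff%, and Chase% for a list of pitch records."""
    swings = [p for p in pitches if p["PitchCall"] in SWING_CALLS]
    whiffs = [p for p in swings if p["PitchCall"] == "StrikeSwinging"]
    outside = [
        p for p in pitches
        if not in_zone(p["PlateLocSide"], p["PlateLocHeight"])
    ]
    chases = [p for p in outside if p["PitchCall"] in SWING_CALLS]
    return {
        "Swing%": pct(len(swings), len(pitches)),
        "Whiff%": pct(len(whiffs), len(swings)),
        "Chase%": pct(len(chases), len(outside)),
    }

# Four made-up pitches: two in the zone, two off the plate.
sample = [
    {"PitchCall": "StrikeSwinging", "PlateLocSide": 0.0, "PlateLocHeight": 2.5},
    {"PitchCall": "BallCalled", "PlateLocSide": 2.0, "PlateLocHeight": 2.5},
    {"PitchCall": "InPlay", "PlateLocSide": 0.0, "PlateLocHeight": 2.5},
    {"PitchCall": "FoulBall", "PlateLocSide": 1.5, "PlateLocHeight": 2.5},
]
metrics = plate_discipline(sample)
```

Filtering `pitches` by count or pitcher handedness before calling `plate_discipline` gives the same metrics for each split.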
Data Source: TrackMan exports from each game.
Processing: An R Markdown (.Rmd) file cleans and joins the datasets, calculates advanced metrics, and generates all visualizations.
Report Generation: R Markdown renders an individual PDF report for each hitter.
Distribution: The same file automatically emails each hitter their personalized report and CCs the coaching staff.
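The distribution step can be illustrated with a short Python sketch using the standard library `email` package. The original workflow runs from R Markdown, so this is a stand-in for the idea rather than the actual implementation; the addresses, subject line, and file naming are hypothetical.

```python
from email.message import EmailMessage

def build_report_email(hitter_name, hitter_email, coach_emails, pdf_bytes, sender):
    """Build one message: the hitter in To, the coaching staff in Cc, the PDF attached."""
    msg = EmailMessage()
    msg["Subject"] = f"Weekly hitter report: {hitter_name}"
    msg["From"] = sender
    msg["To"] = hitter_email
    msg["Cc"] = ", ".join(coach_emails)
    msg.set_content(f"Hi {hitter_name},\n\nYour weekly hitter report is attached.\n")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=f"{hitter_name.replace(' ', '_')}_weekly_report.pdf",
    )
    return msg

# Placeholder bytes stand in for a rendered report; all addresses are made up.
message = build_report_email(
    "Sample Hitter",
    "hitter@example.com",
    ["coach1@example.com", "coach2@example.com"],
    b"%PDF-1.4 placeholder",
    "reports@example.com",
)
# Sending would go through smtplib, e.g. smtplib.SMTP(host).send_message(message).
```

Looping this over the roster, with one rendered PDF per hitter, reproduces the one-report-per-player pattern described above.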
This system ensures every hitter receives objective, individualized feedback on their week, helping them make immediate, data-backed adjustments while providing coaches with a quick reference for lineup and training decisions.