Defensive 1st Baseman

R Code for my Defensive Rating Model

library(tidyverse)

library(stringr)

library(ggplot2)

library(baseballr)

library(sjPlot)

library(vip)

# loading in the two CSV for usage, data from Fangraphs for fielding statistics (2022-2024, filter on 1B data)

sheet_1 <- read.csv("C:/Users/Drew Duffy/Downloads/MLB Def for Phillies (as of 8_15_24) - picks.csv")

sheet_2 <- read.csv("C:/Users/Drew Duffy/Downloads/MLB Def for Phillies (as of 8_15_24) - split.csv")

#joining the sheets on season and names

clean <- left_join(sheet_1, sheet_2, by = c("Season", "name_last", "name_first"))

#loading previous error data I have used

load("first_base.RData")

#renaming to season for a join

clean <- clean %>%

rename(game_year = "Season")

#joining the two datasets

total <- left_join(clean, first_base, by = c("game_year", "name_last", "name_first"))

#picking the columns I want to look at

total <- total %>%

select(game_year, name_last, name_first, G, wpa, age, OAA, DRS, FRV,

Scp, FE, TE, PO, A, avg_xwoba, sum_run_exp)

# assigning weights to each commonly used metric. WPA gets a x2 due to it's small values

def_rat_weights <- data.frame(

weight_OAA = c(".4"),

weight_DRS = c(".4"),

weight_wpa = c("2"))

#changing all from string to numeric

def_rat_weights <- def_rat_weights %>%

mutate(weight_OAA = as.numeric(weight_OAA),

weight_DRS = as.numeric(weight_DRS),

weight_wpa = as.numeric(weight_wpa))

#converting the values into new columns with their weighted value

weight_OAA <- def_rat_weights$weight_OAA

weight_DRS <- def_rat_weights$weight_DRS

weight_wpa <- def_rat_weights$weight_wpa

total$weighted_OAA <- total$OAA * weight_OAA

total$weighted_DRS <- total$DRS * weight_DRS

total$weighted_wpa <- total$wpa * weight_wpa

# Duffy Defensive Rating

ddr <- total %>%

mutate(duffy_d_rating = (weighted_OAA + weighted_OAA + sum_run_exp)) %>%

filter(!is.na(duffy_d_rating))

#finding the 3rd quartile to establish an above average threshold (that is well above 50%)

summary(ddr$duffy_d_rating)

abv_avg <- quantile(ddr$duffy_d_rating, 0.75)

# Add a binary classification based on the above average limit

ddr <- ddr %>%

mutate(is_above_avg = ifelse(duffy_d_rating > abv_avg, 1, 0))

#creating a linear model for above average first base defenders and seeing the trend for age

ddr_model <- lm(is_above_avg ~ weighted_OAA + weighted_DRS + weighted_wpa + age,

data = ddr)

summary(ddr_model)

plot_model(ddr_model, type = "pred")

Goal: Create a more composite way to view defensive performance on positional levels. I created a weighted metric that uses OAA, DRS, and weighted errors, my way of quantifying the impact of an error on a certain play

Process: I found data from FanGraphs on first basemen for 2022-2024 (as of August 15) seasons for OAA and DRS and used Statcast data for their error data.

After finding relevant columns to extract insights from, I then began work on their composite score. Since errors represent a small number of plays and its number values are so small to begin with (representing percentages), I used their multiplier as .2 while OAA and DRS both are .4 multipliers.

I set a threshold for the model at .75 which determined an "above average" defender. Then I ran the model to see which variables contribute most to that threshold. Not surprisingly, the OAA and DRS metrics have the most prominent impact (since they are weighed the highest). I also added age into the model, but nothing substantial came from it which is telling for first basemen as they tend to be a bit older than their infield counterparts. This bodes well for the position, as age is not a great determinant of wether the player trends toward becoming a better or worse fielder over his years.

Here, we see that Christian Walker is in fact the top defensive first baseman in the game and has been one of the best overall defenders in the past three seasons.

I am interested in the progression and evolution of Nathaniel Lowe as a player. He has been one of the only players to consistently improve his defensive rating year to year, wonder if it continues in the future.

Model for Defensive Rating Outputs

Page updated

Google Sites

Report abuse