In this project, we wrote code to evaluate baseball pitchers based on specific game statistics. We used the Lahman Database, which includes season-by-season statistics for every pitcher who has played in Major League Baseball. The database files are shared on MATLAB Drive at the link below:
https://drive.matlab.com/sharing/5cae3d76-8a8e-475b-bc49-e74d3b9f6e2a
This link will take you to the GitHub repository for the project: https://github.com/campermatt589/Matlab_Projects/tree/main
PitcherTask1: The Pitcher_Table_Creation file reads the pitching CSV file from the database. We cleaned and aggregated the data to create two tables. SeasonPitching compiles each player's statistics for each season they played. CareerPitching compiles each pitcher's statistics across their whole career. In both cases, rows are grouped by playerID. We also cleaned up the variable names so that they display and read correctly. The file ends by saving the season and career tables to new files for use in the next code file.
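The aggregation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, and the use of groupsummary is an assumption (suggested by the "sum_" column prefixes mentioned later). The column names playerID, yearID, G, IPouts and ER follow the Lahman Pitching.csv layout.

```matlab
% Made-up sample rows in the shape of Lahman's Pitching.csv.
% A player traded mid-season has more than one row for that year.
playerID = ["aaa01"; "aaa01"; "aaa01"; "bbb02"];
yearID   = [2016; 2016; 2017; 2016];
G        = [10; 12; 30; 25];
IPouts   = [150; 180; 540; 300];
ER       = [20; 25; 60; 40];
pitching = table(playerID, yearID, G, IPouts, ER);

% One row per player per season: sum the counting stats within each year.
seasonPitching = groupsummary(pitching, ["playerID" "yearID"], "sum", ...
    ["G" "IPouts" "ER"]);

% One row per player: sum the same stats over the whole career.
careerPitching = groupsummary(pitching, "playerID", "sum", ...
    ["G" "IPouts" "ER"]);

disp(seasonPitching)
disp(careerPitching)
```

groupsummary names its output columns sum_G, sum_IPouts and so on, and adds a GroupCount column, so some renaming is usually needed afterwards.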
PitcherTask2: The Baseball_Pitchers file starts by loading the seasonPitching and careerPitching files. We wanted to label each row of statistics with the name of the player who put up those numbers, so we read the People.csv file from the Lahman Database and built a names table from it. We then joined the names table to seasonPitching and careerPitching by matching on playerID, which appears in all three tables. The joined results are saved as careerPitching3 and seasonPitching3 so that each file can be used for a different purpose. We then created the greatPitchers table and saved it to both files before continuing.
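A minimal sketch of the name join is shown below. The sample data is made up, and the choice of innerjoin is an assumption about how the join could be done; nameFirst and nameLast are the name columns in Lahman's People.csv.

```matlab
% Made-up rows in the shape of Lahman's People.csv.
playerID  = ["aaa01"; "bbb02"; "zzz99"];
nameFirst = ["Alex"; "Ben"; "Zed"];
nameLast  = ["Arm"; "Bullpen"; "Hitter"];
people = table(playerID, nameFirst, nameLast);

% Names table: one full-name string per playerID.
names = table(people.playerID, people.nameFirst + " " + people.nameLast, ...
    'VariableNames', {'playerID', 'Names'});

% Made-up season table keyed by the same playerID.
playerID = ["aaa01"; "aaa01"; "bbb02"];
yearID   = [2016; 2017; 2016];
sum_G    = [22; 30; 25];
seasonPitching = table(playerID, yearID, sum_G);

% Keep only players present in both tables and attach their names.
seasonPitching3 = innerjoin(seasonPitching, names, 'Keys', 'playerID');
disp(seasonPitching3)
```

An inner join drops any playerID that appears in only one of the two tables, which is why the non-pitcher "zzz99" does not show up in the result.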
We first used the greatPitchers table with the seasonPitching3 file. The goal was to find young pitchers and identify trends in their statistics, specifically Earned Run Average (ERA). We added a column that shows which season of his career each row represents. We then restricted the table to seasons from 2016 onward and kept only pitchers who entered the league between 2016 and 2022. Since the aim was to pursue a pitcher in free agency, we also required that a pitcher started at least 10 games in each season.
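The filters above can be sketched like this. It is an illustration under assumptions, not the project's code: the data is made up, the debut year is taken to be a player's earliest yearID in the table, and the Season column is computed as years since debut plus one (which overcounts if a pitcher missed a full year). GS is the Lahman column for games started.

```matlab
% Made-up season rows for three pitchers.
playerID = ["aaa01"; "aaa01"; "bbb02"; "bbb02"; "ccc03"; "ccc03"];
yearID   = [2016; 2017; 2015; 2016; 2018; 2019];
GS       = [20; 31; 28; 30; 12; 4];
seasonPitching3 = table(playerID, yearID, GS);

% Per-player debut year (assumed: earliest year in the table).
grp   = findgroups(seasonPitching3.playerID);
debut = splitapply(@min, seasonPitching3.yearID, grp);

% Season number within each career: 1 for the debut year, 2 for the next.
seasonPitching3.Season = seasonPitching3.yearID - debut(grp) + 1;

% Keep pitchers who debuted in 2016-2022 ...
keepDebut = debut(grp) >= 2016 & debut(grp) <= 2022;

% ... and who started at least 10 games in every one of their seasons.
minGS      = splitapply(@min, seasonPitching3.GS, grp);
keepStarts = minGS(grp) >= 10;

greatPitchers = seasonPitching3(keepDebut & keepStarts, :);
disp(greatPitchers)
```

Here "bbb02" is dropped for debuting in 2015 and "ccc03" for a 4-start season, leaving only the two "aaa01" rows.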
We also wanted career totals for those pitchers within the 2016 to 2022 period. We removed the "sum_" prefix from each column name and dropped the columns that were not needed for this project (e.g., sum_yearID). We also recalculated innings pitched (IP) and Earned Run Average (ERA) from the combined totals for each player, since these cannot be obtained by simply adding up the season values. To evaluate individual pitchers, we took the unique list of pitchers from the Names column and created a dropdown menu containing every pitcher in the greatPitchers table within the stated years. With that, we can select any player and retrieve that player's stats for the given year range. The final step was to create a chart that shows not only the differences between two pitchers, but also how each compares to the median of a stat for a particular season. We created a table of the median ERA for each season from 2016 to 2022, retrieved each player's ERA for each season, and plotted them alongside the median ERA.
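The recalculation and the per-season median can be sketched as below. The numbers are made up and the table names are hypothetical. The formulas are the standard ones: IP = IPouts / 3 (Lahman stores innings as outs recorded) and ERA = 9 * ER / IP.

```matlab
% Made-up career totals, named the way groupsummary would name them.
Names      = ["Alex Arm"; "Ben Bullpen"];
sum_IPouts = [870; 300];
sum_ER     = [105; 40];
career = table(Names, sum_IPouts, sum_ER);

% Strip the "sum_" prefix from every column name.
career.Properties.VariableNames = ...
    strrep(career.Properties.VariableNames, 'sum_', '');

% Recompute rate stats from the summed counting stats.
career.IP  = career.IPouts / 3;          % three outs per inning
career.ERA = 9 * career.ER ./ career.IP; % earned runs per nine innings

% Median ERA for each season, from made-up season rows.
yearID = [2016; 2016; 2016; 2017; 2017];
ERA    = [3.10; 4.20; 5.00; 2.80; 3.60];
season = table(yearID, ERA);
medianERA = groupsummary(season, "yearID", "median", "ERA");

disp(career)
disp(medianERA)
```

Computing ERA from summed earned runs and outs weights each season by innings pitched, which a plain sum or average of season ERAs would not do.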
PitcherTask3: The purpose of this file is to practice filtering on several specific conditions. It performs two tasks. The first is to find starting pitchers who can anchor a rotation, meaning starters with a significant number of starts in a season, and to retrieve their ERA. The second is to find potential pitchers for the bullpen: pitchers from recent seasons who appeared in a particular number of games, checked against both ERA and strikeout (SO) count. After the project was finished, we also found that strikeouts per nine innings would be a good way to identify a reliever for close games.
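The two searches can be sketched with logical indexing, as below. Every threshold here (25 starts, 50 appearances, the ERA cutoffs, 2021 as "recent", 9 strikeouts per nine innings) is a hypothetical value chosen for illustration, and the data is made up. Strikeouts per nine innings is computed as 9 * SO / IP.

```matlab
% Made-up single-season lines for four pitchers.
Names  = ["Starter A"; "Reliever B"; "Reliever C"; "Reliever D"];
yearID = [2022; 2022; 2022; 2019];
G      = [32; 65; 60; 70];
GS     = [32; 0; 0; 0];
IPouts = [600; 195; 180; 210];
ER     = [70; 18; 30; 20];
SO     = [210; 90; 50; 95];
p = table(Names, yearID, G, GS, IPouts, ER, SO);

% Rate stats derived from the counting stats.
p.IP  = p.IPouts / 3;
p.ERA = 9 * p.ER ./ p.IP;
p.K9  = 9 * p.SO ./ p.IP;   % strikeouts per nine innings

% Task 1: rotation anchors (hypothetical cutoffs: 25+ starts, ERA under 4.00).
starters = p(p.GS >= 25 & p.ERA < 4.00, :);

% Task 2: bullpen candidates (hypothetical cutoffs: 2021 or later, no starts,
% 50+ appearances, ERA under 3.50, at least 9 strikeouts per nine innings).
relievers = p(p.yearID >= 2021 & p.GS == 0 & p.G >= 50 ...
    & p.ERA < 3.50 & p.K9 >= 9, :);

disp(starters)
disp(relievers)
```

Combining the conditions with & in a single index keeps each search to one readable expression, and the cutoffs can be adjusted in one place.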