This work is my master's thesis in Statistical Science, in which I evaluate MLB umpire performance using publicly available pitch data, advised by Paul A. Parker of the UCSC Statistics Department. I combine a neural network (an Extreme Learning Machine) with a Bayesian statistical model to predict an umpire's called strike zone with uncertainty.
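As a rough illustration of the general idea (not the thesis code), the sketch below pairs an Extreme Learning Machine, meaning a hidden layer whose random weights are never trained, with a Bayesian treatment of the output weights. Everything specific here is an assumption made for the example: the synthetic pitch locations, the box-shaped toy strike rule, the hidden-layer size, and above all the Gaussian likelihood, which is a conjugate simplification and would normally be replaced by a Bernoulli likelihood for a binary strike call.

```python
import numpy as np

def elm_features(X, W, b):
    """Hidden-layer activations using fixed random weights (never trained)."""
    return np.tanh(X @ W + b)

def fit_bayesian_readout(H, y, prior_precision=1.0, noise_variance=0.25):
    """Conjugate Gaussian posterior over the output weights.

    Assumes y ~ N(H w, noise_variance) and w ~ N(0, I / prior_precision).
    This Gaussian likelihood is a simplification for illustration only.
    """
    precision = prior_precision * np.eye(H.shape[1]) + H.T @ H / noise_variance
    cov = np.linalg.inv(precision)
    mean = cov @ H.T @ y / noise_variance
    return mean, cov

def predict(H_new, mean, cov):
    """Posterior mean and variance of the linear predictor at new points."""
    mu = H_new @ mean
    var = np.einsum("ij,jk,ik->i", H_new, cov, H_new)
    return mu, var

rng = np.random.default_rng(0)

# Synthetic pitch locations (horizontal, vertical) in feet: made-up data.
X = rng.uniform([-2.0, 0.5], [2.0, 4.5], size=(500, 2))
# Toy "called strike" rule: a pitch inside a fixed box is a strike.
y = ((np.abs(X[:, 0]) < 0.83) & (X[:, 1] > 1.5) & (X[:, 1] < 3.5)).astype(float)

n_hidden = 50  # hypothetical hidden-layer width
W = rng.normal(size=(2, n_hidden))
b = rng.normal(size=n_hidden)

mean, cov = fit_bayesian_readout(elm_features(X, W, b), y)

# Score two locations: the middle of the toy zone and a far corner.
grid = np.array([[0.0, 2.5], [1.9, 0.6]])
mu, var = predict(elm_features(grid, W, b), mean, cov)
```

Because only the output weights are inferred, the posterior is available in closed form in this simplified setting, and `var` gives a per-location measure of uncertainty alongside the point prediction `mu`.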
This work is a Bayesian nonparametrics project performing inference on NBA shot chart data. The data are publicly available, and the goal is to estimate a player's shooting tendencies: where they are likely to shoot, and how likely they are to make a shot given its location. The model is a Dirichlet process mixture model on the intensity surface of a marked non-homogeneous Poisson point process.
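To make the data model concrete, here is a minimal sketch that simulates a marked non-homogeneous Poisson process by thinning: shot locations follow a spatially varying intensity, and each shot carries a make/miss mark. It covers only the forward simulation, not the Dirichlet process mixture inference. The intensity function, make probability, and court dimensions are hypothetical choices for the example, not estimates from the project.

```python
import math
import random

LAM_MAX = 0.2  # hypothetical upper bound on the intensity (shots per sq. ft.)

def intensity(x, y):
    """Hypothetical shot intensity, highest at the basket placed at (0, 0)."""
    return LAM_MAX * math.exp(-(x * x + y * y) / 200.0)

def make_probability(x, y):
    """Hypothetical make probability that decays with distance to the basket."""
    return 0.65 * math.exp(-math.hypot(x, y) / 40.0)

def sample_poisson(rng, mean):
    """Poisson draw: count unit-rate exponential arrivals in [0, mean]."""
    count = 0
    total = rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_shots(rng, half_width=25.0, depth=47.0):
    """Simulate (x, y, made) tuples on an assumed half court via thinning."""
    area = 2.0 * half_width * depth
    # Step 1: candidates from a homogeneous process with rate LAM_MAX.
    n_candidates = sample_poisson(rng, LAM_MAX * area)
    shots = []
    for _ in range(n_candidates):
        x = rng.uniform(-half_width, half_width)
        y = rng.uniform(0.0, depth)
        # Step 2: keep each candidate with probability intensity / LAM_MAX.
        if rng.random() < intensity(x, y) / LAM_MAX:
            # Step 3: attach the mark (made or missed) given the location.
            made = rng.random() < make_probability(x, y)
            shots.append((x, y, made))
    return shots

shots = simulate_shots(random.Random(1))
```

Inference runs in the opposite direction: given observed `(x, y, made)` records, the model estimates the intensity surface and the location-dependent make probability.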
We clean college basketball game data and reshape it into a data frame of season-cumulative team box score statistics. PCA and clustering methods reveal which features separate teams of similar or differing performance. Visualizing the feature coefficient values per season shows that the statistics most important to winning games have remained relatively constant over the last 21 years.
What can we learn about which team-building choices performed well in a competitive Pokémon tournament? In this work, I take an entire tournament’s LabMaus team sheet data and produce unit-level contributions of individual team choices to end-of-tournament success.