Below are links to selected projects I've completed, some of them while learning data science. These are unrelated to my full-time teaching position.
Some SQL Reports
These are some online projects I've sought out to practice and demonstrate my ability to use SQL.
NYC Public School Test Scores Analysis
Description: Really basic stuff, just getting some demonstration up here.
NBA Top Shot Flash Challenge Projections
Description: Flash Challenges are a fun aspect of NBA Top Shot, where a model that identifies which players have the most value can give collectors a competitive edge. This document demonstrates how I build my value model.
Description: This RPubs document is a rough demonstration of the kinds of queries I run in R almost daily.
MR Play Daily Projections Output
MR Play Daily Projections Script (Updated 10/27/22)
Description: MR Play is a daily fantasy sports contest where players can enter their NBA Top Shot NFTs into a contest for winnings. I created a model to predict player points in MR Play, incorporating full-season game logs, last-10-game statistics, and daily projections from a paid subscription website. The model has led to personal success in the game, and other players have asked whether they could pay for exclusive access to my projections. The script is run in R and the results are exported to Google Sheets so others can access them easily.
2018 March Madness Predictions
Description: Inspired by Kaggle’s 2018 NCAA March Madness Contest, I built a gradient-boosted model to predict the results of the 2018 NCAA Men’s Basketball tournament against the spread. This document kept a running record of all the predictions and the model’s accuracy.
Recreational Basketball League Salary Analysis ("B" League)
Recreational Basketball League Salary Analysis ("A" League) (Very casual, non-professional language)
Description: CACBasketball (cacbasketball.com) is one of Boston's most popular recreational basketball leagues, offering 20+ leagues every season for over 1,000 players. It hosts leagues at several competition levels. Players enter a league as free agents, and team captains fill their teams by bidding on players, with $70 of total cap space available to field a full team. Since some captains are better players than others, the league director needed a way to fairly assign salaries to captains based on their stats. These documents show the regression models and summarize the results and recommendations in language readable for non-statisticians.
Description: This app was completed as a capstone project in the Data Science Specialization through Johns Hopkins University. We were handed 3 very large data sets of messy text from the internet and, without any lecture guidance, were told to teach ourselves enough about Natural Language Processing (NLP) to create an app that suggests the next word, similar to a cell phone keyboard. The app functions well on mobile devices.
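As a rough illustration of the idea behind next-word completion, the sketch below builds a tiny bigram frequency table and suggests the most frequent followers of a word. It is not the capstone app's code: the toy corpus, the function names `build_bigrams` and `predict_next`, and the choice of a plain bigram model are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_bigrams(sentences):
    """Count, for each word, how often each other word follows it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, word, k=3):
    """Return up to k of the most frequent followers of `word` (empty if unseen)."""
    counts = followers.get(word.lower())
    if not counts:
        return []
    return [w for w, _ in counts.most_common(k)]


# Hypothetical toy corpus standing in for the large text data sets.
corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
    "see you at the game",
]
model = build_bigrams(corpus)
suggestions = predict_next(model, "the")
```

A real product would likely need longer n-grams and some fallback for unseen words; this sketch only shows the counting-and-ranking core.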
A Random Forests ML Model Based on Motion Tracking Device Data
Description: This was a course project for a course titled Practical Machine Learning. The goal was to develop a machine learning model that uses a provided dataset (19,000 observations of 53 potential predictor variables) from motion tracking devices on a user doing a weightlifting exercise to predict whether the user did the exercise with proper form or, if not, what kind of mistake the user made. I built my model using random forests, achieved 99.1% accuracy on a test set I partitioned, and scored 20/20 on a validation set that was withheld from me.
Quantifying the Effect of Auto Transmission on Fuel Efficiency
Description: This was a course project for a course titled Regression Models. The goal was to create multivariate regression models to determine and quantify which properties of a car (like horsepower, weight, transmission) have the greatest effects on fuel efficiency. This paper analyzed the cars within each transmission class (automatic/manual).
Analysis of NOAA Storm Database
Description: This was a course project for a course titled Reproducible Research. The goal of the project was to take a very messy and fairly large dataset (900,000 observations, 37 variables, 561 MB) of the entire NOAA Storm Database from 1950-2011 and transform it to analyze which kinds of storms are most harmful to human life and most costly in dollars. I have since learned more elegant methods of cleaning than the heaps of brute-force manual coding used in this project.
Aggregating, Cleaning, and Summarizing Samsung Motion Tracking Data
Description: This is the GitHub repository of a project for a course titled Getting and Cleaning Data. The goal of this project was to take numerical data from 6 .txt files and process them into one tidy data set of summary data. Much of the data was incomplete or missing, and many variables had poor or obscure names.
Exploring Household Electric Power Consumption
Description: This is the GitHub repository of a project for a course titled Exploratory Data Analysis. The goal of this project was to explore and perform basic analysis on a data set containing 2 million observations of 9 variables. The work included plotting and handling date and time data.