AI pipeline that finds shipping photos in job folders, uses a CNN to classify each image as a detail image or a full-tank image, and centralizes them for fast reuse.
Result: Processed ~15,000 photos across 823 jobs (2019–2025). Staff can pull reference images in minutes instead of hours; originals preserved; re-run safe.
Tech used: Python, PyTorch (ResNet18), torchvision, Pillow, pandas, pathlib, scikit-learn (kNN baseline), OpenCV (early preprocessing).
My role: Data science / workflow engineering — built the crawler, trained the CNN (upgraded from a kNN baseline), and added a copy-only pipeline with logging and duplicate avoidance (sketched below).
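A minimal sketch of how such a crawl-classify-copy pipeline could be wired together. The folder names (`jobs`, `photo_library`), the checkpoint file `detail_vs_full.pt`, and the class labels are illustrative assumptions, not the project's actual paths or schema.

```python
# Hypothetical sketch: crawl job folders, classify each photo with a
# fine-tuned ResNet18, and copy it into a central library without
# touching originals or re-copying duplicates.
import hashlib
import shutil
from pathlib import Path

import pandas as pd
import torch
from PIL import Image
from torchvision import models, transforms

JOB_ROOT = Path("jobs")               # assumed root of per-job folders
LIBRARY_ROOT = Path("photo_library")  # assumed centralized destination
CLASSES = ["detail", "full_tank"]

# ResNet18 with its final layer swapped for a two-class head.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, len(CLASSES))
model.load_state_dict(torch.load("detail_vs_full.pt", map_location="cpu"))  # assumed checkpoint
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify(path: Path) -> str:
    """Return the predicted class name for one image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return CLASSES[int(logits.argmax(dim=1))]

def file_hash(path: Path) -> str:
    """Content hash used to skip photos copied on earlier runs."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

LIBRARY_ROOT.mkdir(parents=True, exist_ok=True)
seen = {file_hash(p) for p in LIBRARY_ROOT.rglob("*.jpg")}  # re-run safety
log_rows = []

for photo in JOB_ROOT.rglob("*.jpg"):   # crawler: walk every job folder
    digest = file_hash(photo)
    if digest in seen:
        continue                        # duplicate: never copy twice
    seen.add(digest)
    label = classify(photo)
    # Keep the job folder name in the filename to avoid collisions.
    dest = LIBRARY_ROOT / label / f"{photo.parent.name}_{photo.name}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(photo, dest)           # copy only; originals stay untouched
    log_rows.append({"source": str(photo), "dest": str(dest), "class": label})

pd.DataFrame(log_rows).to_csv("copy_log.csv", index=False)
```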
Tech used: Python, PySpark (MLlib), pandas, matplotlib, seaborn, SciPy, Google Colab
📦 Includes: VectorAssembler, GBTRegressor, RegressionEvaluator, and statistical testing (ks_2samp); see the sketch below.
🏎️ What’s a Constructor?
In Formula 1, a constructor earns points based on both of its drivers' results. This analysis uncovered which metrics — like grid position and lap consistency — best explain overall team performance.
What it solves: Identifies which factors matter most in determining a team's success in Formula 1, going beyond lap times to uncover constructor strategy drivers.
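A hedged sketch of how the MLlib pieces named above fit together. The input file `constructors.csv` and the feature/label column names are hypothetical placeholders, not the project's actual dataset.

```python
# Assumed workflow: assemble features, fit a gradient-boosted trees
# regressor on constructor points, evaluate, and compare distributions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from scipy.stats import ks_2samp

spark = SparkSession.builder.appName("f1-constructors").getOrCreate()
df = spark.read.csv("constructors.csv", header=True, inferSchema=True)  # assumed file

features = ["avg_grid_position", "lap_time_std", "pit_stop_count"]  # assumed columns
assembler = VectorAssembler(inputCols=features, outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

# Gradient-boosted trees regression on constructor points.
gbt = GBTRegressor(featuresCol="features", labelCol="points", maxIter=50)
model = gbt.fit(train)
preds = model.transform(test)

rmse = RegressionEvaluator(labelCol="points", predictionCol="prediction",
                           metricName="rmse").evaluate(preds)
print(f"RMSE: {rmse:.2f}")

# Feature importances indicate which inputs drive constructor performance.
print(dict(zip(features, model.featureImportances.toArray())))

# KS test: do predicted and actual points follow similar distributions?
pdf = preds.select("points", "prediction").toPandas()
stat, p = ks_2samp(pdf["points"], pdf["prediction"])
print(f"KS statistic={stat:.3f}, p-value={p:.3f}")

spark.stop()
```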
Tech used: R, caret, ensemble models, tidyverse, constructor-level modeling
What I built: A clean, replicable A/B test in Python, with proper test-group separation, t-tests, and recommendation logic based on statistically significant metrics.
Tech used: Python, pandas, SciPy, statistical testing
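A minimal sketch of the test logic, assuming a hypothetical `ab_results.csv` with a `group` column ("A"/"B") and a `conversion` metric; the real dataset, metric, and significance threshold may differ.

```python
# Assumed A/B evaluation: split by group, run a two-sample t-test,
# and recommend the variant only if the difference is significant.
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance threshold

df = pd.read_csv("ab_results.csv")               # hypothetical results file
control = df.loc[df["group"] == "A", "conversion"]
variant = df.loc[df["group"] == "B", "conversion"]

# Welch's t-test: do the two groups differ in mean conversion?
t_stat, p_value = stats.ttest_ind(control, variant, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

if p_value < ALPHA:
    winner = "B" if variant.mean() > control.mean() else "A"
    print(f"Significant difference: recommend group {winner}.")
else:
    print("No statistically significant difference: keep the control.")
```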
🟡 GitHub