AI pipeline that finds shipping photos in job folders, uses a CNN to classify each image as a detail image or a full-tank image, and centralizes them for fast reuse.
Result: Processed ~15,000 photos across 823 jobs (2019–2025). Staff can pull reference images in minutes instead of hours; originals preserved; re-run safe.
Tech used: Python, PyTorch (ResNet18), torchvision, Pillow, pandas, pathlib, scikit-learn (kNN baseline), OpenCV (early preprocessing).
My role: Data science / workflow engineering — built the crawler, trained the CNN (upgraded from a kNN baseline), and added a copy-only pipeline with logging and duplicate avoidance (sketched below).
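A minimal sketch of how such a crawl-classify-copy pipeline could be wired together. The folder names (`jobs`, `photo_library`), the checkpoint file `detail_vs_full.pt`, and the class labels are illustrative assumptions, not the project's actual paths or schema.

```python
# Hypothetical sketch: crawl job folders, classify each photo with a
# fine-tuned ResNet18, and copy it into a central library without
# touching originals or re-copying duplicates.
import hashlib
import shutil
from pathlib import Path

import pandas as pd
import torch
from PIL import Image
from torchvision import models, transforms

JOB_ROOT = Path("jobs")               # assumed root of per-job folders
LIBRARY_ROOT = Path("photo_library")  # assumed centralized destination
CLASSES = ["detail", "full_tank"]

# ResNet18 with its final layer swapped for a two-class head.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, len(CLASSES))
model.load_state_dict(torch.load("detail_vs_full.pt", map_location="cpu"))  # assumed checkpoint
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify(path: Path) -> str:
    """Return the predicted class name for one image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return CLASSES[int(logits.argmax(dim=1))]

def file_hash(path: Path) -> str:
    """Content hash used to skip photos copied on earlier runs."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

LIBRARY_ROOT.mkdir(parents=True, exist_ok=True)
seen = {file_hash(p) for p in LIBRARY_ROOT.rglob("*.jpg")}  # re-run safety
log_rows = []

for photo in JOB_ROOT.rglob("*.jpg"):   # crawler: walk every job folder
    digest = file_hash(photo)
    if digest in seen:
        continue                        # duplicate: never copy twice
    seen.add(digest)
    label = classify(photo)
    # Keep the job folder name in the filename to avoid collisions.
    dest = LIBRARY_ROOT / label / f"{photo.parent.name}_{photo.name}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(photo, dest)           # copy only; originals stay untouched
    log_rows.append({"source": str(photo), "dest": str(dest), "class": label})

pd.DataFrame(log_rows).to_csv("copy_log.csv", index=False)
```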
Tech used: Python, PySpark (MLlib), pandas, matplotlib, seaborn, SciPy, Google Colab
📦 Includes: VectorAssembler, GBTRegressor, RegressionEvaluator, and statistical testing (ks_2samp); see the sketch below.
🏎️ What’s a Constructor?
In Formula 1, a constructor earns points based on both of its drivers' results. This analysis uncovered which metrics — like grid position and lap consistency — best explain overall team performance.
What it solves: Identifies which factors matter most in determining a team's success in Formula 1, going beyond lap times to uncover constructor strategy drivers.
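A hedged sketch of how the MLlib pieces named above fit together. The input file `constructors.csv` and the feature/label column names are hypothetical placeholders, not the project's actual dataset.

```python
# Assumed workflow: assemble features, fit a gradient-boosted trees
# regressor on constructor points, evaluate, and compare distributions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from scipy.stats import ks_2samp

spark = SparkSession.builder.appName("f1-constructors").getOrCreate()
df = spark.read.csv("constructors.csv", header=True, inferSchema=True)  # assumed file

features = ["avg_grid_position", "lap_time_std", "pit_stop_count"]  # assumed columns
assembler = VectorAssembler(inputCols=features, outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

# Gradient-boosted trees regression on constructor points.
gbt = GBTRegressor(featuresCol="features", labelCol="points", maxIter=50)
model = gbt.fit(train)
preds = model.transform(test)

rmse = RegressionEvaluator(labelCol="points", predictionCol="prediction",
                           metricName="rmse").evaluate(preds)
print(f"RMSE: {rmse:.2f}")

# Feature importances indicate which inputs drive constructor performance.
print(dict(zip(features, model.featureImportances.toArray())))

# KS test: do predicted and actual points follow similar distributions?
pdf = preds.select("points", "prediction").toPandas()
stat, p = ks_2samp(pdf["points"], pdf["prediction"])
print(f"KS statistic={stat:.3f}, p-value={p:.3f}")

spark.stop()
```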
Tech used: R, caret, ensemble models, tidyverse, constructor-level modeling
What I built: A clean, replicable A/B test in Python, with proper test-group separation, t-tests, and recommendation logic based on statistically significant metrics.
Tech used: Python, pandas, SciPy, statistical testing
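A minimal sketch of the test logic, assuming a hypothetical `ab_results.csv` with a `group` column ("A"/"B") and a `conversion` metric; the real dataset, metric, and significance threshold may differ.

```python
# Assumed A/B evaluation: split by group, run a two-sample t-test,
# and recommend the variant only if the difference is significant.
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance threshold

df = pd.read_csv("ab_results.csv")               # hypothetical results file
control = df.loc[df["group"] == "A", "conversion"]
variant = df.loc[df["group"] == "B", "conversion"]

# Welch's t-test: do the two groups differ in mean conversion?
t_stat, p_value = stats.ttest_ind(control, variant, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

if p_value < ALPHA:
    winner = "B" if variant.mean() > control.mean() else "A"
    print(f"Significant difference: recommend group {winner}.")
else:
    print("No statistically significant difference: keep the control.")
```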
🟡 GitHub