This project develops a machine learning model to predict which newly created GitHub repositories will become successful within six months, using only activity data from their first 30 days. The model achieves strong predictive performance and provides actionable insights for technology scouts, open-source program offices, and investors seeking to identify promising open-source projects early in their lifecycle.
Key Results:
Trained and evaluated 6 classification models on 34,000+ GitHub repositories
Identified early signals most predictive of project success (stars, contributors, issue activity)
Developed a deployable model for scoring new repositories in real time
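A minimal sketch of how first-30-day signals could be aggregated per repository, assuming activity arrives as a list of timestamped events. The `Event` schema, the event kinds, the `early_features` and `is_successful` helpers, and the 100-star success threshold are all hypothetical. The project's actual success definition is not stated here, and distinct commit authors serve only as a stand-in for contributors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One repository activity event (hypothetical schema)."""
    kind: str            # e.g. "star", "issue_opened", "commit"
    actor: str           # login of the user who triggered the event
    timestamp: datetime


def early_features(created_at: datetime, events: list, window_days: int = 30) -> dict:
    """Aggregate activity that falls within the first `window_days` after creation."""
    cutoff = created_at + timedelta(days=window_days)
    early = [e for e in events if created_at <= e.timestamp < cutoff]
    return {
        "stars": sum(e.kind == "star" for e in early),
        "issues_opened": sum(e.kind == "issue_opened" for e in early),
        # Distinct commit authors approximate the contributor count.
        "contributors": len({e.actor for e in early if e.kind == "commit"}),
    }


def is_successful(stars_at_six_months: int, threshold: int = 100) -> bool:
    """Hypothetical success label: star count at six months meets a threshold."""
    return stars_at_six_months >= threshold


created = datetime(2024, 1, 1)
events = [
    Event("star", "alice", datetime(2024, 1, 2)),
    Event("commit", "bob", datetime(2024, 1, 3)),
    Event("commit", "bob", datetime(2024, 1, 4)),
    Event("issue_opened", "carol", datetime(2024, 1, 10)),
    Event("star", "dave", datetime(2024, 3, 1)),  # after day 30, so ignored
]
print(early_features(created, events))
# {'stars': 1, 'issues_opened': 1, 'contributors': 1}
```

Restricting every feature to the 30-day window keeps later activity, which is what the six-month label measures, out of the inputs.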
This project analyzes commercial real estate valuation trends in Philadelphia using the city's comprehensive open property assessment data. In the post-pandemic era, commercial real estate faces unprecedented challenges:
Office vacancies reached record highs (~19.6% in the U.S. in Q1 2025) due to hybrid work adoption
Office property values dropped ~14% in 2024, with further declines expected
Rising vacancy rates and declining values increase default risks on commercial mortgages
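One simple way to quantify a valuation trend is the year-over-year change in median assessed value per property category. The sketch below assumes assessment records have already been reduced to `(year, category, market_value)` tuples. The tuple layout, the helper names, and the sample values are illustrative only and are not the city dataset's actual schema or figures. The median is used because a handful of very large properties can dominate a mean.

```python
from collections import defaultdict
from statistics import median


def median_value_by_year(records):
    """Group (year, category, market_value) records and take the median of each group."""
    groups = defaultdict(list)
    for year, category, value in records:
        groups[(year, category)].append(value)
    return {key: median(values) for key, values in groups.items()}


def yoy_change(medians, category, year):
    """Percent change in a category's median value from year - 1 to year."""
    prev = medians[(year - 1, category)]
    curr = medians[(year, category)]
    return (curr - prev) * 100 / prev


# Made-up records for illustration only.
records = [
    (2023, "office", 1_000_000), (2023, "office", 2_000_000), (2023, "office", 3_000_000),
    (2024, "office", 900_000), (2024, "office", 1_700_000), (2024, "office", 2_600_000),
]
medians = median_value_by_year(records)
print(yoy_change(medians, "office", 2024))  # -15.0
```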
This project develops a machine learning model to assess credit risk and predict loan defaults. Using the German Credit dataset, we build classification models that help financial institutions make informed lending decisions while managing risk exposure.
Key Results:
Built and evaluated 6 classification models on 1,000 credit applicants
Engineered risk-based features from 20 applicant attributes
Implemented cost-sensitive classification that penalizes approving a high-risk applicant more heavily than rejecting a creditworthy one
Achieved strong predictive performance, with individual predictions explained through SHAP analysis
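Cost-sensitive classification can be sketched with the cost matrix distributed with the Statlog German Credit data, which charges 5 for classifying a bad-credit applicant as good and 1 for classifying a good applicant as bad. Assuming the project used those costs (an assumption, not stated above), minimizing expected cost means rejecting whenever the predicted default probability p satisfies 5p > 1 - p, i.e. p > 1/6 rather than the default 0.5. The helper names and sample probabilities are hypothetical; any classifier that outputs probabilities could feed them.

```python
# Assumed costs from the Statlog German Credit cost matrix.
COST_APPROVE_BAD = 5.0   # actual bad, predicted good (false negative)
COST_REJECT_GOOD = 1.0   # actual good, predicted bad (false positive)


def bayes_threshold(cost_fn=COST_APPROVE_BAD, cost_fp=COST_REJECT_GOOD):
    """Default probability above which rejecting has the lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true, p_bad, threshold):
    """Sum misclassification costs; y_true is 1 for bad credit, 0 for good."""
    cost = 0.0
    for y, p in zip(y_true, p_bad):
        predict_bad = p >= threshold
        if y == 1 and not predict_bad:
            cost += COST_APPROVE_BAD
        elif y == 0 and predict_bad:
            cost += COST_REJECT_GOOD
    return cost


# Hypothetical labels and model probabilities.
y_true = [1, 0, 0, 1, 0]
p_bad = [0.30, 0.10, 0.40, 0.20, 0.05]
print(total_cost(y_true, p_bad, bayes_threshold()), total_cost(y_true, p_bad, 0.5))
# 1.0 10.0
```

At the 0.5 threshold both bad applicants are approved, while the cost-derived threshold trades them for a single, cheaper false rejection.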
This project develops a machine learning model to predict customer churn in e-commerce, enabling proactive retention strategies. Using transaction data from a UK-based online retailer, we engineer RFM (Recency, Frequency, Monetary) and behavioral features to identify customers at risk of churning.
Key Results:
Best model: LightGBM, with an ROC-AUC of 0.7677 and recall of 75.5%
Built and evaluated 6 classification models on 3,370 customers
Engineered 24 predictive features from 397,884 transactions
Observed a 43% churn rate in the dataset
Identified key churn indicators: cancellation behavior, purchase frequency, and purchase diversity
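The RFM and behavioral features can be sketched in plain Python. This assumes rows follow the UCI Online Retail layout, where invoice numbers prefixed with "C" mark cancellations. Treating Frequency as the count of distinct invoices, Monetary as total spend, and counting cancelled line items are illustrative choices. The project's 24 features and exact definitions are not reproduced here, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime


def customer_features(rows, snapshot):
    """rows: (invoice_no, customer_id, stock_code, quantity, unit_price, invoice_date)."""
    last_purchase, spend = {}, defaultdict(float)
    invoices, products = defaultdict(set), defaultdict(set)
    cancellations = defaultdict(int)
    for invoice, customer, stock, qty, price, when in rows:
        if invoice.startswith("C"):
            cancellations[customer] += 1  # cancelled line item
            continue
        invoices[customer].add(invoice)
        spend[customer] += qty * price
        products[customer].add(stock)
        if customer not in last_purchase or when > last_purchase[customer]:
            last_purchase[customer] = when
    # Only customers with at least one purchase get a feature row.
    return {
        c: {
            "recency_days": (snapshot - last_purchase[c]).days,
            "frequency": len(invoices[c]),
            "monetary": round(spend[c], 2),
            "distinct_products": len(products[c]),
            "cancellations": cancellations[c],
        }
        for c in last_purchase
    }


rows = [
    ("536365", "17850", "85123A", 6, 2.55, datetime(2011, 11, 1)),
    ("536366", "17850", "71053", 2, 3.39, datetime(2011, 11, 20)),
    ("C536379", "17850", "85123A", -1, 2.55, datetime(2011, 11, 21)),
    ("536367", "13047", "84406B", 8, 2.75, datetime(2011, 10, 1)),
]
features = customer_features(rows, snapshot=datetime(2011, 12, 1))
print(features["17850"])
```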
This project analyzes production data from the Equinor Volve oil field (2008-2016) in the Norwegian North Sea to build predictive models for:
Production Forecasting: Time-series models to predict future oil, gas, and water production
Anomaly Detection: Identifying abnormal operational patterns that may indicate equipment issues or production problems
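As a simple baseline for the anomaly-detection task (not necessarily the project's method), a trailing-window z-score flags days whose production departs sharply from the recent norm. The function name, the 7-day window, the 3-standard-deviation threshold, and the sample daily oil volumes are assumptions for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=7, z_threshold=3.0):
    """Return indices whose value deviates from the trailing window by more than z_threshold std devs."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Illustrative daily oil volumes for one well, with a sudden drop on day 8.
daily_oil = [100, 102, 98, 101, 99, 100, 103, 101, 40, 100, 99]
print(rolling_zscore_anomalies(daily_oil))  # [8]
```

Planned shut-ins also produce sudden drops that this rule would flag, so a real pipeline would need to separate scheduled events from genuine faults.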
In the oil & gas industry, unplanned downtime is extremely costly:
On average, an upstream company faces ~27 days of unplanned downtime per year, costing about $38 million
A single 3.65-day outage can cost over $5 million
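Because the figures above imply tens of thousands of dollars per hour of unplanned downtime, ML-based predictive maintenance that prevents even a few hours of outage can save hundreds of thousands of dollars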
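The downtime figures quoted above can be converted to an implied hourly cost, which is a useful sanity check on any savings estimate. The helper name is illustrative. The inputs are the two quoted figures.

```python
HOURS_PER_DAY = 24


def cost_per_hour(total_cost_usd, downtime_days):
    """Average cost of one hour of downtime implied by a total cost and a duration."""
    return total_cost_usd / (downtime_days * HOURS_PER_DAY)


annual = cost_per_hour(38_000_000, 27)   # ~27 days per year costing $38 million
outage = cost_per_hour(5_000_000, 3.65)  # a 3.65-day outage costing $5 million
print(f"{annual:,.0f} {outage:,.0f}")    # 58,642 57,078
```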