· EDA of Netflix Viewership - Conducted exploratory data analysis of Netflix viewership, analyzing viewer counts, ratings, and categories across 10K+ movies and shows using Pandas, NumPy, and Matplotlib.
· EDA of Unicorn Startups - Performed Exploratory Data Analysis (EDA) on 500+ unicorn startups over the past decade, investigating their growth trends across countries and industries. Developed an interactive Tableau dashboard for visualization.
· Learning Platform Database Design - Defined entity relationships and normalized logical and physical schemas. Developed SQL scripts and stored procedures to load and manipulate 100K+ data records for 1K+ peer-to-peer learning courses.
· Retail Data Warehouse Design - Established centralized data management for a retail supermarket by architecting an analytical data warehouse. Evaluated storage and compute costs across GCP, AWS, and Azure, implementing hybrid and multi-cloud strategies for security, compliance, scalability, and high availability. Created optimized ETL pipelines for batch and streaming data integration using Dataflow, BigQuery, Glue, and Redshift, resulting in the delivery of 50+ automated business reports.
· Wildfire Analysis - Analyzed 10 GB of USAF weather station data to predict wildfire risks using MapReduce jobs in Python and Java. Compared performance on on-premises Hadoop and AWS EMR clusters, resulting in a 40% increase in preventive control efficiency.
· Python Library - Designed an extended Exhaustive Search algorithm to evaluate and select optimal regression models using MAPE as a performance metric, resulting in reduced computational complexity and expedited results delivery.
· Flight Price Prediction - Developed an ML model using multiple linear regression and a neural network in Scikit-learn to predict flight prices with 85% accuracy, and applied the Exhaustive Search algorithm to select the most influential predictors.
· NYC Jobs Classification - Created a career-level classification model for NYC job data using logistic regression and neural networks in Scikit-learn, achieving 75% accuracy.
· Stock Price Prediction - Conducted time-series analysis to forecast stock prices with 70% accuracy, employing Holt-Winters and ARIMA models in Python and R. Implemented a two-step forecasting approach to handle data inconsistencies.
· Child vs. Adult Classification - Designed a Keras Sequential CNN for binary classification of human images (child or adult) with 80% accuracy. Employed data augmentation techniques to address limited training data.
· In-Out Service Optimization - Developed a queue optimization model to reduce student wait times by 10 minutes at the new In-N-Out outlet near the CSUEB campus.