Built an end-to-end NLP classification pipeline to detect disaster-related content in tweets. Improved model accuracy with advanced preprocessing, feature engineering, and hyperparameter tuning.
Tools Used: Python, scikit-learn, NLP libraries (NLTK / spaCy), Transformer embeddings
Highlights:
Handled messy, short-text data: removed noise, emojis, stopwords; tokenized and lemmatized
Engineered features like TF-IDF and contextual embeddings to strengthen the signal (see the sketch after this list)
Tuned models (ensemble / transformer-based) to push classification metrics above the baseline
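A minimal sketch of the preprocessing + feature-engineering steps, assuming a simple TF-IDF + logistic-regression baseline (the data variables are hypothetical; the tuned ensemble/transformer variants followed the same pattern):

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, and non-alphanumeric noise from a tweet."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.sub(r"[^a-z0-9\s]", " ", text)

# TF-IDF features feeding a tuned linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean_tweet,
                              stop_words="english",
                              ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {"tfidf__max_features": [5000, 10000], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
# search.fit(train_texts, train_labels)  # hypothetical tweet strings / 0-1 labels
```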
Built and deployed a multi-task NLP pipeline that handles topic detection, sentiment analysis, and summarization. Designed for scalability on real-world text streams.
Tools Used: Transformers (Hugging Face), Python, Streamlit, NLP preprocessing (tokenization, cleaning, etc.)
Highlights:
Deployed as a Hugging Face Space for easy access & live demos
Supports zero-shot topic classification → no labeled training data needed per topic (sketched after this list)
Combined with sentiment & summarization to give layered understanding of input text
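A minimal sketch of the multi-task step, assuming the Hugging Face transformers pipeline API with a standard NLI checkpoint for zero-shot topics (the candidate labels are illustrative):

```python
from transformers import pipeline

# Zero-shot topic detection: no per-topic training data required.
topic_clf = pipeline("zero-shot-classification",
                     model="facebook/bart-large-mnli")
sentiment = pipeline("sentiment-analysis")
summarizer = pipeline("summarization")

def analyze(text: str, topics: list[str]) -> dict:
    """Layer topic, sentiment, and summary over one input text."""
    return {
        "topics": topic_clf(text, candidate_labels=topics),
        "sentiment": sentiment(text)[0],
        "summary": summarizer(text, max_length=60,
                              min_length=10)[0]["summary_text"],
    }

# analyze(article_text, ["politics", "sports", "technology"])  # hypothetical call
```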
Created an interactive transformer-based app that converts text into emoji-rich output with customizable styles. Served as a rapid prototype demonstrating fun, user-facing NLP deployment.
Tools Used: Hugging Face Spaces, Python, Transformers, Streamlit, NLP preprocessing
Highlights:
Deployed as a public Hugging Face Space
Accepts user input → transforms text into emoji output with selectable styles (sketched after this list)
Demonstrates quick iteration & prototyping of NLP models in a UI context
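A minimal sketch of the Streamlit UI wiring, with a hypothetical dictionary standing in for the transformer-backed text-to-emoji step:

```python
import streamlit as st

# Hypothetical stand-in for the model-backed text-to-emoji transformation.
EMOJI_MAP = {"love": "❤️", "happy": "😊", "fire": "🔥", "cat": "🐱"}

st.title("Text → Emoji")
style = st.selectbox("Style", ["append emoji", "replace words"])
text = st.text_area("Enter text")

if text:
    words = text.split()
    if style == "replace words":
        out = " ".join(EMOJI_MAP.get(w.lower(), w) for w in words)
    else:
        out = " ".join(w + EMOJI_MAP.get(w.lower(), "") for w in words)
    st.write(out)
```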
Built a full-cycle e-commerce analytics system simulating real-world scenarios: churn prediction, segmentation, revenue forecasting, and retention dashboards. Achieved ~98% accuracy & ~0.99 ROC-AUC on churn prediction, plus meaningful user persona segmentation and revenue forecasts with seasonality/holidays.
Tools Used: Python, SQL, scikit-learn, XGBoost, LightGBM, Prophet, SHAP, Streamlit, Plotly, Matplotlib, Seaborn
Highlights:
Identified “Loyalists,” “Sleepers,” and “At-Risk” personas using K-Means clustering
Built interactive dashboard to explore feature importance, churn metrics, and revenue forecasts
Captured holiday/seasonal effects in forecasting using Prophet (sketched below)
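A minimal sketch of the seasonality/holiday-aware forecast, assuming a daily revenue CSV with hypothetical column names (Prophet expects ds/y):

```python
import pandas as pd
from prophet import Prophet

# Daily revenue history; Prophet requires columns named "ds" and "y".
df = pd.read_csv("daily_revenue.csv")  # hypothetical file: date, revenue
df = df.rename(columns={"date": "ds", "revenue": "y"})

# Built-in holiday effects plus weekly/yearly seasonality.
m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.add_country_holidays(country_name="US")
m.fit(df)

future = m.make_future_dataframe(periods=90)  # 90-day horizon
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```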
Built predictive models to understand and reduce customer churn using logistic regression and random forest classifiers, achieving 85%+ accuracy (sketched below).
Tools Used: Python, Seaborn, scikit-learn
Highlights:
Identified customer segments with the highest churn risk
Delivered targeted retention strategies based on model outputs
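A minimal sketch of the modeling step, assuming a tabular dataset with a binary churn column (file and column names are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")  # hypothetical dataset
X = df.drop(columns=["churn"])
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Compare a linear baseline against a tree ensemble.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=300, random_state=42)):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, f"accuracy={acc:.3f}")
```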
Automated web scraping and data cleaning pipeline to analyze book-selling data from Wikipedia and other sources.
Tools Used: Python (requests, BeautifulSoup), Pandas, Matplotlib
Highlights:
Scraped 14 pages and consolidated 200+ rows of data (sketched after this list)
Performed missing data analysis and visualized gaps
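A minimal sketch of the scrape-and-consolidate loop; the URL pattern and table layout are hypothetical stand-ins:

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.org/best-sellers?page={}"  # hypothetical pattern

frames = []
for page in range(1, 15):  # 14 pages
    resp = requests.get(BASE_URL.format(page), timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    table = soup.find("table")  # first data table on the page
    frames.append(pd.read_html(StringIO(str(table)))[0])

books = pd.concat(frames, ignore_index=True)
print(books.isna().mean())  # share of missing values per column
```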
Analyzed demographic trends across India using population data to uncover insights on birth/death rates, fertility, and urbanization. Explored regional disparities and visualized long-term growth patterns.
Tools Used: Python, Pandas, Matplotlib, Seaborn
Highlights:
Cleaned and analyzed historical population data from multiple Indian states
Visualized trends in life expectancy, infant mortality, and fertility rates (sketched below)
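A minimal sketch of the trend visualization, assuming a tidy state-level DataFrame (file and column names are hypothetical):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical tidy data: one row per state per year.
df = pd.read_csv("india_population.csv")  # state, year, fertility_rate, ...
df = df.dropna(subset=["year", "fertility_rate"])

# One line per state to expose regional disparities over time.
sns.lineplot(data=df, x="year", y="fertility_rate", hue="state")
plt.title("Fertility rate by state over time")
plt.tight_layout()
plt.show()
```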
Analyzed confirmed COVID-19 cases among Chicago residents who live or work in congregate living facilities, by reporting period.
Tools Used: Tableau, Data Visualization, Data Cleaning, Data Analysis
Highlights:
Identified high-risk neighborhoods and demographic disparities in case rates
Visualized the impact of vaccination rollout on case reduction over time