To analyze global agricultural crop yield trends and evaluate the relationship between crop output, rainfall patterns, and pesticide use.
Cleaned and preprocessed large-scale structured agricultural data.
Performed multi-dimensional grouping by area, crop, and year.
Computed and visualized correlation matrices (e.g., pesticide use vs. yield).
Designed interactive dashboards using Plotly for policy-level decision insights.
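The grouping and correlation steps listed above can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's code: the sample rows and the column names (`area`, `crop`, `year`, `yield_hg_per_ha`, `rainfall_mm`, `pesticides_tonnes`) are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the project's schema.
data = pd.DataFrame({
    "area": ["India", "India", "India", "Brazil", "Brazil", "Brazil"],
    "crop": ["Rice", "Rice", "Rice", "Maize", "Maize", "Maize"],
    "year": [2010, 2011, 2012, 2010, 2011, 2012],
    "yield_hg_per_ha": [30000, 31000, 33000, 40000, 42000, 41000],
    "rainfall_mm": [1083, 1083, 1083, 1761, 1761, 1761],
    "pesticides_tonnes": [40000, 42000, 45000, 300000, 320000, 310000],
})

# Multi-dimensional grouping: mean yield per (area, crop, year).
yield_by_group = (
    data.groupby(["area", "crop", "year"], as_index=False)["yield_hg_per_ha"].mean()
)

# Pairwise Pearson correlations between yield, rainfall, and pesticide use.
corr = data[["yield_hg_per_ha", "rainfall_mm", "pesticides_tonnes"]].corr()
```

The resulting `corr` frame is the kind of matrix that could then be passed to a heatmap in a plotting library.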
Python, Pandas, Plotly, Time-Series Grouping, Correlation Analysis
AgriTech, Climate Intelligence, Government Agricultural Policy
Explore the complete project, source code, and documentation on GitHub:
GitHub Repository
Develop a Convolutional Neural Network (CNN) to automate white blood cell classification using medical imaging.
Preprocessed image datasets (resizing, normalization with OpenCV).
Built a CNN model using TensorFlow/Keras with layers such as Conv2D and MaxPooling.
Trained and validated model performance; saved model for clinical deployment.
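To make the preprocessing and layer types above concrete, here is a small NumPy sketch of what pixel normalization, a single convolution filter, and max pooling compute. It is only an illustration of the operations, not the project's Keras model: the 8x8 random image and the edge-detecting kernel are assumptions, and a real Conv2D layer also learns its kernels and adds a bias and activation.

```python
import numpy as np

def normalize(image):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return image.astype(np.float32) / 255.0

def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, the core operation of a Conv2D layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
# Random stand-in for a resized grayscale cell image (assumption for the example).
image = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
# Hand-written vertical-edge filter; a trained network would learn its own kernels.
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float32)

features = conv2d_valid(normalize(image), kernel)  # 8x8 input, 3x3 kernel -> 6x6
pooled = max_pool2d(features)                      # 6x6 -> 3x3
```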
TensorFlow, Keras, OpenCV, CNN Architecture, Model Evaluation
Medical Diagnostics, Healthcare AI, Digital Pathology
Explore the complete project, source code, and documentation on GitHub:
GitHub Repository
Objective
Leverage sensor data to predict plant health status and enable early intervention through machine learning.
Key Contributions
Performed exploratory data analysis to uncover patterns in soil moisture and plant stress levels.
Visualized correlations and feature distributions using Seaborn and Matplotlib.
Applied Label Encoding and SMOTE to prepare and balance the dataset.
Trained and evaluated KNN and Random Forest classifiers to detect plant health categories (Healthy, Moderate, Stressed).
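A rough sketch of the encode, balance, and train steps described above is given below. The sensor values are synthetic, the feature names are assumptions, and `smote_like_oversample` is a hypothetical, simplified re-implementation of the SMOTE idea (interpolating between same-class neighbours) written for this example; the project itself is stated to use imbalanced-learn's SMOTE.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.preprocessing import LabelEncoder

def smote_like_oversample(X, y, k=3, seed=0):
    """Grow each minority class to the majority count by interpolating between a
    sample and one of its k nearest same-class neighbours (the core SMOTE idea).
    Assumes every class has at least two samples."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        needed = target - count
        if needed == 0:
            continue
        X_cls = X[y == cls]
        # Ask for k + 1 neighbours because each point is its own nearest neighbour.
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_cls))).fit(X_cls)
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=needed)
        picked = neighbours[base, rng.integers(0, neighbours.shape[1], size=needed)]
        gap = rng.random((needed, 1))
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(needed, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Synthetic, imbalanced stand-in for the sensor dataset (assumption for the example).
rng = np.random.default_rng(42)
labels = np.array(["Healthy"] * 60 + ["Moderate"] * 25 + ["Stressed"] * 15)
soil_moisture = np.concatenate(
    [rng.normal(40, 3, 60), rng.normal(28, 3, 25), rng.normal(15, 3, 15)]
)
temperature = rng.normal(25, 2, 100)
X = np.column_stack([soil_moisture, temperature])

# Label encoding: classes are sorted, so Healthy=0, Moderate=1, Stressed=2.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

X_bal, y_bal = smote_like_oversample(X, y)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
```

In a real evaluation the oversampling would normally be applied to the training split only, so that synthetic points never leak into the test data.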
Tools & Techniques
Python, Pandas, NumPy, Seaborn, Matplotlib
Scikit-learn, imbalanced-learn (SMOTE), Jupyter Notebook
Industry Relevance
Precision Agriculture, AgriTech, Smart Farming Automation
Outcomes
Identified Soil_Moisture as a key predictor of plant stress.
Developed scalable predictive models with interpretable results.
Demonstrated a complete end-to-end data science pipeline with real-world agricultural impact.
Project Repository
Explore the complete project, source code, and documentation on GitHub:
GitHub Repository
Objective
Develop and evaluate regression models to predict a continuous target variable using a full ML pipeline in Python.
Key Contributions
Loaded and explored an Excel-based dataset with .info(), .describe(), and null value checks.
Cleaned data by removing irrelevant columns.
Engineered and selected features based on correlation heatmap analysis.
Built and trained two regression models: a Decision Tree Regressor and Polynomial Linear Regression (using a pipeline).
Evaluated performance using Mean Absolute Percentage Error (MAPE).
Visualized predictions, feature relationships, and model comparisons.
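The two-model comparison described above might look roughly like this with scikit-learn. It is a sketch under assumptions: the data is a synthetic quadratic relationship standing in for the Excel dataset, and the polynomial degree and tree depth are arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Excel dataset: a noisy quadratic relationship.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(200, 1))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] + 5 + rng.normal(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    # The pipeline expands features to degree 2, then fits ordinary least squares.
    "polynomial_regression": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
}

# MAPE per model on the held-out split (returned as a fraction, not a percentage).
mape = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mape[name] = mean_absolute_percentage_error(y_test, model.predict(X_test))
```

MAPE divides by the true value, so it is only meaningful when the target stays away from zero, as it does in this synthetic data.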
Tools & Techniques
Python, Pandas, NumPy
Scikit-learn (Decision Tree, Polynomial Regression, Pipelines)
Seaborn, Matplotlib
Industry Relevance
Predictive Analytics, Business Intelligence, Data-Driven Decision Making
Outcomes
Identified strong predictors through correlation analysis.
Compared model performances using visual and quantitative metrics.
Delivered interpretable insights from regression models.
Project Repository
Explore the complete project, source code, and documentation on GitHub:
GitHub Repository
Objective
Analyze international product sales data segmented by country, product, and customer segment to evaluate profitability, trends, and market performance using interactive Power BI reports.
Key Contributions
Interactive dashboards by segment, product, and geography
Monthly/yearly breakdowns of units sold, gross sales, discounts
Country-wise profitability comparison
Multi-level slicers and intuitive filters
Drill-down profit and trend insights
Tools & Techniques
Power BI, DAX, Power Query, Visual Analytics
Industry Relevance
Retail Analytics, Market Performance, Sales Strategy
Outcomes
2014 identified as the strongest year for sales
Key products and countries visualized for growth
Informed strategy via data-driven insights
View Report
Financial Data Analysis Report (PDF)
This report contains all dashboards, visual analyses, and insights mentioned above.
Objective
Analyze departmental and membership-based revenue and profit data across UK cities to identify performance trends and customer spending patterns.
Key Contributions
Combined multiple datasets (membership, departments, location)
Analyzed Gold/Bronze member behavior and impact
Visualized revenue by product categories like Bikes, Clothing, Footwear
Location-wise sales insights for 30+ cities
Club Membership segmentation and department success indicators
Tools & Techniques
Power BI, Data Modeling, DAX, Interactive Slicers
Industry Relevance
Customer Segmentation, Loyalty Analytics, Retail BI
Outcomes
Gold members drove the majority of high-profit sales
Departments like Outdoors & Cycle led profit margins
Enabled location-specific insights for operational decisions
View Report
Customer Spend & Membership Report (PDF)
Explore interactive visuals and detailed business insights in this Power BI report.
Objective
To create an intelligent PDF assistant powered by Retrieval-Augmented Generation (RAG) that enables users to summarize, explore, and ask context-grounded questions about any uploaded PDF document.
Key Contributions
Developed a Streamlit-based interactive UI for PDF upload and exploration.
Implemented document chunking with overlap for improved context retrieval.
Integrated HuggingFace sentence-transformers with FAISS for fast, in-memory vector similarity search.
Designed prompt engineering with predefined and custom instruction styles for tailored responses.
Automated concise document summaries and generated recommended exploratory questions.
Ensured context-aware answers by restricting responses to uploaded document content only.
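The chunking-with-overlap and similarity-search steps above can be illustrated with a short, dependency-light sketch. It does not use LangChain, FAISS, or a real embedding model: `chunk_text` and `top_k` are hypothetical helpers, the chunk sizes are arbitrary, and the small hand-written vectors stand in for sentence-transformer embeddings.

```python
import numpy as np

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [
        text[start:start + chunk_size]
        for start in range(0, max(len(text) - overlap, 1), step)
    ]

def top_k(query_vec, chunk_vecs, k=2):
    """Indices of the k chunk vectors with the highest cosine similarity to the query."""
    norms = np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    scores = chunk_vecs @ query_vec / np.where(norms == 0, 1.0, norms)
    return np.argsort(scores)[::-1][:k]

# A 500-character stand-in document yields three overlapping chunks.
document = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(document, chunk_size=200, overlap=50)

# Toy 2-D vectors standing in for real chunk embeddings (assumption for the example).
chunk_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_vec = np.array([1.0, 0.0])
best = top_k(query_vec, chunk_vecs, k=2)
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks. A vector index such as FAISS serves the same role as the brute-force `top_k` here, but is built to handle many more vectors.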
Tools & Techniques
Python, Streamlit, LangChain, FAISS, HuggingFace Embeddings, ChatGroq, PyPDFLoader, LLM
Industry Relevance
Document Intelligence, LegalTech, Research Assistance, Knowledge Management
Project Repository
Explore the complete project, source code, and documentation on GitHub:
GitHub Repo
Objective
To analyze credit card transaction patterns, detect fraudulent activities using ensemble and linear machine learning models, and build predictive systems for fraud prevention while addressing class imbalance in real-world financial data.
Key Contributions
Conducted comprehensive exploratory data analysis (EDA) on 80,000+ transactions across 16 features to uncover transaction behavior and fraud anomalies
Preprocessed and cleaned structured financial data with categorical encoding and normalization for modeling
Implemented supervised machine learning models (Random Forest with 99% accuracy, Logistic Regression) with balanced class weighting to address the 7% fraud rate
Evaluated models using accuracy, precision, recall, ROC-AUC, and confusion matrix; Random Forest achieved 98% precision, 82% recall, and 0.992 ROC-AUC
Selected Random Forest as the production model for superior fraud detection (1,505+ frauds identified out of 1,836) while minimizing false alarms (about 2% of flagged transactions, consistent with the 98% precision)
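The class-weighted training and evaluation described above can be sketched as follows. This is an illustration on synthetic data, not the project's pipeline: `make_classification` generates a stand-in dataset with roughly 7% positives and 16 features, and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in with roughly 7% positives, mirroring the stated fraud rate.
X, y = make_classification(
    n_samples=4000, n_features=16, n_informative=8,
    weights=[0.93, 0.07], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples inversely to class frequency.
models = {
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
    "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    results[name] = {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
        "false_positive_rate": fp / (fp + tn),
    }
```

Recall is the share of actual frauds that were caught, so the stated counts give 1,505 / 1,836 ≈ 0.82. The false positive rate, FP / (FP + TN), is a different quantity from 1 − precision, which is the share of flagged transactions that were false alarms.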
Tools & Techniques
Python, Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, Supervised Learning (Random Forest, Logistic Regression), Balanced Class Weighting, Classification Metrics, ROC-AUC Analysis, Cross-Validation
Industry Relevance
FinTech & Digital Payments, Banking & Credit Institutions, Risk Management, Fraud Prevention, Cybersecurity.
Business Impact: Detected 82% of fraudulent transactions, enabling effective identification of high-risk activities and significantly improving fraud prevention capability in transaction processing systems.
Project Repository
Explore the complete project, source code, and documentation on GitHub:
GitHub Repo
Conducted exploratory data analysis (EDA) to uncover health risk patterns across factors such as age, gender, BMI, glucose, smoking, and diabetes.
Preprocessed and cleaned clinical datasets, handling missing values and balancing the data using SMOTE for improved model fairness.
Built and compared machine learning classification models, including Random Forest and K-Nearest Neighbors (KNN).
Evaluated model performance using accuracy, precision, recall, F1-score, and confusion matrix to ensure reliability.
Deployed the final trained model as a REST API using FastAPI, integrating Pydantic validation for robust input handling and enabling real-time predictions.