Credit Card Delinquency Prediction using Classification Algorithms from Scratch
• Processed a credit card delinquency dataset with 30,000+ rows and 20 attributes using Python, employing data pre-processing, cleaning, transformation, and exploratory data analysis, and mitigated class imbalance through resampling techniques like SMOTE.
• Implemented Logistic Regression, Hard-margin SVM, Gaussian Naive Bayes, and a Multi-Layer Perceptron classifier from scratch without using libraries, achieving approximately 77% with the MLP classifier across the evaluation metrics used.
• Validated the MLP model on a test dataset to gauge how well it generalizes to new, unseen data.
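As an illustration of what "from scratch" can look like for one of the listed algorithms, here is a minimal Gaussian Naive Bayes sketch using only the Python standard library. It is not the project's code: the function names, the `var_smoothing` constant, and the toy data are assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate per-class priors, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Population variance plus a small constant to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_smoothing
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            # Log of the Gaussian density for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (illustrative values only, not from the project dataset).
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
```

Working in log space keeps the product of many small densities from underflowing, which matters once the feature count grows.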
Clustering Analysis using Python
• Conducted data cleansing, standardization, and feature transformation on 3D and 2D financial datasets comprising over 7,500 records to prepare the data for clustering analysis.
• Applied K-Means, DBSCAN, and hierarchical clustering, and visualized the results with scatter plots and dendrograms.
• Evaluated algorithm performance, achieving an F1 score of 0.84 and a silhouette score of 0.76, yielding meaningful clusters and insights from the financial data.
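For readers unfamiliar with the silhouette score, the sketch below shows one way to compute it with the standard library alone, assuming Euclidean distance. The function name and toy points are assumptions for illustration and are unrelated to the project data.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two well-separated toy clusters (illustrative values only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
mixed = silhouette_score(points, [0, 1, 0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points.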
Exploring Linguistic Patterns in Elon Musk's Tweets through NLP Text Analysis
• Applied NLP techniques in R, including word frequency analysis covering over 1,500 occurrences of key terms, bigram network graphs, sentiment analysis, and an examination of Zipf's law.
• Extracted insights to support trend analysis, revealing that mentions of AI in Elon Musk's tweets increased by 25% over the past year.
Stroke Prediction using Supervised Learning
• Prepared a stroke prediction dataset of 12,000+ rows and 20 attributes in Python through data pre-processing, cleaning, transformation, and exploratory data analysis.
• Addressed class imbalance through resampling techniques like SMOTE and implemented eight machine learning algorithms: random forest, logistic regression, support vector machine, Naive Bayes, decision tree, k-nearest neighbors, linear discriminant analysis, and multi-layer perceptron classifier.
• Achieved approximately 93% performance with the random forest classifier across various evaluation metrics, including accuracy, precision, recall, specificity, F1-score, ROC-AUC, and gains chart.
• Tested the random forest classifier on the validation dataset to assess how well the model generalizes to new, unseen data.
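Several of the metrics named above follow directly from the confusion matrix. The sketch below shows that relationship for a binary classifier; the function name and the toy labels are assumptions for illustration, not project results.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy labels (illustrative only): 3 TP, 1 FN, 5 TN, 1 FP.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

On imbalanced data such as stroke records, recall and specificity are usually more informative than accuracy alone, which is why reporting them side by side is useful.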
Starbucks Data Analysis using R, Flourish, and Datawrapper
• Analyzed three distinct Starbucks datasets on nutrition levels, outlet locations, and customer information, encompassing 25,000+ records and 40+ attributes.
• Pre-processed the data with R libraries including dplyr, tidyverse, and lubridate; built 10 data visualizations with ggplot2, Plotly, and Shiny; and developed 8 interactive charts with web-based tools such as Flourish and Datawrapper.
• Integrated all 18 visualizations into a Google Sites website, effectively communicating key insights derived from the data.
Data Visualizations using Tableau
• Examined UK bank customer datasets with 10,000+ records, employing Tableau to visualize insights on age, job roles, gender, regions, and bank balances.
• Investigated noise-related issues in New York City by developing 3 Tableau dashboards that identify sources, reasons, time patterns, and reporting modes, and proposed resolutions.
• Analyzed a sales dataset and built Tableau dashboards showing revenue and quantity by market, revenue by year, top five customers, and top five products to support informed decision-making.
Gym Management System using SQL, NoSQL, and Python
• Designed schema diagrams for the Gym Management System, converted them into relational models, and implemented both SQL and NoSQL (MongoDB) databases, defining 12 tables with key relationships and constraints and inserting 300+ records across all tables.
• Developed advanced SQL queries using joins, subqueries, aggregations, and CTEs to extract insights from the data.
• Engineered a Python application that interfaces with databases, retrieves records, and visualizes data to communicate meaningful insights.
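The kind of query described above (a CTE feeding a join and an aggregation, run from Python) can be sketched as follows. This uses an in-memory SQLite database as a stand-in, since the actual SQL engine is not stated; the `members` and `visits` tables and their rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the project's SQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE visits (
    visit_id INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES members(member_id),
    visit_date TEXT NOT NULL
);
INSERT INTO members VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO visits (member_id, visit_date) VALUES
    (1, '2024-01-02'), (1, '2024-01-05'), (1, '2024-01-09'),
    (2, '2024-01-03');
""")

# The CTE aggregates visits per member; the LEFT JOIN keeps members
# who have no visits so they appear with a count of 0.
query = """
WITH visit_counts AS (
    SELECT member_id, COUNT(*) AS n_visits
    FROM visits
    GROUP BY member_id
)
SELECT m.name, COALESCE(vc.n_visits, 0) AS n_visits
FROM members AS m
LEFT JOIN visit_counts AS vc ON vc.member_id = m.member_id
ORDER BY n_visits DESC, m.name;
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Here `rows` holds one `(name, n_visits)` tuple per member, ready to pass to a plotting library for visualization.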
Data Science Process Pipeline using Machine Learning Algorithms
• Description: This project predicts whether an employee will stay with an organization or leave, following a data science methodology. The results are analyzed to understand the likely reasons for leaving so that steps can be taken to address them.
• Algorithms: Random forest classification, Logistic Regression, Support Vector Machine, Multilayer Perceptron Classifier, Data Science process pipeline.
• Software Components: Anaconda Navigator, Python, and IBM Cloud for web scraping.
Intelligent Access Control System
• Description: In some industries, workers must wear safety helmets and shoes while working. This system checks whether workers are taking these precautions: a USB camera scans each worker before entry and grants access only if they are wearing a helmet and shoes. Otherwise, a warning sound is played through the speakers.
• Software components: Python, Clarifai visual recognition engine
• Libraries: OpenCV, pyttsx, Clarifai, RPi
• Hardware components: Raspberry Pi 3 Model B, USB camera, SD card, speakers