Lung Nodule Detection and Classification on LUNA16 Dataset - [Link]
Engineered a specialized detection pipeline for lung nodules from CT scans using the LUNA16 dataset, integrating SimpleITK for image processing and NumPy for data manipulation to enhance analysis precision.
Optimized data quality through advanced augmentation and normalization to foster model generalization, constructing a balanced dataset for CNN training with nodule visualization for accuracy verification.
Developed a 3D-to-2D preprocessing method that converts volumetric CT data into 2D slices for CNN training with TensorFlow and Keras, reaching 93.67% accuracy on the test set.
Text Generator Using GRU - [GitHub]
Created a text generator using a GRU network, incorporating preprocessing techniques such as one-hot encoding, tokenization, lemmatization, stopword removal, and vectorization.
Built a multi-layer RNN architecture to improve feature learning and overall model effectiveness.
Upgraded an RNN-based text generation model, resulting in a 20% increase in accuracy for predicting the next word or character; executed 6 iterative evaluation and tuning cycles to improve coherence and quality of generated text.
E-Commerce Text Classification Using NLP - [GitHub]
Implemented an e-commerce text classification project using Naive Bayes, SVM, XGBoost, and BERT models.
Automated categorization of 40,000+ product listings and achieved peak accuracy of 75%, enhancing searchability.
Enhanced model classification accuracy by 15% through a pipeline using pandas and NumPy, implementing a refined TF-IDF vectorization strategy, and iteratively optimizing algorithmic approaches using precision, recall, and F1 scores.
Relational Database and Business Intelligence System for Pizzeria - [GitHub]
Designed a normalized relational database schema for a pizzeria, encompassing 9+ tables to streamline complex data relationships, enhancing query performance and achieving a 20% improvement in report generation speed.
Wrote advanced SQL queries to extract, analyze, and report on KPIs, supporting two dashboards that tracked more than 600 inventory items.
Engineered two interactive dashboards to analyze and visualize sales transactions, and identified the top 10 revenue-generating items through data-driven analysis, influencing restocking strategies and menu adjustments.
Data Science Salary Insights Dashboard - [Tableau]
Used advanced Excel techniques (data cleaning, XLOOKUP, and pivot tables) for accurate data analysis. Used SQL to answer key business questions and exported the results as CSV files for visualization.
Built a dynamic Tableau dashboard, resulting in a 10% enhancement in understanding of average salaries and enabling detailed analysis of trends by experience level, employment type, and geographical location.
Real-Time Weather Data Processing Pipeline with Apache Airflow - [GitHub]
Built a high-performance ETL process as an Apache Airflow DAG, optimizing task dependencies and checking API availability with sensor operators, resulting in a 20% reduction in task execution times.
Used HTTP operators to extract real-time weather data from the OpenWeather API, applied transformations including temperature conversion with Python and pandas, and loaded the results into AWS for secure storage.
Managed AWS integration, running Airflow on an optimized EC2 instance for a 15% increase in operational efficiency. Securely loaded transformed weather data into Amazon S3 for scalable storage.
End-to-End AWS Data Pipeline for COVID-19 Insights
Led the development of a robust AWS data pipeline (ETL) integrating S3, Athena, and Redshift. Processed and analyzed COVID-19 data from the AWS data lake, ensuring a streamlined flow from extraction with Glue Crawler to staging in S3.
Engineered data transformations, integrating Python with Athena to speed up queries. Built fact and dimension tables, forming a data model that supports fast responses to complex business queries.
Implemented and fine-tuned Amazon Redshift, improving query response times, and set up scalable storage for the dimensional models to support data-driven decision-making.