LLM Inference Optimization – Performance Benchmarking & Quantization Study – (Link)
Tools: Python, PyTorch, Hugging Face Transformers, FlashAttention-2, bitsandbytes, CUDA, NVIDIA A100
Conducted a comprehensive empirical study optimizing inference for GPT-2 (124M) and Mistral-7B models on NVIDIA A100 GPUs.
Implemented and evaluated hardware-aware optimization strategies:
1. Quantization: Achieved 72% memory reduction (14.5GB → 4.0GB) on Mistral-7B using 4-bit NormalFloat (NF4).
2. Kernel Fusion: Integrated FlashAttention-2 to analyze memory bandwidth bottlenecks and SRAM utilization.
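The memory-reduction idea behind the quantization step can be sketched in pure Python. This is an illustrative blockwise 4-bit absmax scheme, not the project's code: the study used bitsandbytes' NF4 data type, which maps weights to 16 quantiles of a normal distribution rather than the uniform grid below.

```python
# Simplified blockwise 4-bit absmax quantization (illustrative sketch only).
# Real NF4 (bitsandbytes) uses a non-uniform, normal-quantile code book.

def quantize_4bit(weights, block_size=64):
    """Quantize a flat list of floats to 4-bit integer codes per block."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0
        # Map each weight to an integer in [-7, 7] (15 of the 16 4-bit codes),
        # storing one float scale (absmax) per block.
        codes = [round(w / absmax * 7) for w in block]
        blocks.append((absmax, codes))
    return blocks

def dequantize_4bit(blocks):
    """Recover approximate float weights from (scale, codes) blocks."""
    out = []
    for absmax, codes in blocks:
        out.extend(c / 7 * absmax for c in codes)
    return out

weights = [0.5, -0.25, 0.1, 0.9, -0.8, 0.05, 0.3, -0.6]
restored = dequantize_4bit(quantize_4bit(weights, block_size=4))
```

Each stored code costs 4 bits instead of 16 or 32, which is where the roughly 4x weight-memory reduction comes from; the per-block scale adds a small overhead.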
Established rigorous benchmarking protocols using CUDA synchronization, warm-up phases, and 8-iteration measurement cycles for statistical validity.
Conducted root cause analysis on performance regressions, identifying kernel launch overheads in low-batch scenarios.
Delivered production deployment recommendations based on workload characteristics such as batch size and sequence length.
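The warm-up plus fixed-iteration protocol above can be sketched with a CPU timer; this is a simplified stand-in, with the GPU-specific part (synchronizing before each timestamp) noted in a comment rather than implemented.

```python
import statistics
import time

def benchmark(fn, warmup=3, iters=8):
    """Warm-up runs followed by timed iterations; returns (mean_s, stdev_s).

    On GPU, torch.cuda.synchronize() would be called before each
    perf_counter() read so that asynchronous kernel launches are not
    under-counted -- omitted here to keep the sketch CPU-only.
    """
    for _ in range(warmup):          # warm-up: JIT, caches, allocator state
        fn()
    times = []
    for _ in range(iters):           # fixed iteration count for stable stats
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean_s, stdev_s = benchmark(lambda: sum(range(10_000)))
```

Reporting a standard deviation alongside the mean is what makes regressions (e.g. kernel launch overhead at low batch sizes) distinguishable from noise.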
RAGenetics – Privacy-Preserving RAG for Genetic Test Reports (Link)
Tools: Python, Differential Privacy, RAG, LLMs, VectorDB
Built a privacy-preserving Retrieval-Augmented Generation (RAG) system for genetic test report queries.
Implemented two differential privacy (DP) algorithms:
1. DPVoteRAG: sample-and-aggregate with Report-Noisy-Max token release.
2. DPSparseVoteRAG: Sparse Vector Technique-gated RAG to reduce privacy cost when voters align.
Supports synthetic data generation for safely testing genomic-text pipelines.
Designed modular configs, vectorstore builder, voting inference & privacy budgeting.
Ensures DP-secure token generation — research-grade evaluation only (non-clinical use).
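The Report-Noisy-Max step in DPVoteRAG can be sketched as follows. This is a minimal illustration, not the project's implementation; the token names and vote counts are hypothetical, and the Laplace sampler uses inverse-CDF sampling since the standard library has no built-in.

```python
import math
import random

def laplace(scale, rng):
    # Inverse-CDF sampling of Laplace(0, scale) from one uniform draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def report_noisy_max(vote_counts, epsilon, rng=None):
    """Release the token with the highest Laplace-noised vote count.

    With sensitivity 1 (each voter -- one disjoint retrieval partition --
    shifts at most one token's count by 1), releasing only the argmax
    of the noised counts satisfies epsilon-DP.
    """
    rng = rng or random.Random()
    noisy = {tok: c + laplace(1.0 / epsilon, rng)
             for tok, c in vote_counts.items()}
    return max(noisy, key=noisy.get)

# Hypothetical per-token votes from sample-and-aggregate voters.
votes = {"BRCA1": 9, "BRCA2": 1, "TP53": 0}
token = report_noisy_max(votes, epsilon=2.0, rng=random.Random(0))
```

When voters strongly agree, the noisy argmax almost always matches the true majority token, which is what the Sparse Vector gating in DPSparseVoteRAG exploits to spend less privacy budget.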
PneumoGAN – Pneumonia Detection using GAN Variant (Link)
Tools: Python, GAN, Deep Learning, Medical Imaging
Generated synthetic X-rays using a custom GAN architecture.
Achieved 94.99% accuracy in pneumonia detection.
Used 5,000+ images with augmentation for robust training.
Evaluated using precision, recall, and F1-score for medical reliability.
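The evaluation metrics above can be computed from confusion counts in a few lines; this sketch uses toy labels, not the project's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 1 = pneumonia, 0 = normal (toy labels for illustration)
p, r, f1 = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

For medical screening, recall (sensitivity) is typically the metric to protect, since a false negative means a missed pneumonia case.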
CashShield – Fake Currency Detection using ML (Link)
Tools: MATLAB, Image Processing, ML Classification
Built a system to detect counterfeit Indian currency using ML-based feature extraction.
Achieved 95%+ accuracy in distinguishing counterfeit from genuine notes.
Classified currency denominations with 90%+ efficiency using ROI intensity analysis.
Telecom Complaints Monitoring System
Tools: ARIMA, LSTM, XGBoost, ETS, Python
Forecasted complaint trends for telecom sectors using time-series ML.
ARIMA outperformed the LSTM, XGBoost, and ETS models in comparative testing.
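As a minimal illustration of the ETS family used in the comparison, here is simple exponential smoothing in pure Python; this is a sketch of the idea, not the project's code, and the smoothing factor is an assumed value.

```python
def simple_exp_smoothing(series, alpha=0.3):
    """One-step-ahead forecasts via simple exponential smoothing.

    The level is an exponentially weighted average of past observations;
    alpha (assumed 0.3 here) controls how fast old data is discounted.
    """
    level = series[0]
    forecasts = [level]           # forecast for t uses data up to t-1
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        forecasts.append(level)
    return forecasts

complaints = [120, 135, 128, 150, 142]
preds = simple_exp_smoothing(complaints)
```

ARIMA adds autoregressive and differencing terms on top of this kind of smoothing, which is why it can capture trend and autocorrelation that plain ETS misses.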
Housing Price Prediction (Link)
Tools: Python, Pandas, NumPy, scikit-learn, Seaborn, Matplotlib
Built a machine learning model to predict real-estate prices using structured housing data.
Performed data preprocessing: handled missing values, treated outliers, and encoded categorical features.
Conducted Exploratory Data Analysis (EDA) to study feature relationships & correlations.
Implemented regression-based models for price prediction with performance evaluation using RMSE & R².
Created visual trends & distribution plots for actionable insight into pricing behavior.
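The regression-plus-RMSE/R² pipeline can be sketched end to end without scikit-learn, using the closed-form one-feature least-squares fit; the numbers below are toy values, not the project's dataset.

```python
def fit_simple_ols(x, y):
    """Closed-form slope and intercept for one-feature least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

def rmse_r2(y_true, y_pred):
    """Root-mean-square error and coefficient of determination."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return (ss_res / n) ** 0.5, 1 - ss_res / ss_tot

# Toy data: price (lakhs) vs. floor area (sq ft); hypothetical values.
area = [500, 750, 1000, 1250, 1500]
price = [50, 72, 101, 128, 151]
slope, intercept = fit_simple_ols(area, price)
pred = [slope * a + intercept for a in area]
rmse, r2 = rmse_r2(price, pred)
```

R² close to 1 with a small RMSE relative to the price scale is the signal that the linear fit explains most of the variance.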
Cricket Analytics – Batting Performance Analysis & Prediction (Link)
Tools: R, ggplot2, dplyr, Statistical Modelling, EDA
Analyzed international batsmen performance using historical ICC cricket datasets.
Performed data cleaning, preprocessing & feature engineering on batting metrics.
Conducted Exploratory Data Analysis (EDA) on averages, strike rate, 100s, 50s & dismissals.
Built Multiple Linear Regression models to predict batting average trends.
Visualized global player distribution, run patterns & year-wise performance trends.
Retail Sales Analytics (Link)
Tools: PostgreSQL, SQL, EDA, Data Visualization
Built & cleaned a retail sales DB with 2000+ records.
Queried top-spending customers, peak sales months & buying patterns.
Delivered actionable insights to improve store revenue strategy.
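The top-spending-customer query can be illustrated with Python's built-in sqlite3 as an in-memory stand-in for the PostgreSQL database; the schema and rows here are assumed for the sketch.

```python
import sqlite3

# In-memory stand-in for the retail sales database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, sale_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("alice", "2023-11", 120.0), ("bob", "2023-11", 80.0),
     ("alice", "2023-12", 200.0), ("carol", "2023-12", 50.0)],
)

# Top-spending customers, highest lifetime total first.
top_customers = conn.execute(
    """SELECT customer, SUM(amount) AS total
       FROM sales
       GROUP BY customer
       ORDER BY total DESC"""
).fetchall()
```

The same GROUP BY / ORDER BY pattern, swapped onto `sale_month`, yields the peak-sales-month query.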
Sorting Memory Database (Link)
Tools: Java, Julia, Linked List DB, Performance Benchmarking
Built an in-memory toy database using a linked-list storage structure for student records.
Implemented two sorting algorithms — Bubble Sort & Insertion Sort — for performance comparison.
Designed CLI workflow to load, sort, export, and continue operations recursively.
Compared CPU time, memory usage, and disk I/O using Ubuntu profiling tools (/usr/bin/time, iostat).
Results showed Insertion Sort outperforming Bubble Sort consistently in memory-bound execution.
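The compared algorithms can be sketched in Python (the project itself used Java and Julia); insertion sort's advantage comes from shifting elements instead of pairwise swapping, which does less data movement per pass.

```python
import random
import time

def bubble_sort(a):
    """Repeatedly swap adjacent out-of-order pairs; early-exit when clean."""
    a = list(a)
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break
    return a

def insertion_sort(a):
    """Grow a sorted prefix by shifting each new key into place."""
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

rng = random.Random(42)
records = [rng.randint(0, 10_000) for _ in range(500)]
for sorter in (bubble_sort, insertion_sort):
    t0 = time.perf_counter()
    sorter(records)
    print(f"{sorter.__name__}: {time.perf_counter() - t0:.4f}s")
```

Both are O(n²) in the worst case, so the observed gap is a constant-factor, memory-movement effect, which matches the memory-bound profiling result above.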
Power-Efficient 4–2 Compressor for Tree Multiplier (Link)
Tools: Cadence Virtuoso, Low-Power VLSI Design
Designed a 4-2 compressor optimized for lower power consumption.
Achieved 35.18% lower power vs existing architectures.
Implemented using custom XOR gates with AVLS, LECTOR & Adiabatic techniques.