Leveraged Multimodal Large Language Models (MLLMs) and CLIP-based nearest-neighbor retrieval for efficient Vocabulary-Free Fine-Grained Visual Recognition (VF-FGVR).
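The retrieval step behind vocabulary-free recognition can be illustrated with a minimal sketch: given precomputed image embeddings (e.g. from a CLIP image encoder, assumed as input here) and a labeled gallery, the query's label is read off its nearest neighbors by cosine similarity. Function and variable names are illustrative, not from the actual system.

```python
import numpy as np

def nearest_neighbor_labels(query_emb, gallery_embs, gallery_labels, k=5):
    """Return the k gallery labels closest to the query in cosine similarity.

    query_emb:      (d,) image embedding (e.g. from a CLIP image encoder)
    gallery_embs:   (n, d) embeddings of labeled reference images
    gallery_labels: list of n fine-grained class names
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                   # cosine similarity of each gallery image to the query
    top = np.argsort(-sims)[:k]    # indices of the k most similar gallery images
    return [gallery_labels[i] for i in top]
```

In a vocabulary-free setting the gallery labels themselves can come from an MLLM's free-form descriptions rather than a fixed class list.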
Currently collaborating with Microsoft Research on verification of reasoning in MLLMs.
First author of an ICLR VeriAI & ES-Reasoning Workshop paper (under review) on training-free verification of multimodal reasoning using Nash equilibria, in collaboration with Microsoft Research.
Conformal Prediction for Multimodal Models: co-authored a research paper investigating principled uncertainty estimation for vision-language systems.
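The core idea of split conformal prediction can be sketched in a few lines: calibrate a nonconformity threshold on held-out data, then emit prediction sets that cover the true label with probability at least 1 − α. This is a generic sketch, not the paper's method; the score choice (1 minus the true-class probability) is one standard option.

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: build label sets with (1 - alpha) coverage.

    cal_probs:  (n, C) softmax scores on a held-out calibration set
    cal_labels: (n,) true labels for the calibration set
    test_probs: (m, C) softmax scores for new inputs
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conservative quantile with the finite-sample correction.
    level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(scores, level, method="higher")
    # A class enters the set when its nonconformity score is below the threshold.
    return [np.where(1.0 - row <= qhat)[0].tolist() for row in test_probs]
```

Larger sets signal higher uncertainty, which is what makes the framework useful for auditing vision-language outputs.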
Architected and deployed production-grade LLM solutions using Mistral 7B, reaching 93% accuracy and a BLEU score of 34, validated through rigorous A/B testing and model evaluation.
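For context, BLEU scores like the one above come from modified n-gram precision with a brevity penalty. Below is a hedged, single-reference sketch (with add-one smoothing) rather than the exact evaluation pipeline used; production work would typically rely on a standard implementation such as sacreBLEU.

```python
import math
from collections import Counter

def sentence_bleu(candidate, reference, max_n=4):
    """Illustrative BLEU for one tokenized candidate/reference pair:
    smoothed modified n-gram precisions, geometric mean, brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped matches
        total = sum(cand.values())
        precisions.append((overlap + 1) / (total + 1))          # add-one smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return 100.0 * bp * geo_mean
```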
Enhanced sentiment analysis accuracy by 20% through advanced prompt engineering and model fine-tuning, and implemented a CI/CD pipeline for continuous model improvement.
Improved 2D image and 3D point cloud data quality by 73%, including 35% gains in MSE and PSNR, using OpenCV and PyTorch within an Agile development framework.
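The MSE and PSNR metrics cited above are standard fidelity measures; a minimal NumPy sketch (not the project's actual evaluation code) shows how they relate:

```python
import numpy as np

def mse(ref, out):
    """Mean squared error between a reference and a processed image/point set."""
    return float(np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2))

def psnr(ref, out, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(ref, out)
    return float("inf") if err == 0 else 10.0 * np.log10(max_val ** 2 / err)
```

Because PSNR is a log transform of MSE, lowering MSE directly raises PSNR.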
Implemented and trained point cloud deep learning architectures (PointNet, PointNet++, RSNet), achieving 93% accuracy and 88% mIoU with a 1.20M-parameter model requiring only 0.09 GMACs.
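The defining property of PointNet-style encoders is permutation invariance: a shared per-point MLP followed by a symmetric max-pool yields the same global feature regardless of point order. A toy NumPy sketch (random weights, illustrative only, not the trained models above):

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_features(points, W1, W2):
    """Minimal PointNet-style encoder: shared per-point MLP + symmetric max-pool."""
    h = np.maximum(points @ W1, 0.0)  # shared MLP layer 1 (ReLU), applied per point
    h = np.maximum(h @ W2, 0.0)       # shared MLP layer 2
    return h.max(axis=0)              # max-pool over points -> order-invariant feature

W1 = rng.normal(size=(3, 16))              # toy weights, stand-ins for learned ones
W2 = rng.normal(size=(16, 32))
cloud = rng.normal(size=(128, 3))          # a 128-point cloud
shuffled = rng.permutation(cloud)          # same points, different order
# The global feature is identical under any permutation of the points.
assert np.allclose(pointnet_features(cloud, W1, W2),
                   pointnet_features(shuffled, W1, W2))
```

PointNet++ and RSNet build on this primitive with local neighborhood grouping and slice pooling, respectively.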
Developed and deployed a Python-based image recognition system achieving 95% accuracy on airport infrastructure analysis, improving flora and fauna detection by 40%.
Conducted extensive research using the ModelNet10, ModelNet40, and ShapeNet benchmark datasets.