👨‍🎓 I am a 3rd-year MSc student researcher in the CS Department at the UofA, supervised by Prof. Osmar Zaiane and collaborating with Prof. Matt Taylor. My work specializes in multimodal generative models (LLMs, VLMs, and diffusion models), with a focus on advancing spatial reasoning and planning to ground language models in real-world applications. Previously, I was a PhD student in the ECE Department at the UofA working on robotics, before transferring to CS to pursue my current MSc program.
🔬 My core research centers on scaling 3D spatial reasoning in multimodal image generation using test-time compute approaches, curating vision-language QA datasets for embodied agents in human-centered environments, and enhancing the reasoning abilities of multimodal LLMs through in-context learning and fine-tuning (SFT and RL fine-tuning) methods.
Beyond my main research, I have explored multimodal retrieval methods (visual RAG and geometric retrieval), visual–tabular representation learning, 3D CAD model generation with generative models, and uncertainty quantification in LLMs through open-domain QA dataset curation, contrastive prompting, and graph-based confidence mapping. I have also worked on deep RL for online policy optimization in continuous control, and on adaptive state representations built from learned feature combinations.
I’m broadly interested in machine learning and deep learning, with a focus on multimodal generative models, NLP, and reinforcement learning.
I bring 3+ years of academic and industry experience developing and deploying deep learning systems, along with publications at venues such as AAAI, IJCAI, and IROS.
My latest publications: