👨‍🎓 I am a 3rd-year MSc student researcher in the CS Department at the UofA, supervised by Prof. Osmar Zaiane and collaborating with Prof. Matt Taylor. I specialize in multimodal generative models (LLMs, VLMs, and diffusion models), with a focus on advancing spatial reasoning and planning to ground language models in real-world applications. Previously, I was a PhD student in the ECE Department at the UofA working on robotics, before transferring to CS to pursue my current MSc program.


🔬 My core research centers on scaling 3D spatial reasoning in multimodal image generation with test-time compute approaches, curating vision-language QA datasets for embodied agents in human-centered environments, and enhancing the reasoning abilities of multimodal LLMs through in-context learning and fine-tuning methods (e.g., PEFT and RLFT).


Beyond my main research, I have explored multimodal retrieval methods (visual RAG and geometric retrieval), visual–tabular representation learning, and uncertainty quantification in LLMs through open-domain QA dataset curation, contrastive prompting, and graph-based confidence mapping.


Broadly, I am interested in ML and GenAI, including NLP, multimodal generative models, reasoning, and RL, with publications at venues such as AAAI, IJCAI, and IROS. I also bring 3.5+ years of experience as a robotics and computer vision software engineer and over 2 years as an ML model developer, with expertise in PyTorch, Hugging Face, LangChain, and robotics frameworks such as ROS.


My latest publications: