Abhranil Chandra
Email: abhranil dot chandra at gmail.com & abhranil dot chandra at uwaterloo dot ca
Google Scholar | Twitter | Github | Linkedin | Blog
I am Abhranil Chandra, an AI researcher and a Research Masters (Thesis) student in Computer Science at the Cheriton School of Computer Science, University of Waterloo. I am advised by Prof. Sebastian Fischmeister and work closely with Dr. Sherry Yang from DeepMind, Dr. Rishabh Agarwal from MILA, and Harshit Sikchi, Dr. Amy Zhang, and Dr. Scott Niekum at UT Austin on my current research.
My research goal is to study the science of intelligence by building Foundation Model Agents for Sequential Decision-Making in both physical and virtual worlds. These are general-purpose interactive autonomous agents that can learn, adapt, explore, reason, plan, act, and self-improve in open-ended settings.
I approach this by combining Deep Reinforcement Learning (RL) and Interactive Learning with Generative Foundation Models (FM), in pursuit of scalable, general-purpose Foundation Model Agents for Decision-Making. My specific research interests are:
Building Scalable Decision-Making Algorithms (Offline Data-Driven RL, Unsupervised RL, Goal Conditioned RL, Conditional Generative Models for Decision Making, FM-guided RL) that can both leverage broad knowledge and excel at goal-directed agentic behaviours and self-improvement from interactive learning and exploration.
Studying and improving the science of Foundation Models, and incorporating robust World Models (WM) into FMs (to enable building Large Multimodal Agents and Large Action/Policy Models) by leveraging internet-scale multimodal data (video, language, and structured data) as well as sensorimotor and feedback data from interactions with diverse environments.
Augmenting task-agnostic FMs capable of density estimation with dynamic System-2 abilities like multi-turn decision-making, reasoning, instruction following, tool use, and alignment, using better pre-training and post-training objectives (tools from RL such as fast online adaptation and RLHF/RLAIF, better fine-tuning, and better data, including scaled synthetic and decision-centric data).
Studying the fundamental principles and ideas behind reasoning, and how to move beyond the autoregressive objective toward better alternatives (energy-based methods with adaptable latent space) to build robust reasoning systems (beyond superficial pattern matching, retrieval, or memorization) that can enable truly novel exploration and knowledge discovery.
Applications: I am interested in applications empowered by combining internet-scale knowledge of FMs with decision-making algorithms to build Agents for both embodied physical and virtual worlds. Examples include but are not limited to embodied learning using a learned generative world model for efficient decision-making and hierarchical multi-agent reasoning & scalable oversight.
Previously, I worked under Prof. Wenhu Chen at the University of Waterloo on consistent long video generation, better reasoning benchmarks (MMLU-Pro), and fine-grained automatic video generation evaluation (VideoScore). I had the privilege of receiving the Microsoft Accelerate Foundation Models Grant in 2023 and led our work on ReFeR under the guidance of Prof. Pawan Goyal from IIT Kharagpur and Dr. Manish Gupta from Microsoft Research India.
I completed my undergraduate degree at IIT Kharagpur, advised by Prof. Pabitra Mitra, and interned under Prof. Jordan Boyd-Graber at the University of Maryland and Prof. Boyu Wang at the Vector Institute.
News
[Oct '24] Started as an RA at MILA, working with Dr. Rishabh Agarwal from Google DeepMind and Prof. Aaron Courville at MILA on Scaling System-2 Reasoning using RL and Finetuning as part of my thesis research.
[Aug '24] Started research collaboration with senior PhD mentor Harshit Sikchi guided by Prof. Amy Zhang and Prof. Scott Niekum at UT Austin on improving multi-turn online RL for unsupervised preference learning in LLMs.
[Jan '24] Started research collaboration with Dr. Sherry Yang and Dr. Bo Dai at Google DeepMind on Interactive Video Generation Models as World Models as part of my thesis research.
[Dec '23] Presented our paper DiffClone at the TOTO Workshop, NeurIPS 2023 and won the best paper and workshop competition award.
[Sept '23] Our project "A Human-Aligned Automated Scalable Oversight and Evaluation Framework for Generative Outputs via Large Language Models" was accepted for the Accelerating Foundation Models Research Grant by Microsoft Research.
[Sept-Nov '23] Worked as a part-time Student Researcher at Palitronica Inc.
[Sept '23] Received the Mitacs Globalink Graduate Research Fellowship and the International Master's Award of Excellence at UWaterloo to support my graduate research, in addition to a full scholarship at UWaterloo.
[Sept '23] Joined the David R. Cheriton School of Computer Science at the University of Waterloo with full scholarship and funding as a research master's student and started as an RA and TA.
[May-Aug '23] Worked as an Applied Research Intern at ThoughtLabs Pvt. Ltd. on RAG-enhanced LLMs.
[May '23] Graduated from IIT Kharagpur with a B.Tech. degree majoring in Mechanical Engineering, with a minor in Maths and Computing and a micro-specialization in AI.
[May '23] Submitted and defended my undergraduate honours thesis, advised by Prof. Pabitra Mitra, on approximate inference for efficient Bayesian Deep Learning to improve uncertainty calibration in high-stakes, low-data computer vision tasks.
[Mar '23] Won Gold at Inter IIT Tech Meet 11. I led IIT Kharagpur's team in the NLP event.
Teaching
CS 485/685: Advanced ML: Statistical Learning Theory (sole TA and IA, making and grading assignments, conducting office hours)
CS 467/676: Numeric Computation for Financial Modeling
CS 234: Data Types and Structures
CS 135: Designing Functional Programs
Services
Reviewer for COLM-2025, NeurIPS-2024, Training Agents with Foundation Models @ RLC-2024, Generative Models for Decision Making @ ICLR-2024, EMNLP-2023, SDU @ AAAI-2022.
[Oct 2020 - May 2023] Advisor and Senior Member, Kharagpur Data Analytics Group: IIT Kharagpur's official society on Machine Learning - we spread awareness about ML research, discuss research papers, conduct reading sessions and workshops on ML-related topics, conduct and participate in competitions, and do independent research work.
[Jul 2019 - Apr 2023] National Service Scheme (NSS) Volunteer: Taught underprivileged kids in villages near IIT Kharagpur the basics of English, Maths and Computing.