Somewhere between images, words and sounds, I reside.
Oh my, which way to go, why can't I decide?
I am a research master's student at Mila and UdeM, Montreal, under the supervision of Prof. Irina Rish. My main research interests are robustness in multimodal models and scaling. Currently, I am interning at ServiceNow Research with Dr. Krishnamurthy Dvijotham on robustness in agentic systems.
I am also collaborating with Dr. Francesco Croce and Prof. Nicolas Flammarion at the Theory of ML lab, EPFL, on the robustness of discrete representations in vision-language models. Prior to this, I was interning at EPFL's NLP lab (Summer@EPFL) with Prof. Antoine Bosselut and Dr. Syrielle Montariol.
Before my master's, I was a research intern at ALMAnaCH, Inria, Paris, under the supervision of Dr. Djamé Seddah. My work focused on studying how various modalities interact in real-time game sessions.
During my bachelor's, I worked on the conjunction of self-supervised and continual learning with Prof. Christopher Kanan at the Rochester Institute of Technology, New York. As a DAAD WISE Scholar, I worked on appraisal-based emotion recognition from social media data under Dr. Roman Klinger and Dr. Carina Silberer at the University of Stuttgart.
I completed my bachelor's thesis on contrastive learning and domain adaptation in the Department of Electronics and Communication Engineering at the Visvesvaraya National Institute of Technology, India. There, I also mentored projects on understanding and improving language (and multimodal) representations at IvLabs and served as the vice-chairperson of the IEEE Student Branch.
Besides research, I like to spend my time quilling, reading, and visiting new places.
[Dec 2024] Awarded the Women in ML (WiML) Travel Grant to present my work on Improving VLM Robustness at their workshop @ NeurIPS 2024.
[Nov 2024] Presented my work on Improving VLM Robustness virtually at EMNLP 2024.
[Jul 2024] Received the Outstanding Paper Award for our paper on Improving VLM Robustness at the TiFA Workshop @ ICML 2024; also presented the poster at the NextGenAISafety Workshop @ ICML 2024.
[Jun 2024] Started my Summer@EPFL internship in EPFL's NLP lab.
[May 2024] Started my internship at ServiceNow Research with Dr. Krishnamurthy Dvijotham on system-level defences for enterprise AI agents.
[May 2024] Awarded the Mitacs Accelerate Fellowship worth 32,000 CAD for my project with ServiceNow Research.
[Mar 2024] Awarded the Women in AI Scholarship worth 10,000 CAD by Mila - Quebec AI Institute.
[Dec 2023] Received the Université de Montréal International Student Scholarship worth 6,000 CAD.
[Oct 2023] Our paper on Foundation Models for Time-series Prediction was presented at the NeurIPS R0-FoMo Workshop 2023.
[Sep 2023] Received Université de Montréal Exemption Scholarship worth 20,000 CAD.
[Sep 2023] Started my research master's at Mila with Prof. Irina Rish.
Under review
M. Heuillet, R. Bhagwatkar, J. Ngnawe, Y. Pequignot, A. Larouche, C. Gagné, I. Rish, O. Ahmad, A. Durand
We present a diverse benchmark for robust fine-tuning, investigating the effect of several design and training configurations. Our analysis both confirms and challenges prior assumptions, highlighting promising directions for further research.
Under review
R. Bhagwatkar, S. Montariol, A. Romanou, B. Borges, I. Rish, A. Bosselut
We introduce a first-of-its-kind benchmark for real-world commonsense anomalies supporting three open-ended tasks. Through high-quality examples and cognitively grounded annotations, we comprehensively evaluate the anomaly detection and understanding capabilities of popular vision-language models.
NeurIPS 2024, WiML Workshop 🏆 [Travel Grant] 🏆
EMNLP 2024, Findings
ICML TiFA Workshop, 2024 🏆 [Outstanding Paper Award] 🏆
ICML NextGenAISafety Workshop, 2024
R. Bhagwatkar, S. Nayak, P. Bashivan, I. Rish
We investigate the impact of model design choices on adversarial robustness in VLMs. More interestingly, we propose several inexpensive but highly effective prompt engineering techniques that provide substantial robustness improvements.
NeurIPS R0-FoMo Workshop, 2023
K. Rasul, A. Ashok, A. R. Williams, H. Ghonia, R. Bhagwatkar, et al.
We present Lag-Llama, a foundation model for univariate time series forecasting using a transformer architecture with lags as covariates. Pretrained on diverse data, it excels in zero-shot generalization and achieves state-of-the-art performance when fine-tuned, outperforming existing models.
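For readers unfamiliar with lags as covariates, here is a minimal, illustrative sketch of turning a univariate series into lagged inputs; the lag indices and function name are my own, not Lag-Llama's actual feature pipeline:

```python
import numpy as np

def lag_features(series, lags):
    """Build a (time, lag) covariate matrix from a 1-D series.

    For each time step t (once all lags are available), the covariates
    are the past values series[t - l] for every lag l in `lags`.
    Returns the covariate matrix and the aligned targets.
    """
    max_lag = max(lags)
    rows = [[series[i - l] for l in lags] for i in range(max_lag, len(series))]
    return np.array(rows), series[max_lag:]
```

For example, with `lags=[1, 2]` each target value is paired with the two values immediately preceding it.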
National Conference on Communications (NCC), 2022
R. Bhagwatkar, S. Kemekar, V. Domatoti, K. Khan, A. Singh
In this work, we hypothesize that real-world images and their corresponding synthetic images are different views of the same abstract representation. To enhance the quality of domain-invariant features, we increase the mutual information between the two inputs.
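As a rough illustration of increasing mutual information between two views, a standard InfoNCE-style contrastive estimator can be sketched as below; this is a generic sketch under my own naming, not the paper's exact objective:

```python
import numpy as np

def info_nce(z_real, z_syn, tau=0.1):
    """InfoNCE loss: a common lower bound on the mutual information
    between two batches of paired embeddings (here, real vs. synthetic).

    Row i of z_real is a positive pair with row i of z_syn; all other
    rows in the batch act as negatives.
    """
    # Cosine-normalize both batches of embeddings.
    z_real = z_real / np.linalg.norm(z_real, axis=1, keepdims=True)
    z_syn = z_syn / np.linalg.norm(z_syn, axis=1, keepdims=True)
    logits = z_real @ z_syn.T / tau  # pairwise similarities
    # Log-softmax over each row; the positive sits on the diagonal.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls each real image's embedding toward its synthetic counterpart and away from the other samples in the batch.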
International Conference on Advancements in Interdisciplinary Research (AIR), 2022
R. Bhagwatkar, S. Kemekar, V. Domatoti, K. Khan, A. Singh
In this work, we present various limitations and drawbacks faced by current autonomous pipelines, along with solutions to mitigate them.
NeurIPS 2021 Workshop on Pre-registration in Machine Learning
K. Ambilduke, A. Shetye, D. Bagade, R. Bhagwatkar, K. Fitter, P. Vagdargi, S. Chiddarwar
We posit that languages are linguistic transforms that map abstract meaning to sentences. We attempt to extract and investigate this abstract space by optimizing the Barlow Twins objective between latent representations of parallel sentences.
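The Barlow Twins objective mentioned above can be sketched as follows; this is a minimal NumPy version for intuition, whereas the actual work applies it to latent representations of parallel sentences from a neural encoder:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins: drive the cross-correlation matrix of two batches
    of embeddings toward the identity.

    On-diagonal terms enforce invariance across the two views;
    off-diagonal terms reduce redundancy between embedding dimensions.
    """
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    n = z_a.shape[0]
    c = (z_a.T @ z_b) / n                                # cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Applied to parallel sentences, perfectly aligned and decorrelated representations of a translation pair would yield a loss near zero.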
NeurIPS 2020 Workshop on Pre-registration in Machine Learning, PMLR 148:139-154, 2021
R. Bhagwatkar, K. Fitter, S. Bachu, A. Kulkarni, S. Chiddarwar
Just like sentences are series of words, videos are series of images. Inspired by the success of large language models in predicting language, we attempt to generate videos using a GPT and a novel Attention-based Discretized Autoencoder.
International Conference on Power, Instrumentation, Control and Computing (PICC), 2020
R. Bhagwatkar, K. Fitter, S. Bachu, A. Kulkarni, S. Chiddarwar
In this work, we study and discuss several approaches for generating videos, ranging from Generative Adversarial Networks (GANs) to sequential models like LSTMs. Further, we compare the strengths and weaknesses of each approach, with the underlying motivation of providing a broad and rigorous review of the subject.
Medical VQA
Deployed various Visual Question Answering models on medical datasets.
Improved Facebook AI Research’s MMF framework for medical data.
Achieved leaderboard performance on the ImageCLEF-2019 dataset.
Video Generation
Aimed at generating entire frames and not pixel-level predictions.
Developed a novel Attention Based Discretized Autoencoder (ADAE).
Coupled the ADAE with a GPT-2 for video generation.
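The ADAE's internals are not spelled out here, so the sketch below shows only the generic nearest-neighbour quantization step that any discretized autoencoder uses to turn continuous latents into tokens a GPT can model; the names and shapes are illustrative:

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour vector quantization of continuous latents.

    z: (n, d) continuous latents from the encoder.
    codebook: (k, d) learned code vectors.
    Returns discrete token ids (for the GPT) and their quantized latents.
    """
    # Squared distance between every latent and every code vector.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(1)
    return idx, codebook[idx]
```

The resulting token ids form a discrete sequence, which is what makes an autoregressive language model like GPT-2 applicable to video frames.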
Neural Machine Translation
Language Modelling
Generated Dinosaur names using Character-level RNNs.
Developed a paragraph generator to generate text from Harry Potter novels.
Implemented RNNs from scratch and compared their performance against (and amongst) PyTorch's built-in RNN modules.
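A single step of the kind of from-scratch character-level RNN described above might look like this; the weight names and shapes are illustrative, not the project's actual code:

```python
import numpy as np

def rnn_step(x, h, Wxh, Whh, Why, bh, by):
    """One step of a vanilla (Elman) character-level RNN.

    x: one-hot encoding of the current character.
    h: previous hidden state.
    Returns the new hidden state and a probability distribution
    over the next character.
    """
    h = np.tanh(Wxh @ x + Whh @ h + bh)    # update hidden state
    y = Why @ h + by                       # unnormalized next-char scores
    p = np.exp(y - y.max())                # stable softmax
    return h, p / p.sum()
```

Sampling from `p` at each step and feeding the result back in is what generates names or paragraphs character by character.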
Variational Deep Learning
Studied and implemented various autoencoders and generative networks.
Developing variational models for multimodal applications, mainly sequential data such as electroencephalography (EEG) signals.
Landmark Retrieval
Aimed at extracting images of landmarks similar to a query image.
Designed a ResNet-101 based autoencoder for the above task on the Google Landmarks Dataset v2 using TensorFlow.
Real-time Digit Classifier
Developed an open-source pipeline for human-computer interaction using Deep Learning and Computer Vision for digit classification.
Trained Convolutional and Deep Neural Networks from scratch.
Achieved 99% accuracy on the MNIST Dataset in real-time.
Detection & Tracking
Aimed at object detection and tracking from high altitude aerial vehicles.
Optimized the pipeline to deliver real-time performance with human accuracy.
Over the years, through my research and courses, I have collected notes on research papers. They aim to be concise while remaining easy to read and understand. Suggestions are welcome :)
Prompt Injection Attacks and Defenses for LLM-based Agentic Systems
Related works [Notes]
Vision-Language Modeling: