Somewhere between images, words and sounds, I reside.
Oh my, which way to go, why can't I decide?
I am a research master's student at Mila and UdeM, Montreal, under the supervision of Prof. Irina Rish. My main research interests are robustness in multimodal models and scaling. Currently, I am interning at ServiceNow Research with Dr. Krishnamurthy Dvijotham on robustness in agentic systems.
I am also collaborating with Dr. Francesco Croce and Prof. Nicolas Flammarion at the Theory of ML lab, EPFL, on the robustness of discrete representations in vision-language models. Prior to this, I was interning at EPFL's NLP lab (Summer@EPFL) with Prof. Antoine Bosselut and Dr. Syrielle Montariol.
Before my master's, I was a research intern at ALMAnaCH, Inria, Paris, under the supervision of Dr. Djamé Seddah. My work focused on studying how various modalities interact in real-time game sessions.
During my bachelor's, I worked on the conjunction of self-supervised and continual learning with Prof. Christopher Kanan at the Rochester Institute of Technology, New York. As a DAAD WISE Scholar, I worked on appraisal-based emotion recognition from social media data under Dr. Roman Klinger and Dr. Carina Silberer at the University of Stuttgart.
I completed my bachelor's thesis on contrastive learning and domain adaptation in the Department of Electronics and Communication Engineering at the Visvesvaraya National Institute of Technology, India. There, I also mentored projects on understanding and improving language (and multimodal) representations at IvLabs and served as the vice-chairperson of the IEEE Student Branch.
Besides research, I like to spend my time quilling, reading, and visiting new places.
[Dec 2024] Awarded the Women in ML (WiML) Travel Grant to present my work on Improving VLM Robustness at their workshop @ NeurIPS 2024.
[Nov 2024] Presented my work on Improving VLM Robustness virtually at EMNLP 2024.
[Jul 2024] Received the Outstanding Paper Award for our paper on Improving VLM Robustness at the TiFA Workshop @ ICML 2024; also presented the poster at the NextGenAISafety Workshop @ ICML 2024.
[Jun 2024] Started my Summer@EPFL internship in EPFL's NLP lab.
[May 2024] Started my internship at ServiceNow Research with Dr. Krishnamurthy Dvijotham on system-level defences for enterprise AI agents.
[May 2024] Awarded the Mitacs Accelerate Fellowship worth 32,000 CAD for my project with ServiceNow Research.
[Mar 2024] Awarded the Women in AI Scholarship worth 10,000 CAD by Mila - Quebec AI Institute.
[Dec 2023] Received the Université de Montréal International Student Scholarship worth 6,000 CAD.
[Oct 2023] Our paper on Foundation Models for Time-series Prediction was presented at the NeurIPS R0-FoMo Workshop 2023.
[Sep 2023] Received Université de Montréal Exemption Scholarship worth 20,000 CAD.
[Sep 2023] Started my research master's at Mila with Prof. Irina Rish.
Under review
M. Heuillet, R. Bhagwatkar, J. Ngnawe, Y. Pequignot, A. Larouche, C. Gagné, I. Rish, O. Ahmad, A. Durand
We present a diverse benchmark for robust fine-tuning, investigating the effect of several design and training configurations. Our analysis both confirms and challenges prior assumptions, highlighting promising directions for further research.
Under review
R. Bhagwatkar, S. Montariol, A. Romanou, B. Borges, I. Rish, A. Bosselut
We introduce a first-of-its-kind benchmark for real-world commonsense anomalies supporting three open-ended tasks. Through high-quality examples and cognitively grounded annotations, we comprehensively evaluate the anomaly detection and understanding capabilities of popular vision-language models.
NeurIPS 2024, WiML Workshop 🏆 [Travel Grant] 🏆
EMNLP 2024, Findings
ICML TiFA Workshop, 2024 🏆 [Outstanding Paper Award] 🏆
ICML NextGenAISafety Workshop, 2024
R. Bhagwatkar, S. Nayak, P. Bashivan, I. Rish
We investigate the impact of model design choices on adversarial robustness in VLMs. More interestingly, we propose several inexpensive but highly effective prompt engineering techniques that provide substantial robustness improvements.
NeurIPS R0-FoMo Workshop, 2023
K. Rasul, A. Ashok, A. R. Williams, H. Ghonia, R. Bhagwatkar, et al.
We present Lag-Llama, a foundation model for univariate time series forecasting using a transformer architecture with lags as covariates. Pretrained on diverse data, it excels in zero-shot generalization and achieves state-of-the-art performance when fine-tuned, outperforming existing models.
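For readers unfamiliar with lags as covariates, here is a minimal, illustrative sketch of turning a univariate series into lagged inputs; the lag indices and function name are my own, not Lag-Llama's actual feature pipeline:

```python
import numpy as np

def lag_features(series, lags):
    """Build a (time, lag) covariate matrix from a 1-D series.

    For each time step t (once all lags are available), the covariates
    are the past values series[t - l] for every lag l in `lags`.
    Returns the covariate matrix and the aligned targets.
    """
    max_lag = max(lags)
    rows = [[series[i - l] for l in lags] for i in range(max_lag, len(series))]
    return np.array(rows), series[max_lag:]
```

For example, with `lags=[1, 2]` each target value is paired with the two values immediately preceding it.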
National Conference on Communications (NCC), 2022
R. Bhagwatkar, S. Kemekar, V. Domatoti, K. Khan, A. Singh
In this work, we hypothesize that real-world images and their corresponding synthetic images are different views of the same abstract representation. To enhance the quality of domain-invariant features, we increase the mutual information between the two inputs.
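As a rough illustration of increasing mutual information between two views, a standard InfoNCE-style contrastive estimator can be sketched as below; this is a generic sketch under my own naming, not the paper's exact objective:

```python
import numpy as np

def info_nce(z_real, z_syn, tau=0.1):
    """InfoNCE loss: a common lower bound on the mutual information
    between two batches of paired embeddings (here, real vs. synthetic).

    Row i of z_real is a positive pair with row i of z_syn; all other
    rows in the batch act as negatives.
    """
    # Cosine-normalize both batches of embeddings.
    z_real = z_real / np.linalg.norm(z_real, axis=1, keepdims=True)
    z_syn = z_syn / np.linalg.norm(z_syn, axis=1, keepdims=True)
    logits = z_real @ z_syn.T / tau  # pairwise similarities
    # Log-softmax over each row; the positive sits on the diagonal.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls each real image's embedding toward its synthetic counterpart and away from the other samples in the batch.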
International Conference on Advancements in Interdisciplinary Research (AIR), 2022
R. Bhagwatkar, S. Kemekar, V. Domatoti, K. Khan, A. Singh
In this work, we present various limitations and drawbacks faced by current autonomous pipelines, along with solutions to mitigate them.
NeurIPS 2021 Workshop on Pre-registration in Machine Learning
K. Ambilduke, A. Shetye, D. Bagade, R. Bhagwatkar, K. Fitter, P. Vagdargi, S. Chiddarwar
We posit that languages are linguistic transforms that map abstract meaning to sentences. We attempt to extract and investigate this abstract space by optimizing the Barlow Twins objective between latent representations of parallel sentences.
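The Barlow Twins objective mentioned above can be sketched as follows; this is a minimal NumPy version for intuition, whereas the actual work applies it to latent representations of parallel sentences from a neural encoder:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins: drive the cross-correlation matrix of two batches
    of embeddings toward the identity.

    On-diagonal terms enforce invariance across the two views;
    off-diagonal terms reduce redundancy between embedding dimensions.
    """
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    n = z_a.shape[0]
    c = (z_a.T @ z_b) / n                                # cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Applied to parallel sentences, perfectly aligned and decorrelated representations of a translation pair would yield a loss near zero.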
NeurIPS 2020 Workshop on Pre-registration in Machine Learning, PMLR 148:139-154, 2021
R. Bhagwatkar, K. Fitter, S. Bachu, A. Kulkarni, S. Chiddarwar
Just like sentences are series of words, videos are series of images. Inspired by the success of large language models in predicting language, we attempt to generate videos using a GPT and a novel Attention-based Discretized Autoencoder.
International Conference on Power, Instrumentation, Control and Computing (PICC), 2020
R. Bhagwatkar, K. Fitter, S. Bachu, A. Kulkarni, S. Chiddarwar
In this work, we study and discuss several approaches for generating videos, ranging from Generative Adversarial Networks (GANs) to sequential models like LSTMs. Further, we compare the strengths and weaknesses of each approach, with the underlying motivation of providing a broad and rigorous review of the subject.
Medical VQA
Deployed various Visual Question Answering models on medical datasets.
Improved Facebook AI Research’s MMF framework for medical data.
Achieved leaderboard performance on the ImageCLEF-2019 dataset.
Video Generation
Aimed at generating entire frames and not pixel-level predictions.
Developed a novel Attention Based Discretized Autoencoder (ADAE).
Coupled the ADAE with a GPT-2 for video generation.
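The ADAE's internals are not spelled out here, so the sketch below shows only the generic nearest-neighbour quantization step that any discretized autoencoder uses to turn continuous latents into tokens a GPT can model; the names and shapes are illustrative:

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour vector quantization of continuous latents.

    z: (n, d) continuous latents from the encoder.
    codebook: (k, d) learned code vectors.
    Returns discrete token ids (for the GPT) and their quantized latents.
    """
    # Squared distance between every latent and every code vector.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(1)
    return idx, codebook[idx]
```

The resulting token ids form a discrete sequence, which is what makes an autoregressive language model like GPT-2 applicable to video frames.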
Neural Machine Translation
Language Modelling
Generated Dinosaur names using Character-level RNNs.
Developed a paragraph generator to generate text from Harry Potter novels.
Implemented RNNs from scratch and compared their performance against (and amongst) PyTorch's built-in RNN modules.
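A single step of the kind of from-scratch character-level RNN described above might look like this; the weight names and shapes are illustrative, not the project's actual code:

```python
import numpy as np

def rnn_step(x, h, Wxh, Whh, Why, bh, by):
    """One step of a vanilla (Elman) character-level RNN.

    x: one-hot encoding of the current character.
    h: previous hidden state.
    Returns the new hidden state and a probability distribution
    over the next character.
    """
    h = np.tanh(Wxh @ x + Whh @ h + bh)    # update hidden state
    y = Why @ h + by                       # unnormalized next-char scores
    p = np.exp(y - y.max())                # stable softmax
    return h, p / p.sum()
```

Sampling from `p` at each step and feeding the result back in is what generates names or paragraphs character by character.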
Variational Deep Learning
Studied and implemented various autoencoders and generative networks.
Developing variational models for multimodal applications, mainly sequential data such as electroencephalography (EEG) signals.
Landmark Retrieval
Aimed at extracting images of landmarks similar to a query image.
Designed a ResNet-101 based autoencoder for the above task on the Google Landmarks Dataset v2 using TensorFlow.
Real-time Digit Classifier
Developed an open-source pipeline for human-computer interaction using Deep Learning and Computer Vision for digit classification.
Trained Convolutional and Deep Neural Networks from scratch.
Achieved 99% accuracy on the MNIST Dataset in real-time.
Detection & Tracking
Aimed at object detection and tracking from high altitude aerial vehicles.
Optimized the pipeline to deliver real-time performance with human accuracy.
Over the years, through my research and courses, I have collected notes on research papers. They aim to be concise while remaining easy to read and understand. Suggestions are welcome :)
Prompt Injection Attacks and Defenses for LLM-based Agentic Systems
Related works [Notes]
Vision-Language Modeling: