Research

Meta-learning vs Pre-training

Meta-learning is the machine learning paradigm in which we train models to learn how to learn. That is, instead of training a model to perform well on a single dataset, we train it to learn to perform well on new datasets, possibly from fewer samples. This is particularly relevant for few-shot learning settings. More importantly, it resembles how we ourselves learn, which hints at its potential as a route towards general intelligence.
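
To make the contrast concrete, here is a minimal sketch (not code from our paper) of the difference between an ordinary supervised, pre-training-style update and an episodic, Prototypical-Networks-style meta-learning update; the model, episode sizes, and data below are illustrative placeholders.

```python
# Sketch: standard supervised update vs. an episodic (Prototypical Networks) update.
# Sizes, data, and the 100-class label space are made-up placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
classifier = nn.Linear(64, 100)  # fixed label space, used only by the pre-training style
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)

def pretraining_step(x, y):
    """Ordinary supervised update: fit a fixed set of classes directly."""
    loss = F.cross_entropy(classifier(encoder(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def episodic_meta_step(support_x, support_y, query_x, query_y, n_way):
    """One few-shot episode: build per-class prototypes from the support set,
    then classify the query set by distance to those prototypes."""
    z_support = encoder(support_x)                 # [n_way * k_shot, d]
    z_query = encoder(query_x)                     # [n_query, d]
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                              # [n_way, d]
    logits = -torch.cdist(z_query, prototypes)     # closer prototype => higher score
    loss = F.cross_entropy(logits, query_y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy 5-way, 1-shot episode with random data, just to show the shapes.
n_way, k_shot, n_query = 5, 1, 15
support_x = torch.randn(n_way * k_shot, 32)
support_y = torch.arange(n_way).repeat_interleave(k_shot)
query_x = torch.randn(n_query, 32)
query_y = torch.randint(0, n_way, (n_query,))
print(episodic_meta_step(support_x, support_y, query_x, query_y, n_way))
```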

Despite this, there is a growing view that pre-training algorithms can beat meta-learning algorithms. Together with Prof Sanmi Koyejo and Brando Miranda at Stanford, I investigated whether the comparisons common in the literature are fair, and we hypothesized that the relative performance of the two approaches depends on properties of the dataset. In particular, we established experimentally that meta-learning techniques work best when the dataset is too diverse to be memorised by a traditional learner. That is, in real-life situations where information is too diverse to memorise easily, meta-learning is expected to perform better. We gave evidence for this on computer vision few-shot learning tasks, and additionally recast LLM training techniques as meta-learning to extend the claim to NLP. We recently submitted our findings to ICLR 2024, and you can find the pre-print here.

In Fall 2023, we started looking at the problem theoretically, to investigate how diversity affects learnability under different training conditions. 

Communication Complexity in Multi-Party Computation (2022)

I worked on Oblivious Transfer (OT) communication complexity in multi-party computation as part of my BTech thesis at IIT Bombay. I looked at this problem through the lens of Secure Zero Communication Reductions (SZCR), a cryptographic notion recently introduced by Narayan et al. Working with Prof Manoj Prabhakaran and Varun Narayan, I established an efficient reduction from 2-party communication protocols to SZCR, and we used it to prove, for the first time, the existence of functions whose communication complexity is unbounded in the input size: specifically, a class of no-input, 0-bit-output randomised functions with unbounded communication complexity. I was awarded the Undergraduate Research Award - 02 by IIT Bombay for my work on the topic, and we presented the paper at the Theory of Cryptography Conference (TCC) 2022.

Document Structure Masking (2021)

The BERT masked language model was introduced in 2018 and became widely popular, combining several new and existing ideas in language modelling. Among these was its use of attention masking, which allows the model to learn the relevant correlations between words in a text. I worked with Prof Ganesh Ramakrishnan and Adobe Research on developing newer attention mechanisms. We designed mechanisms that let the model understand how documents are structured into sections, making it equally applicable to everything from infographics to research papers. We achieved this through word-level embeddings and document tree masking mechanisms.
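
As a rough illustration of the general idea, and not the exact mechanism we built, the sketch below derives an attention mask from a document's section structure so that each token attends only to tokens in its own section; the section layout and sizes are made-up placeholders.

```python
# Toy sketch: build an attention mask from a document's section structure so
# that each token attends only to tokens in the same section. The 10-token,
# three-section document below is an invented example.
import torch

def section_attention_mask(section_ids: torch.Tensor) -> torch.Tensor:
    """section_ids[i] = index of the section token i belongs to.
    Returns a [seq_len, seq_len] boolean mask: True = attention allowed."""
    return section_ids.unsqueeze(0) == section_ids.unsqueeze(1)

# Example: a 10-token document split into three sections (title, body, caption).
section_ids = torch.tensor([0, 0, 1, 1, 1, 1, 1, 2, 2, 2])
mask = section_attention_mask(section_ids)

# In a transformer, disallowed positions are set to -inf before the softmax.
scores = torch.randn(10, 10)                      # stand-in attention scores
scores = scores.masked_fill(~mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)
print(attn.shape)  # torch.Size([10, 10])
```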

JalTantra (2020)

I worked with Prof Om Damani and the JalTantra team on cost optimisation of water distribution networks. Designing a least-cost water distribution network is a non-convex optimisation problem, which we tackled in a solver-agnostic way by developing a Non-Linear Programming (NLP) formulation. We developed the best-known system for the problem and presented our paper at the IFORS 2021 and EURO 2021 conferences. I was additionally awarded the Undergraduate Research Award - 01 by IIT Bombay for my role in the project, which is now part of a system used by hundreds of engineers and many state governments in India. We are also part of the National Jal Jeevan Mission, which aims to provide safe and adequate drinking water through individual household tap connections to all households in rural India.
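
To give a flavour of the kind of formulation involved, here is a heavily simplified sketch, not JalTantra's actual model: choose diameters for two pipes in series to minimise cost subject to Hazen-Williams head loss and a minimum residual head at each node, with all lengths, flows, costs, and limits invented for illustration.

```python
# Simplified sketch (not JalTantra's formulation): least-cost diameters for a
# two-pipe serial network with Hazen-Williams head loss and minimum-head
# constraints. All numbers below are made-up illustrative values.
import numpy as np
from scipy.optimize import minimize

L = np.array([1000.0, 800.0])   # pipe lengths (m)
Q = np.array([0.05, 0.03])      # flow carried by each pipe (m^3/s)
C_HW = 130.0                    # Hazen-Williams roughness coefficient
source_head = 100.0             # head at the source (m)
min_head = 85.0                 # minimum head required at each node (m)
unit_cost_coeff = 1.2e6         # cost per metre ~ coeff * D^1.5 (made up)

def head_loss(D):
    """Hazen-Williams head loss (m) in each pipe for diameters D (m)."""
    return 10.67 * L * Q**1.852 / (C_HW**1.852 * D**4.87)

def node_heads(D):
    """Heads at the two downstream nodes (cumulative losses along the series)."""
    return source_head - np.cumsum(head_loss(D))

def cost(D):
    """Total pipe cost: nonlinear (and non-convex once discrete pipe sizes
    and network-layout choices enter the real problem)."""
    return float(np.sum(unit_cost_coeff * L * D**1.5))

result = minimize(
    cost,
    x0=np.array([0.3, 0.3]),
    bounds=[(0.05, 1.0)] * 2,
    constraints=[{"type": "ineq", "fun": lambda D: node_heads(D) - min_head}],
    method="SLSQP",
)
print("optimal diameters (m):", result.x)
print("cost:", result.fun, "node heads (m):", node_heads(result.x))
```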

Exemplar Graph Query Suggestion

Knowledge graphs such as DBPedia organise data and the relationships among data using graph structures. Exemplar queries on graph databases let the user provide an example of the data they are interested in. In this project, under the guidance of Prof Davide Mottin from Aarhus University and Dr Matteo Lissandrini from Aalborg University, I investigated suggestion mechanisms to help users construct exemplar queries, much as search engines suggest completions for textual queries. Owing to this similarity, I studied the n-gram language models used in information retrieval (IR) and extended them to develop bigram edge models on knowledge graphs.
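
As a toy illustration of the bigram edge idea, simplified relative to the project, the sketch below estimates the probability of the next edge label given the previous one from a few example paths, and uses it to rank suggestions for extending a partial exemplar query; the paths and labels are invented.

```python
# Toy bigram edge model: estimate P(next edge label | previous edge label)
# from observed paths in a knowledge graph, then suggest extensions for a
# partial exemplar query. Paths and edge labels are invented examples.
from collections import Counter, defaultdict

# Observed edge-label sequences, e.g. from traversals of past exemplar queries.
paths = [
    ["bornIn", "locatedIn", "capitalOf"],
    ["bornIn", "locatedIn", "partOf"],
    ["directedBy", "bornIn", "locatedIn"],
    ["bornIn", "citizenOf"],
]

bigram_counts = defaultdict(Counter)
for path in paths:
    for prev_label, next_label in zip(path, path[1:]):
        bigram_counts[prev_label][next_label] += 1

def suggest_next_edges(prev_label, k=3):
    """Rank candidate next edge labels by conditional relative frequency."""
    counts = bigram_counts[prev_label]
    total = sum(counts.values())
    if total == 0:
        return []
    return [(label, count / total) for label, count in counts.most_common(k)]

# If the user's partial exemplar query ends with a 'bornIn' edge, suggest:
print(suggest_next_edges("bornIn"))
# [('locatedIn', 0.75), ('citizenOf', 0.25)]
```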