My Current Research
Recent advances in generative AI, such as large language models (LLMs), have opened up a new paradigm of automated reasoning and instruction following. These models can be used as context-based information retrievers, automated chatbots, code assistants, and, most importantly, agents that take useful actions based on the instructions they are given. With the rapid adoption of agentic AI workflows across white-collar industries, it is crucial that these LLMs can reason over long contexts to generate thoughtful responses and actions. My work introduces a reinforcement-learning-based training technique that allows an LLM to think modularly over multiple iterations and generate highly accurate responses:
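As a rough, self-contained illustration of the general flavor (a toy sketch only, not the exact technique in my work), the Python snippet below trains a tiny stand-in policy with REINFORCE to compose hypothetical reasoning modules over several iterations, with a reward given only for the final answer; the module names, the reward rule, and all hyperparameters are invented for illustration.

    import math, random

    # Hypothetical reasoning modules the policy can choose at each iteration.
    ACTIONS = ["decompose", "retrieve", "refine", "answer"]
    theta = {a: 0.0 for a in ACTIONS}   # toy tabular policy standing in for LLM weights

    def softmax_probs():
        z = {a: math.exp(theta[a]) for a in ACTIONS}
        s = sum(z.values())
        return {a: v / s for a, v in z.items()}

    def rollout(max_steps=4):
        """One multi-iteration reasoning episode; the reward depends only on the final output."""
        probs = softmax_probs()
        trace = []
        for _ in range(max_steps):
            r, acc, action = random.random(), 0.0, ACTIONS[-1]
            for a in ACTIONS:              # sample one module from the softmax policy
                acc += probs[a]
                if r <= acc:
                    action = a
                    break
            trace.append(action)
            if action == "answer":         # the episode ends when the model answers
                break
        # Toy reward: "correct" if at least one intermediate module ran before answering
        # (a stand-in for grading the final response).
        reward = 1.0 if trace[-1] == "answer" and len(trace) > 1 else 0.0
        return trace, probs, reward

    def reinforce_update(trace, probs, reward, lr=0.1):
        """REINFORCE: accumulate reward * d log pi(a)/d theta over the episode, then apply."""
        grad = {a: 0.0 for a in ACTIONS}
        for a in trace:
            for b in ACTIONS:
                grad[b] += reward * ((1.0 if b == a else 0.0) - probs[b])
        for b in ACTIONS:
            theta[b] += lr * grad[b]

    for _ in range(500):
        trace, probs, reward = rollout()
        reinforce_update(trace, probs, reward)
    print(softmax_probs())   # learned selection probabilities for the hypothetical modules

In spirit, the tabular policy would be replaced by the LLM itself and the toy reward by an automatic grader of the final response.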
Furthermore, with the hyper-connectivity provided by 5G/6G technologies, we expect to see a surge in AI agents talking to each other to perform goal-oriented tasks without any human supervision. Such communication, also known as gossiping, enables semantic-level information exchange rather than the bit-level exchange of classical communication settings. In a dynamic data environment, it is essential that these agents do not spread false or outdated information. My work on semantic communication with LLMs shows that different edge devices can perform inference collaboratively in a mixture-of-agents setting, leveraging gossiping to enhance their overall performance while maintaining stability and limited latency:
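To make the setting concrete, here is a minimal sketch (hypothetical class and method names, not the protocol in my work) of edge agents that gossip their latest responses, keep only the freshest copy heard from each peer so that outdated information does not linger, and answer queries by aggregating over what they have heard in a simple mixture-of-agents fashion.

    import random
    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        name: str
        version: int = 0       # logical timestamp of this agent's own latest inference
        response: str = ""     # the agent's current (possibly outdated) response
        inbox: dict = field(default_factory=dict)   # peer name -> (version, response) heard via gossip

        def observe(self, version, response):
            """Local update, e.g. a fresh inference on newly arrived data."""
            self.version, self.response = version, response

        def gossip_to(self, peer):
            """Push-style gossip: share own response plus everything heard so far."""
            peer.receive(self.name, self.version, self.response)
            for name, (v, r) in self.inbox.items():
                if name != peer.name:
                    peer.receive(name, v, r)

        def receive(self, name, version, response):
            # Keep only the freshest version per source, so an outdated copy is dropped
            # as soon as a newer one is heard.
            if name not in self.inbox or self.inbox[name][0] < version:
                self.inbox[name] = (version, response)

        def answer(self):
            """Mixture-of-agents style aggregation: majority vote over own response
            and the freshest response heard from each peer."""
            votes = [self.response] + [r for _, r in self.inbox.values()]
            return max(set(votes), key=votes.count)

    # Toy demo: four devices answer the same query locally, then gossip before aggregating.
    agents = [Agent(f"edge-{i}") for i in range(4)]
    local_answers = ["blue", "blue", "green", "blue"]    # each device's own (imperfect) inference
    for agent, ans in zip(agents, local_answers):
        agent.observe(version=1, response=ans)
    for _ in range(12):                                  # random pairwise gossip exchanges
        src, dst = random.sample(agents, 2)
        src.gossip_to(dst)
    print([a.answer() for a in agents])                  # most agents settle on the majority answer "blue"

Because each agent keeps only the highest version it has heard from every peer, a newer response from a peer always overwrites any stale copy of it.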
My Previous Works
Some of my earlier works focus on training machine learning systems over distributed data to preserve user privacy. Here, too, fast information exchange over large learning networks is necessary for model convergence. My research addresses this issue in the following papers:
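The sketch below illustrates the general setting with a toy quadratic objective and a ring topology (neither is taken from the papers): each node trains on its private data and exchanges only model parameters with its neighbors via gossip averaging, so raw data never leaves the device.

    import random

    NUM_NODES, ROUNDS, LR = 6, 200, 0.1
    random.seed(0)
    # One private sample per node (never shared with anyone).
    local_data = [random.gauss(2.0, 0.5) for _ in range(NUM_NODES)]
    params = [0.0] * NUM_NODES          # each node's local copy of the model parameter

    def local_gradient(w, x):
        """Gradient of the local quadratic loss (w - x)^2 / 2 at parameter w."""
        return w - x

    for _ in range(ROUNDS):
        # 1) Local step: each node updates its parameter using only its private data.
        params = [w - LR * local_gradient(w, x) for w, x in zip(params, local_data)]
        # 2) Gossip step: each node averages its parameter with its two ring neighbors;
        #    these parameters are the only information that crosses device boundaries.
        params = [
            (params[(i - 1) % NUM_NODES] + params[i] + params[(i + 1) % NUM_NODES]) / 3.0
            for i in range(NUM_NODES)
        ]

    print(params)                          # all nodes end up close to the same model ...
    print(sum(local_data) / NUM_NODES)     # ... near the minimizer of the average loss (the data mean)

Faster-mixing communication topologies shrink the disagreement between nodes more quickly, which is why the speed of information exchange matters for convergence.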
Additionally, high-quality inference in an agent-based system, such as a network of autonomous vehicles, requires keeping information staleness low across a large network with minimal latency. In the following works, I propose an opportunistic algorithm that keeps the average staleness bounded as the network size scales:
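As a rough illustration of the staleness metric and of the opportunistic flavor (the gossip rule below is a toy stand-in, not the algorithm from these works), the sketch simulates a source that generates a fresh version every time slot and hands the network's entire per-slot gossip budget to the node holding the freshest copy; the measured average staleness then stays roughly flat as the number of nodes grows.

    import random

    def average_staleness(num_nodes, slots=2000, seed=1):
        """Toy discrete-time simulation of information staleness in a gossip network."""
        rng = random.Random(seed)
        source_version = 0
        versions = [0] * num_nodes                  # the version of the source data each node holds
        total = 0.0
        for _ in range(slots):
            source_version += 1                     # the source generates a fresh version every slot
            holder = rng.randrange(num_nodes)       # and delivers it to one randomly chosen node
            versions[holder] = source_version
            # Opportunistic use of the gossip budget: the per-slot budget of one push per node
            # is handed to the node holding the freshest copy, which pushes it to random targets.
            for _ in range(num_nodes):
                k = rng.randrange(num_nodes)
                versions[k] = max(versions[k], versions[holder])
            total += sum(source_version - v for v in versions) / num_nodes
        return total / slots

    for n in (10, 100, 1000):
        print(n, round(average_staleness(n), 3))    # the average staleness stays roughly flat in n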
An introductory article on timeliness in gossip networks, which I co-authored, can be found here:
My Google Scholar page contains a list of other papers regarding these topics.