Research

A brief summary of my current research endeavors

Talks/Seminars about my work

Sajad Mousavi, Ricardo Luna Gutiérrez, Desik Rengarajan, Vineet Gundecha, Ashwin Ramesh Babu, Avisek Naug, Antonio Guillen, Soumyendu Sarkar. Enhancing Large Language Models with Ensemble of Critics for Mitigating Toxicity and Hallucination. In Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models, NeurIPS 2023

[Paper]

We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination. This method involves refining model outputs through an ensemble of critics and the model’s own feedback. Drawing inspiration from human behavior, we explore whether LLMs can emulate the self-correction process observed in humans who often engage in self-reflection and seek input from others to refine their understanding of complex topics. Our approach is model-agnostic and can be applied across various domains to enhance trustworthiness by addressing fairness, bias, and robustness concerns. We consistently observe performance improvements in LLMs for reducing toxicity and correcting factual errors.
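
Below is a minimal sketch of what such a critique-and-refine loop can look like in code. The `llm.generate` and `critic.review` interfaces, the prompt format, and the stopping rule are illustrative assumptions, not the implementation from the paper.

```python
# A hedged sketch of self-correction with an ensemble of critics: generate an
# answer, collect critiques (e.g., toxicity and factuality checks), and ask the
# model to revise until the critics are satisfied or a round budget is hit.
# The llm/critic objects are hypothetical interfaces, not the paper's code.

def self_correct(llm, critics, prompt, max_rounds=3):
    answer = llm.generate(prompt)
    for _ in range(max_rounds):
        # Each critic reviews the draft and reports whether it found an issue.
        reviews = [critic.review(prompt, answer) for critic in critics]
        issues = [r for r in reviews if r.has_issue]
        if not issues:
            break  # all critics are satisfied with the current answer
        feedback = "\n".join(r.comment for r in issues)
        answer = llm.generate(
            f"{prompt}\n\nDraft answer:\n{answer}\n\n"
            f"Critic feedback:\n{feedback}\n\n"
            "Revise the draft to address the feedback."
        )
    return answer
```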

Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, and Srinivas Shakkottai. Federated Ensemble-Directed Offline Reinforcement Learning. In Workshop on Federated Learning and Analytics in Practice, International Conference on Machine Learning, 2023

[Paper][Code]

We consider the problem of federated offline reinforcement learning (RL), a scenario in which distributed learning agents must collaboratively learn a high-quality control policy using only small pre-collected datasets generated according to different unknown behavior policies. Naively combining a standard offline RL approach with a standard federated learning approach to solve this problem can lead to poorly performing policies. In response, we develop the Federated Ensemble-Directed Offline Reinforcement Learning Algorithm (FEDORA), which distills the collective wisdom of the clients using an ensemble learning approach. We develop the FEDORA codebase to utilize distributed compute resources on a federated learning platform. We show that FEDORA significantly outperforms other approaches, including offline RL over the combined data pool, in various complex continuous control environments and real-world datasets. Finally, we demonstrate the performance of FEDORA in the real world on a mobile robot.
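
The sketch below illustrates one way the ensemble-directed idea can show up in a federated round: each client runs offline RL locally, reports its parameters together with an estimate of its policy's return, and the server forms a performance-weighted average. The weighting rule, interfaces, and variable names are assumptions for illustration, not FEDORA's exact update.

```python
# Hedged sketch of a federated round with performance-weighted aggregation.
# client.local_offline_rl is a hypothetical interface returning updated
# parameters (as NumPy arrays) and an estimated return for the client policy.
import numpy as np

def aggregate(client_params, client_scores):
    """Average client parameters, weighting better-performing clients more."""
    weights = np.exp(np.asarray(client_scores, dtype=float))
    weights /= weights.sum()                      # softmax over estimated returns
    return sum(w * p for w, p in zip(weights, client_params))

def federated_round(server_params, clients):
    params_list, scores = [], []
    for client in clients:
        # Each client trains offline on its own small, behavior-specific dataset.
        params, est_return = client.local_offline_rl(server_params)
        params_list.append(params)
        scores.append(est_return)
    return aggregate(params_list, scores)
```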

Desik Rengarajan*, Sapana Chaudhary*, Jaewon Kim, Dileep Kalathil, and Srinivas Shakkottai. Enhanced meta reinforcement learning using demonstrations in sparse reward environments. In Advances in Neural Information Processing Systems, 2022

Presented at The 36th Conference on Neural Information Processing Systems, NeurIPS 2022 [Paper][Code]

Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small number of steps (or even a single step), is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms called Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information, even if it is sub-optimal, to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.
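
As a rough illustration of the guidance idea, the per-task objective can mix a policy-gradient term on freshly collected data with a behavior-cloning term on the task's demonstrations; the mixing weight and tensor shapes below are assumptions, and this is not the exact EMRLD objective.

```python
# Hedged sketch: RL surrogate + supervised guidance from (possibly sub-optimal)
# demonstrations, combined into a single adaptation loss for one task.
import torch

def guided_adaptation_loss(log_probs, advantages, demo_log_probs, bc_weight=0.5):
    """log_probs/advantages come from on-policy task data; demo_log_probs are the
    policy's log-probabilities of the demonstrated actions."""
    rl_loss = -(log_probs * advantages).mean()   # standard policy-gradient surrogate
    bc_loss = -demo_log_probs.mean()             # imitate the demonstration data
    return rl_loss + bc_weight * bc_loss

# Toy usage with random tensors, just to show the shapes involved.
loss = guided_adaptation_loss(torch.randn(32), torch.randn(32), torch.randn(64))
```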

Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, and Srinivas Shakkottai. Reinforcement learning with sparse rewards using guidance from offline demonstration. In International Conference on Learning Representations, 2022

Presented as a spotlight paper at The Tenth International Conference on Learning Representations, ICLR 2022 (top 5.1%) [Paper][Code]

A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame, because of the large number of exploration actions that the policy has to perform before it receives any useful feedback to learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step that uses the offline demonstration data. The key idea is that by obtaining guidance from, rather than imitating, the offline data, LOGO orients its policy in the manner of the sub-optimal policy while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, including a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored states. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
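
To make the two-step structure concrete, the sketch below renders the guidance step as a KL penalty toward the behavior policy implied by the demonstrations, with a coefficient that is decayed over training. The paper formulates guidance as a trust-region constraint rather than a penalty, so treat this as an approximation with assumed names and shapes.

```python
# Hedged sketch of LOGO's idea as a penalized objective: policy improvement on
# the sparse online reward, plus a decaying pull toward the behavior policy
# estimated from offline demonstrations (guidance, not imitation).
import torch
import torch.nn.functional as F

def logo_style_loss(log_probs, advantages, policy_logits, behavior_logits,
                    guidance_coef):
    rl_loss = -(log_probs * advantages).mean()   # policy improvement term
    # Guidance term: a KL penalty that pulls the current policy toward the
    # behavior policy estimated from the demonstration data.
    guidance = F.kl_div(
        F.log_softmax(policy_logits, dim=-1),
        F.softmax(behavior_logits, dim=-1),
        reduction="batchmean",
    )
    return rl_loss + guidance_coef * guidance

# guidance_coef is decayed over training so the sub-optimal demonstrations
# shape early exploration but do not cap final performance.
```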

Kiyeob Lee*, Desik Rengarajan*, Dileep Kalathil, and Srinivas Shakkottai. Reinforcement learning for mean field games with strategic complementarities. In International Conference on Artificial Intelligence and Statistics, 2021

Presented at The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021  [Paper]

In this work, we look at a special case of multi-agent reinforcement learning called mean field reinforcement learning. Mean Field Games (MFG) are those in which each agent optimizes their action with respect to the distribution of states of the other players in the system; systems with a large number of agents can be modeled using this approach. The equilibrium concept here is a Mean Field Equilibrium (MFE), and algorithms for learning MFE in dynamic MFGs are unknown in general due to the non-stationary evolution of the state distribution. We focus on an important subclass that possesses a monotonicity property called Strategic Complementarities (MFG-SC). We introduce a natural refinement to the equilibrium concept that we call Trembling-Hand-Perfect MFE (T-MFE), which allows agents to employ a measure of randomization while accounting for the impact of such randomization on their payoffs. We propose a simple algorithm for computing T-MFE under a known model. We introduce both a model-free and a model-based approach to learning T-MFE under unknown transition probabilities, using the trembling-hand idea of enabling exploration. We analyze the sample complexity of both algorithms. We also develop a scheme for concurrently sampling the system with a large number of agents that negates the need for a simulator, even though the model is non-stationary. Finally, we empirically evaluate the performance of the proposed algorithms via examples motivated by real-world applications.
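
The sketch below shows the fixed-point structure such a known-model computation can take: compute an epsilon-"trembling" best response to the current population state distribution, update the distribution induced by that policy, and iterate. The toy model interfaces and the damped update are assumptions rather than the paper's exact algorithm.

```python
# Hedged sketch of a T-MFE-style fixed-point iteration for a toy finite MFG.
# P: (S, A, S) transition kernel; reward_fn(mu) -> (S, A) rewards given the
# current population state distribution mu. All names are illustrative.
import numpy as np

def trembling_best_response(P, reward, epsilon=0.1, gamma=0.95, iters=500):
    """Value iteration, then an epsilon-randomized ("trembling") greedy policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward + gamma * (P @ V)              # (S, A) action values
        V = Q.max(axis=1)
    pi = np.full((S, A), epsilon / A)             # tremble: explore uniformly w.p. epsilon
    pi[np.arange(S), Q.argmax(axis=1)] += 1 - epsilon
    return pi

def induced_distribution(P, pi, iters=500):
    """Long-run state distribution of the population when everyone follows pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    mu = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        mu = mu @ P_pi
    return mu

def compute_t_mfe(P, reward_fn, rounds=50, damping=0.5):
    mu = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(rounds):
        pi = trembling_best_response(P, reward_fn(mu))
        mu = (1 - damping) * mu + damping * induced_distribution(P, pi)
    return pi, mu
```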

Archana Bura, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai, and Jean-Francois Chamberland. Learning to cache and caching to learn: Regret analysis of caching algorithms. In IEEE/ACM Transactions on Networking, 2021

Published in IEEE/ACM Transactions on Networking [Paper][arXiv]

Crucial performance metrics of a caching algorithm include its ability to quickly and accurately learn a popularity distribution of requests. However, a majority of work on analytical performance analysis focuses on hit probability after an asymptotically large time has elapsed. We consider an online learning viewpoint, and characterize the regret in terms of the finite time difference between the hits achieved by a candidate caching algorithm with respect to a genie-aided scheme that places the most popular items in the cache. We first consider the Full Observation regime wherein all requests are seen by the cache. We show that the Least Frequently Used (LFU) algorithm is able to achieve order optimal regret, which is matched by an efficient counting algorithm design that we call LFU-Lite. We then consider the Partial Observation regime wherein only requests for items currently cached are seen by the cache, making it similar to an online learning problem related to the multi-armed bandit problem. We show how approaching this caching bandit using traditional approaches yields either high complexity or regret, but a simple algorithm design that exploits the structure of the distribution can ensure order optimal regret. We conclude by illustrating our insights using numerical simulations.
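
The simulation sketch below illustrates the regret notion in the Full Observation regime: the finite-time hits of an LFU-style policy compared against a genie that always caches the truly most popular items. The Zipf request model and cache size are toy assumptions for illustration.

```python
# Hedged sketch of finite-time caching regret: hits of LFU (using empirical
# request counts) versus a genie that caches the most popular items.
from collections import Counter
import numpy as np

def lfu_regret(popularity, cache_size, horizon, seed=0):
    rng = np.random.default_rng(seed)
    counts = Counter()                                        # full-observation counts
    genie_cache = set(np.argsort(popularity)[-cache_size:])   # truly most popular items
    lfu_hits = genie_hits = 0
    for _ in range(horizon):
        item = rng.choice(len(popularity), p=popularity)
        lfu_cache = {i for i, _ in counts.most_common(cache_size)}
        lfu_hits += item in lfu_cache
        genie_hits += item in genie_cache
        counts[item] += 1
    return genie_hits - lfu_hits                              # regret at time `horizon`

# Example: Zipf-like popularity over 50 items, cache of 10, horizon of 5000 requests.
pop = np.array([1.0 / k for k in range(1, 51)])
pop /= pop.sum()
print(lfu_regret(pop, cache_size=10, horizon=5000))
```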

Rajarshi Bhattacharyya, Archana Bura, Desik Rengarajan, Mason Rumuly, Bainan Xia, Srinivas Shakkottai, Dileep Kalathil, Ricky KP Mok, and Amogh Dhamdhere. QFlow: A learning approach to high QoE video streaming at the wireless edge. In IEEE/ACM Transactions on Networking, 2021

Published in IEEE/ACM Transactions on Networking [Paper][arXiv]

Conference version presented at The Twentieth ACM International Symposium on Mobile Ad Hoc Networking and Computing, MobiHoc 2019

The predominant use of wireless access networks is for media streaming applications. However, current access networks treat all packets identically and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adoption, which in turn implies that agile control policies can now be instantiated on access networks. Exploiting such reconfigurability requires the design of a system that can enable a configuration, measure the impact on the application performance (Quality of Experience), and adaptively select a new configuration. Effectively, this feedback loop is a Markov Decision Process whose parameters are unknown. The goal of this work is to develop QFlow, a platform that instantiates this feedback loop, and to instantiate a variety of control policies over it. We use the popular application of video streaming over YouTube as our use case. Our context is priority queueing, with the action space being that of determining which clients should be assigned to each queue at each decision period. We first develop policies based on model-based and model-free reinforcement learning. We then design an auction-based system under which clients place bids for priority service, as well as a more structured index-based policy. Through experiments, we show how these learning-based policies on QFlow are able to select the right clients for prioritization in a high-load scenario to outperform the best known solutions, with over 25% improvement in QoE and a perfect QoE score of 5 over 85% of the time.
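
As a rough illustration of the feedback loop, the sketch below shows a single decision period in which per-client QoE states are observed, a fixed number of clients are placed in the priority queue according to learned value estimates, and those estimates are updated from the observed QoE. The state features, update rule, and hooks such as `apply_queue_assignment` are hypothetical stand-ins for QFlow's measurement and reconfiguration machinery.

```python
# Hedged sketch of one decision period of a QFlow-style controller.
# get_qoe_state, measure_qoe, and apply_queue_assignment are hypothetical hooks.
from collections import defaultdict

def decision_period(clients, q_values, k, alpha=0.1):
    """Prioritize the k clients whose state looks most valuable to serve, then
    update the value estimates from the QoE observed afterwards."""
    states = {c: get_qoe_state(c) for c in clients}      # e.g., buffer level, stall count
    prioritized = sorted(clients, key=lambda c: q_values[states[c]], reverse=True)[:k]
    apply_queue_assignment(prioritized)                  # reconfigure the priority queue
    for c in prioritized:
        s, reward = states[c], measure_qoe(c)            # QoE feedback after the period
        q_values[s] += alpha * (reward - q_values[s])    # simple online value update
    return q_values

# q_values can start as defaultdict(float), mapping a discretized QoE state to
# the estimated benefit of prioritizing a client in that state.
```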

*Denotes equal contribution

Works from my past life

Chirag Ramesh Srivatsa, Desik Rengarajan, and Vamsi Krishna Tumuluru. Modeling demand flexibility of electric vehicles. In 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), 2017

A. N. Nagamani, Desik Rengarajan, and Vinod K. Agrawal. An optimized design of reversible magnitude and signed comparators. In 2016 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), 2016

Shakthi D. Prasad, Subba B. Reddy, Alok R. Verma, Desik Rengarajan, and Veeresh N. Patil. Influence of Corona Intensity on the Hydrophobicity Recovery of Polymeric Insulating Samples. In International Conference on Condition Assessment Techniques in Electrical Systems (CATCON), 2015