Job Market
I am on the job market to start in mid-2025, applying for:
tenure-track faculty positions in computer science, electrical and computer engineering, industrial engineering and operations research, and similar engineering and business school departments.
research scientist roles in industrial research groups working on general machine learning, with a focus on decision-making under uncertainty, game-theoretic decision-making, learning under epistemic and aleatoric uncertainty, and similar areas.
"Sample Complexity of Robust Reinforcement Learning with a Generative Model" (AISTATS 2022) - Kishan Panaganti, Dileep Kalathil
Presents and analyzes the first comprehensive sample-complexity results for the robust reinforcement learning problem.
Recommended read for a general audience.
Abstract: The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
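For readers who want a concrete picture of the object being analyzed, here is a minimal tabular sketch of robust value iteration over a total-variation uncertainty set centered at a nominal model (for example, one estimated from generative-model samples). The toy MDP, the function names, and the LP-based inner minimization are my own illustrative choices, not the paper's algorithm or its sample-complexity analysis.

```python
import numpy as np
from scipy.optimize import linprog

def robust_value_iteration(P_hat, R, gamma=0.9, rho=0.1, n_iters=200):
    """Tabular robust value iteration over a total-variation (TV) uncertainty set.

    P_hat : (S, A, S) nominal transition model (e.g., empirical estimate from samples)
    R     : (S, A) reward table
    rho   : TV radius around each nominal row P_hat[s, a]
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                # Inner problem: min_p p @ V  s.t.  p in the simplex, ||p - P_hat[s,a]||_1 <= 2*rho
                # (TV = 0.5 * L1). Cast as an LP in (p, u) with u_i >= |p_i - P_hat[s,a,i]|.
                p_hat = P_hat[s, a]
                c = np.concatenate([V, np.zeros(S)])
                A_eq = np.concatenate([np.ones(S), np.zeros(S)])[None, :]  # sum_i p_i = 1
                A_ub = np.vstack([
                    np.hstack([np.eye(S), -np.eye(S)]),                   #  p - u <=  p_hat
                    np.hstack([-np.eye(S), -np.eye(S)]),                  # -p - u <= -p_hat
                    np.concatenate([np.zeros(S), np.ones(S)])[None, :],   # sum_i u_i <= 2*rho
                ])
                b_ub = np.concatenate([p_hat, -p_hat, [2.0 * rho]])
                res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                              bounds=[(0, 1)] * S + [(0, None)] * S, method="highs")
                Q[s, a] = R[s, a] + gamma * res.fun   # worst-case expected next value
        V = Q.max(axis=1)
    return V, Q

# Toy usage on a random 3-state, 2-action MDP.
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(3), size=(3, 2))
R = rng.uniform(size=(3, 2))
V_robust, Q_robust = robust_value_iteration(P_hat, R)
```

The chi-square and KL uncertainty sets treated in the paper lead to analogous (convex rather than linear) inner problems; the paper's contribution is the sample-complexity characterization of the resulting model-based algorithm, which this sketch does not attempt to reproduce.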
"Robust Reinforcement Learning using Offline Data" (NeurIPS 2022) - Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
Presents the first tractable robust reinforcement learning algorithm for high-dimensional environments such as MuJoCo and establishes theoretical and statistical guarantees for it.
Recommended read for a statistical reinforcement learning audience and reinforcement learning practitioners. This is the most technical read on this list.
In a follow-up work, we present a unified analysis and generalized results for robust reinforcement learning.
Abstract: The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to learn the policy that maximizes the value against the worst possible models that lie in an uncertainty set. In this work, we propose a robust RL algorithm called Robust Fitted Q-Iteration (RFQI), which uses only an offline dataset to learn the optimal robust policy. Robust RL with offline data is significantly more challenging than its non-robust counterpart because of the minimization over all models present in the robust Bellman operator. This poses challenges in offline data collection, optimization over the models, and unbiased estimation. In this work, we propose a systematic approach to overcome these challenges, resulting in our RFQI algorithm. We prove that RFQI learns a near-optimal robust policy under standard assumptions and demonstrate its superior performance on standard benchmark problems.
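Schematically, and in my own notation rather than the paper's, the robust Bellman operator behind the difficulty described above, and the fitted Q-iteration step that regresses onto an estimate of it over the offline dataset, look as follows.

```latex
% Robust Bellman operator over an uncertainty set \mathcal{P}_{s,a} around a nominal model:
(T_{\mathrm{rob}} Q)(s,a) \;=\; r(s,a) \;+\; \gamma \inf_{P_{s,a} \in \mathcal{P}_{s,a}}
    \mathbb{E}_{s' \sim P_{s,a}}\!\left[\max_{a'} Q(s',a')\right]

% Fitted Q-iteration step over the offline dataset \mathcal{D} = \{(s,a,r,s')\} and a
% function class \mathcal{F}, with \widehat{T}_{\mathrm{rob}} an estimate of T_{\mathrm{rob}}:
Q_{k+1} \;\in\; \arg\min_{Q \in \mathcal{F}} \sum_{(s,a,r,s') \in \mathcal{D}}
    \Big( Q(s,a) - \big(\widehat{T}_{\mathrm{rob}} Q_k\big)(s,a) \Big)^2
```

The inner infimum is exactly what blocks the usual unbiased regression targets of non-robust fitted Q-iteration; how to estimate it from offline data is the systematic approach the abstract refers to, and the sketch above deliberately leaves that estimator unspecified.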
"Tractable Equilibrium Computation in Markov Games through Risk Aversion" (ICLR 2025, Oral top 1.8%) - Eric Mazumdar, Kishan Panaganti, Laixi Shi (equal authorship)
By incorporating risk aversion and bounded rationality into agents' decision-making processes, we introduce a computationally tractable class of equilibria that aligns with observed human strategic behavior.
Recommended read for econometrics and learning theory audiences.
Abstract: A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that—by imbuing agents with important features of human decision-making like risk aversion and bounded rationality—a class of risk-averse quantal response equilibria (RQE) become tractable to compute in all n-player matrix and finite-horizon Markov games. In particular, we show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and only depends on agents’ degree of risk-aversion and bounded rationality. To validate the richness of this class of solution concepts we show that it captures people’s patterns of play in a number of 2-player matrix games previously studied in experimental economics. Furthermore, we give a first analysis of the sample complexity of computing these equilibria in finite-horizon Markov games when one has access to a generative model and validate our findings on a simple multi-agent reinforcement learning benchmark.
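For orientation, and in my own notation, the classical logit quantal response equilibrium condition is shown below; the paper's RQE further makes each player's evaluation of payoffs risk-averse, and the precise definition, the tractability conditions, and the adjusted no-regret dynamics are in the paper.

```latex
% Logit quantal response equilibrium (QRE): each player i, with bounded-rationality
% parameter \tau_i > 0, plays a softmax (smoothed) best response to the others.
\pi_i(a_i) \;=\;
\frac{\exp\!\big(\tau_i^{-1}\, \mathbb{E}_{a_{-i} \sim \pi_{-i}}\!\left[u_i(a_i, a_{-i})\right]\big)}
     {\sum_{a_i'} \exp\!\big(\tau_i^{-1}\, \mathbb{E}_{a_{-i} \sim \pi_{-i}}\!\left[u_i(a_i', a_{-i})\right]\big)}
\qquad \text{for every player } i \text{ and action } a_i
```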
"Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage" (L4DC 2025) - Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
Presents a fundamental bridge between the robust reinforcement learning and offline reinforcement learning frameworks through the principle of pessimism in the face of uncertainty.
Recommended read for a machine learning audience interested in statistical guarantees, offline RL, and the principles of optimism or pessimism in the face of uncertainty. This is the second most technical read on this list.
Abstract: The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is distribution shift, which refers to the difference between the state-action visitation distribution of the data generating policy and the learning policy. Many recent works have used the idea of pessimism for developing offline RL algorithms and characterizing their sample complexity under a relatively weak assumption of single policy concentrability. Different from the offline RL literature, the area of distributionally robust learning (DRL) offers a principled framework that uses a minimax formulation to tackle model mismatch between training and testing environments. In this work, we aim to bridge these two areas by showing that the DRL approach can be used to tackle the distributional shift problem in offline RL. In particular, we propose two offline RL algorithms using the DRL framework, for the tabular and linear function approximation settings, and characterize their sample complexity under the single policy concentrability assumption. We also demonstrate the superior performance of our proposed algorithms through simulation experiments.
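In my own notation, and as a schematic rather than the paper's exact formulation, the two ingredients the abstract combines are a distributionally robust planning objective built around the empirically estimated model and the single-policy concentrability assumption, under which the offline data distribution only needs to cover one comparator policy.

```latex
% Distributionally robust objective around the empirical model \hat{P}; planning
% against the worst model in a divergence ball of radius \rho acts as pessimism.
\max_{\pi} \; \min_{P \in \mathcal{P}_{\rho}(\hat{P})} V^{\pi}_{P},
\qquad
\mathcal{P}_{\rho}(\hat{P}) \;=\; \big\{ P : D\big(P_{s,a}, \hat{P}_{s,a}\big) \le \rho \ \ \forall (s,a) \big\}

% Single-policy concentrability: the data distribution \mu need only cover the
% state-action visitation distribution d^{\pi^\star} of a comparator policy \pi^\star.
\sup_{s,a} \; \frac{d^{\pi^\star}(s,a)}{\mu(s,a)} \;\le\; C^\star \;<\; \infty
```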