Multi-agent reinforcement learning (MARL) methods typically require that agents enjoy global state observability, preventing the development of decentralized algorithms and limiting scalability. Recent work has shown that, under assumptions on decaying inter-agent influence, global observability can be replaced by local neighborhood observability at each agent, enabling decentralization and scalability. Real-world applications enjoying such decay properties remain underexplored, however, despite the fact that signal power decay, or signal attenuation, due to path loss is an intrinsic feature of many problems in wireless communications and radar networks. In this paper, we show that signal attenuation enables decentralization in MARL by considering the illustrative special case of performing power allocation for target detection in a radar network. To achieve this, we propose two new constrained multi-agent Markov decision process formulations of the power allocation problem, derive local neighborhood approximations of the global value function and policy gradient estimates together with corresponding error bounds, and develop decentralized saddle-point policy gradient algorithms for solving the proposed problems. Our approach, though oriented towards the specific radar network problem we consider, provides a useful model for extensions to additional problems in wireless communications and radar networks.
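To make the localization idea concrete, the sketch below shows a kappa-hop truncated, REINFORCE-style policy gradient estimate in Python: each agent replaces the global reward with the sum of rewards from its kappa-hop neighborhood, which is the quantity whose approximation error the decay property bounds. The function and argument names are illustrative; this is not the paper's saddle-point algorithm.

```python
import numpy as np

def local_policy_gradient(agent, trajectory, kappa, adjacency, gamma=0.99):
    """Illustrative kappa-hop truncated REINFORCE-style gradient for one agent.

    The agent replaces the global reward with the sum of rewards from its
    kappa-hop neighborhood: the local approximation whose error decays
    with kappa under signal attenuation.
    """
    # Breadth-first search for the kappa-hop neighborhood of `agent`.
    n = adjacency.shape[0]
    dist = np.full(n, np.inf)
    dist[agent] = 0
    frontier = [agent]
    for d in range(kappa):
        nxt = []
        for u in frontier:
            for v in np.flatnonzero(adjacency[u]):
                if dist[v] == np.inf:
                    dist[v] = d + 1
                    nxt.append(v)
        frontier = nxt
    neighborhood = np.flatnonzero(dist <= kappa)

    grad = 0.0
    # trajectory: list of (score_t, rewards_t), where score_t is the agent's
    # grad-log-policy at time t and rewards_t[j] is agent j's reward.
    for t, (score_t, rewards_t) in enumerate(trajectory):
        local_reward = sum(rewards_t[j] for j in neighborhood)
        grad = grad + (gamma ** t) * local_reward * score_t
    return grad
```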
We consider the problem of guaranteeing safety constraint satisfaction in human-robot collaboration (HRC) under uncertain human positions. We pose this problem as a chance-constrained problem (CCP) with safety (chance) constraints represented by uncertain control barrier functions, where the probability of safety constraint violation under uncertainty is bounded by a tunable, user-defined risk. We solve this stochastic optimization problem using a sampling-based approach, obtaining a risk-tunable controller that safely accomplishes HRC tasks. We demonstrate the safety and performance of this approach through both simulation and hardware experiments on a 7-degree-of-freedom Franka Panda manipulator, and characterize the trade-off between the user-defined risk tolerance and task time efficiency in safety-critical applications. Click here for the video description. This work will also be presented at AIM 2025 in Hangzhou, China.
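As an illustration of the sampling-based controller, the sketch below solves a scenario-sampled CBF quadratic program for a planar single-integrator robot avoiding an uncertain human position. The single-integrator dynamics, the distance barrier h(x) = ||x - p_h||^2 - d_min^2, and all names are simplifying assumptions for exposition, not the experimental setup of the paper.

```python
import numpy as np
import cvxpy as cp

def risk_tunable_safe_control(x_robot, u_nom, human_samples, d_min=0.5, alpha=1.0):
    """Sampled chance-constrained CBF-QP (illustrative, single-integrator robot).

    Each sampled human position p_h contributes one CBF constraint based on
    h(x) = ||x - p_h||^2 - d_min^2 >= 0; enforcing the constraint on all
    samples approximates the chance constraint at a scenario-based risk.
    """
    u = cp.Variable(2)
    constraints = []
    for p_h in human_samples:  # draws from the human-position distribution
        diff = x_robot - p_h
        h = diff @ diff - d_min ** 2          # barrier value at current state
        # CBF condition for x_dot = u:  dh/dt = 2 * diff @ u >= -alpha * h
        constraints.append(2 * diff @ u >= -alpha * h)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraints)
    prob.solve()
    return u.value
```

Drawing more samples from the estimated human-position distribution tightens the enforced chance constraint; scenario-optimization bounds relate the number of samples to the user-defined risk.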
In this paper we develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems. Recent advances at the intersection of control theory and RL follow a two-stage, safety filter approach to guaranteeing satisfaction of hard safety constraints in such settings: a model-free RL algorithm is used to learn a (potentially unsafe) controller, and the actions it proposes are then projected onto the safe sets prescribed by a control barrier function (CBF). Though effective at maintaining safety, such approaches lose the convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that allows us to learn RL controllers enjoying classical convergence guarantees while maintaining hard safety constraints during training. In addition, we provide experimental validation illustrating the efficacy of our approach on a simulated quadcopter obstacle avoidance problem as well as an inverted pendulum environment. The corresponding code can be found here.
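For reference, with a single CBF constraint and control-affine dynamics, the projection step of the two-stage baseline described above reduces to a closed-form Euclidean projection onto a half-space. The sketch below (hypothetical names, and not the single-stage method of this paper) illustrates that step:

```python
import numpy as np

def cbf_safety_filter(u_rl, grad_h, f_x, g_x, h_x, alpha=1.0):
    """Project an RL action onto the CBF half-space (two-stage baseline).

    Control-affine dynamics x_dot = f(x) + g(x) u; the CBF condition
    grad_h @ (f_x + g_x @ u) >= -alpha * h_x is a single linear constraint
    a @ u >= b, so the Euclidean projection is closed-form. Assumes
    grad_h @ g_x is nonzero.
    """
    a = grad_h @ g_x                    # constraint normal in action space
    b = -alpha * h_x - grad_h @ f_x     # offset: the constraint is a @ u >= b
    if a @ u_rl >= b:                   # RL action already satisfies the CBF
        return u_rl
    # Otherwise, make the minimal correction along the constraint normal.
    return u_rl + ((b - a @ u_rl) / (a @ a)) * a
```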
We consider the problem of designing controllers to guarantee safety for a class of nonlinear systems under uncertainties in the system dynamics and/or the environment. We define a class of uncertain control barrier functions (CBFs) and formulate the safe control design problem as a chance-constrained optimization problem with uncertain CBF constraints. We leverage the scenario approach to chance-constrained optimization to develop a risk-tunable control design that provably guarantees satisfaction of the CBF safety constraints up to a user-defined probabilistic risk bound, and that provides a trade-off between sample complexity and risk tolerance. We demonstrate the performance of this approach through simulations on a quadcopter navigation problem with obstacle avoidance constraints.
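The sample-complexity versus risk-tolerance trade-off can be made concrete with a standard sufficient bound from the scenario-approach literature; the helper below uses that generic bound, which may differ from the exact condition employed in the paper:

```python
import math

def scenario_sample_size(eps, beta, d):
    """Sufficient number of sampled scenarios N so that, with confidence
    1 - beta, the scenario solution violates the chance constraint with
    probability at most eps; d is the number of decision variables.

    Standard sufficient bound N >= (2/eps) * (ln(1/beta) + d); the exact
    Campi-Garatti binomial condition yields a somewhat smaller N.
    """
    return math.ceil((2.0 / eps) * (math.log(1.0 / beta) + d))

# Example: a 2-D control input, 5% risk, 99% confidence:
# scenario_sample_size(0.05, 0.01, 2)  ->  265
```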
This paper concerns the observability of discrete-time LTI systems under unknown piecewise-constant inputs with sufficiently slow, but otherwise arbitrary, update times. Assuming knowledge of the update times, we characterize the unobservable subspace and show that, with sufficiently many measurements in each inter-update interval of the input, the unobservable subspace remains fixed. We explore the implications of this result for privacy in event-triggered control through an illustrative example.
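For intuition, when the input is constant over the interval of interest, one standard construction computes the unobservable subspace by augmenting the unknown input as extra constant states and taking the null space of the augmented observability matrix. The sketch below illustrates that construction and is not necessarily the paper's exact characterization:

```python
import numpy as np
from scipy.linalg import null_space

def unobservable_subspace(A, B, C):
    """Unobservable subspace with an unknown constant input (illustrative).

    Treats the input as extra constant states, z = [x; u], and returns a
    basis for the null space of the observability matrix of the augmented
    pair (A_aug, C_aug), i.e., the directions in (state, input) space that
    measurements over one inter-update interval cannot distinguish.
    """
    n, m = B.shape
    A_aug = np.block([[A, B],
                      [np.zeros((m, n)), np.eye(m)]])
    C_aug = np.hstack([C, np.zeros((C.shape[0], m))])
    O = np.vstack([C_aug @ np.linalg.matrix_power(A_aug, k)
                   for k in range(n + m)])
    return null_space(O)  # columns span the unobservable subspace of z
```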
Graduate Research Assistant at School of Industrial Engineering, Purdue University (Aug. 2023 - Present)
Summer Research Intern at United States Army Research Laboratory (May 2023 - Aug. 2023): Safe & Multi-Agent Reinforcement Learning
Graduate Teaching Assistant for 'IE-474: Industrial Control Systems' at School of Industrial Engineering, Purdue University
Graduate Teaching Assistant for 'IE-690: Reinforcement Learning and Control' at School of Industrial Engineering, Purdue University
Software Engineer at Siemens India (Aug. 2020 - July 2022): Learning and Optimization Algorithms in Electricity Markets
Research Associate at the Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science (IISc) (Sep. 2019 - Aug. 2020): Networked Control Systems and Multi-Agent Dynamics on Robots
Learning-Based Control
Reinforcement Learning
Networked (Multi-Agent) Systems
Robotics
Applied Psychology (Human Behavior, Social Dynamics)
Introduction to Mathematical Thinking, Stanford
Introduction to Psychology, Yale
Fundamentals of Reinforcement Learning, Uni. of Alberta
Neural Networks and Deep Learning, DeepLearning.AI
Applied Machine Learning in Python, Uni. of Michigan