Multi-agent reinforcement learning (MARL) methods typically require that agents enjoy global state observability, preventing the development of decentralized algorithms and limiting scalability. Recent work has shown that, under assumptions on decaying inter-agent influence, global observability can be replaced by local neighborhood observability at each agent, enabling decentralization and scalability. Real-world applications enjoying such decay properties remain underexplored, however, despite the fact that signal power decay, or signal attenuation, due to path loss is an intrinsic feature of many problems in wireless communications and radar networks. In this paper, we show that signal attenuation enables decentralization in MARL by considering the illustrative special case of performing power allocation for target detection in a radar network. To achieve this, we propose two new constrained multi-agent Markov decision process formulations of the power allocation problem, derive local neighborhood approximations of the global value function and policy gradient estimates together with corresponding error bounds, and develop decentralized saddle-point policy gradient algorithms for solving the proposed problems. Our approach, though oriented towards the specific radar network problem we consider, provides a useful model for extensions to additional problems in wireless communications and radar networks.
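To make the localization idea concrete, the sketch below shows a kappa-hop truncated, REINFORCE-style policy gradient estimate in Python: each agent replaces the global reward with the sum of rewards from its kappa-hop neighborhood, which is the quantity whose approximation error the decay property bounds. The function and argument names are illustrative; this is not the paper's saddle-point algorithm.

```python
import numpy as np

def local_policy_gradient(agent, trajectory, kappa, adjacency, gamma=0.99):
    """Illustrative kappa-hop truncated REINFORCE-style gradient for one agent.

    The agent replaces the global reward with the sum of rewards from its
    kappa-hop neighborhood: the local approximation whose error decays
    with kappa under signal attenuation.
    """
    # Breadth-first search for the kappa-hop neighborhood of `agent`.
    n = adjacency.shape[0]
    dist = np.full(n, np.inf)
    dist[agent] = 0
    frontier = [agent]
    for d in range(kappa):
        nxt = []
        for u in frontier:
            for v in np.flatnonzero(adjacency[u]):
                if dist[v] == np.inf:
                    dist[v] = d + 1
                    nxt.append(v)
        frontier = nxt
    neighborhood = np.flatnonzero(dist <= kappa)

    grad = 0.0
    # trajectory: list of (score_t, rewards_t), where score_t is the agent's
    # grad-log-policy at time t and rewards_t[j] is agent j's reward.
    for t, (score_t, rewards_t) in enumerate(trajectory):
        local_reward = sum(rewards_t[j] for j in neighborhood)
        grad = grad + (gamma ** t) * local_reward * score_t
    return grad
```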
We consider the problem of guaranteeing safety constraint satisfaction in human-robot collaboration (HRC) under uncertain human positions. We pose this problem as a chance-constrained problem (CCP) with safety (chance) constraints represented by uncertain control barrier functions, where the probability of safety constraint violation under uncertainty is bounded by a tunable, user-defined risk. We solve this stochastic optimization problem using a sampling-based approach, obtaining a risk-tunable controller that safely accomplishes HRC tasks. We demonstrate the safety and performance of this approach through both simulation and hardware experiments on a 7-degree-of-freedom Franka Panda manipulator, and characterize the trade-off between the user-defined risk tolerance and task time efficiency in safety-critical applications. Click here for the video description. This work will also be presented at AIM 2025 in Hangzhou, China.
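As an illustration of the sampling-based controller, the sketch below solves a scenario-sampled CBF quadratic program for a planar single-integrator robot avoiding an uncertain human position. The single-integrator dynamics, the distance barrier h(x) = ||x - p_h||^2 - d_min^2, and all names are simplifying assumptions for exposition, not the experimental setup of the paper.

```python
import numpy as np
import cvxpy as cp

def risk_tunable_safe_control(x_robot, u_nom, human_samples, d_min=0.5, alpha=1.0):
    """Sampled chance-constrained CBF-QP (illustrative, single-integrator robot).

    Each sampled human position p_h contributes one CBF constraint based on
    h(x) = ||x - p_h||^2 - d_min^2 >= 0; enforcing the constraint on all
    samples approximates the chance constraint at a scenario-based risk.
    """
    u = cp.Variable(2)
    constraints = []
    for p_h in human_samples:  # draws from the human-position distribution
        diff = x_robot - p_h
        h = diff @ diff - d_min ** 2          # barrier value at current state
        # CBF condition for x_dot = u:  dh/dt = 2 * diff @ u >= -alpha * h
        constraints.append(2 * diff @ u >= -alpha * h)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraints)
    prob.solve()
    return u.value
```

Drawing more samples from the estimated human-position distribution tightens the enforced chance constraint; scenario-optimization bounds relate the number of samples to the user-defined risk.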
In this paper we develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems. Recent advances at the intersection of control theory and RL follow a two-stage, safety filter approach to guaranteeing satisfaction of hard safety constraints in such settings: a model-free RL algorithm is used to learn a (potentially unsafe) controller, and the actions it proposes are then projected onto the safe sets prescribed by a control barrier function (CBF). Though effective at maintaining safety, such approaches lose the convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that allows us to learn RL controllers enjoying classical convergence guarantees while maintaining hard safety constraints during training. In addition, we provide experimental validation illustrating the efficacy of our approach on a simulated quadcopter obstacle avoidance problem as well as an inverted pendulum environment. The corresponding code can be found here.
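For reference, with a single CBF constraint and control-affine dynamics, the projection step of the two-stage baseline described above reduces to a closed-form Euclidean projection onto a half-space. The sketch below (hypothetical names, and not the single-stage method of this paper) illustrates that step:

```python
import numpy as np

def cbf_safety_filter(u_rl, grad_h, f_x, g_x, h_x, alpha=1.0):
    """Project an RL action onto the CBF half-space (two-stage baseline).

    Control-affine dynamics x_dot = f(x) + g(x) u; the CBF condition
    grad_h @ (f_x + g_x @ u) >= -alpha * h_x is a single linear constraint
    a @ u >= b, so the Euclidean projection is closed-form. Assumes
    grad_h @ g_x is nonzero.
    """
    a = grad_h @ g_x                    # constraint normal in action space
    b = -alpha * h_x - grad_h @ f_x     # offset: the constraint is a @ u >= b
    if a @ u_rl >= b:                   # RL action already satisfies the CBF
        return u_rl
    # Otherwise, make the minimal correction along the constraint normal.
    return u_rl + ((b - a @ u_rl) / (a @ a)) * a
```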
We consider the problem of designing controllers to guarantee safety for a class of nonlinear systems under uncertainties in the system dynamics and/or the environment. We define a class of uncertain control barrier functions (CBFs) and formulate the safe control design problem as a chance-constrained optimization problem with uncertain CBF constraints. We leverage the scenario approach to chance-constrained optimization to develop a risk-tunable control design that provably guarantees satisfaction of the CBF safety constraints up to a user-defined probabilistic risk bound, and that provides a trade-off between sample complexity and risk tolerance. We demonstrate the performance of this approach through simulations on a quadcopter navigation problem with obstacle avoidance constraints.
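The sample-complexity versus risk-tolerance trade-off can be made concrete with a standard sufficient bound from the scenario-approach literature; the helper below uses that generic bound, which may differ from the exact condition employed in the paper:

```python
import math

def scenario_sample_size(eps, beta, d):
    """Sufficient number of sampled scenarios N so that, with confidence
    1 - beta, the scenario solution violates the chance constraint with
    probability at most eps; d is the number of decision variables.

    Standard sufficient bound N >= (2/eps) * (ln(1/beta) + d); the exact
    Campi-Garatti binomial condition yields a somewhat smaller N.
    """
    return math.ceil((2.0 / eps) * (math.log(1.0 / beta) + d))

# Example: a 2-D control input, 5% risk, 99% confidence:
# scenario_sample_size(0.05, 0.01, 2)  ->  265
```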
This paper concerns the observability of discrete-time LTI systems under unknown piecewise-constant inputs with sufficiently slow, but otherwise arbitrary, update times. Assuming knowledge of the update times, we characterize the unobservable subspace and show that, with sufficiently many measurements in each inter-update interval of the input, the unobservable subspace remains fixed. We explore the implications of this result for privacy in event-triggered control through an illustrative example.
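For intuition, when the input is constant over the interval of interest, one standard construction computes the unobservable subspace by augmenting the unknown input as extra constant states and taking the null space of the augmented observability matrix. The sketch below illustrates that construction and is not necessarily the paper's exact characterization:

```python
import numpy as np
from scipy.linalg import null_space

def unobservable_subspace(A, B, C):
    """Unobservable subspace with an unknown constant input (illustrative).

    Treats the input as extra constant states, z = [x; u], and returns a
    basis for the null space of the observability matrix of the augmented
    pair (A_aug, C_aug), i.e., the directions in (state, input) space that
    measurements over one inter-update interval cannot distinguish.
    """
    n, m = B.shape
    A_aug = np.block([[A, B],
                      [np.zeros((m, n)), np.eye(m)]])
    C_aug = np.hstack([C, np.zeros((C.shape[0], m))])
    O = np.vstack([C_aug @ np.linalg.matrix_power(A_aug, k)
                   for k in range(n + m)])
    return null_space(O)  # columns span the unobservable subspace of z
```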
Graduate Research Assistant at School of Industrial Engineering, Purdue University (Aug. 2023 - Present)
Summer Research Intern at United States Army Research Laboratory (May 2023 - Aug. 2023): Safe & Multi-Agent Reinforcement Learning
Graduate Teaching Assistant for 'IE-474: Industrial Control Systems' at School of Industrial Engineering, Purdue University
Graduate Teaching Assistant for 'IE-690: Reinforcement Learning and Control' at School of Industrial Engineering, Purdue University
Software Engineer at Siemens India (Aug. 2020 - July 2022): Learning and Optimization Algorithms in Electricity Markets
Research Associate at the Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science (IISc) (Sep. 2019 - Aug. 2020): Networked Control Systems and Multi-Agent Dynamics on Robots
Learning-Based Control
Reinforcement Learning
Networked (Multi-Agent) Systems
Robotics
Applied Psychology (Human Behavior, Social Dynamics)
Introduction to Mathematical Thinking, Stanford
Introduction to Psychology, Yale
Fundamentals of Reinforcement Learning, Uni. of Alberta
Neural Networks and Deep Learning, DeepLearning.AI
Applied Machine Learning in Python, Uni. of Michigan