Safe Reinforcement Learning
We develop safe exploratory reinforcement learning that learns on the fly while satisfying safety constraints. Our work builds on the standard Constrained MDP framework: we set clear safety and resource budgets, then train policies that explore efficiently while staying within those budgets. In practice, this means starting from a safe baseline, monitoring costs as the agent learns, and adapting its behavior whenever it approaches a limit, so that progress never comes at the expense of safety. The policies must also adapt as the environment changes. The result is dependable, data-driven control that can be deployed in robotics, networks, and energy systems, delivering strong performance and constraint compliance from day one. We are primarily interested in developing algorithms with provable performance guarantees; this work has been published at NeurIPS, ICLR, ICML, and AISTATS.
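To illustrate the budgeted-exploration idea, here is a minimal Lagrangian primal-dual sketch on a toy constrained bandit (a one-state CMDP). The action rewards, safety costs, and the budget are made-up placeholders, and the actual algorithms operate on general CMDPs with exploration bonuses; the sketch only shows how the dual variable raises the price of actions that push the observed cost toward the limit.

```python
# Minimal primal-dual sketch for a toy constrained bandit (one-state CMDP).
# Rewards, costs, and the budget below are illustrative, not from our papers.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
true_reward = np.array([0.2, 0.6, 0.9])   # hypothetical mean rewards
true_cost   = np.array([0.0, 0.3, 0.8])   # hypothetical mean safety costs
budget = 0.4                               # per-step safety budget

theta = np.zeros(n_actions)   # policy preferences (softmax policy)
lam = 0.0                     # Lagrange multiplier for the cost constraint
lr_theta, lr_lam = 0.5, 0.2

for t in range(5000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    a = rng.choice(n_actions, p=pi)
    r = true_reward[a] + 0.1 * rng.standard_normal()
    c = true_cost[a] + 0.1 * rng.standard_normal()
    # Primal step: ascend on reward minus lambda-weighted cost (REINFORCE-style).
    grad = -pi; grad[a] += 1.0
    theta += lr_theta * (r - lam * c) * grad
    # Dual step: raise lambda when the observed cost exceeds the budget.
    lam = max(0.0, lam + lr_lam * (c - budget))

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print("final policy:", pi.round(3), "expected cost:", (pi * true_cost).sum().round(3))
```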
Applications:
Wireless Communication: We have applied this framework to finding optimal beam angles while satisfying interference constraints. We developed the first provably efficient approach that is fast yet achieves sublinear regret and constraint violation even when the environment varies; a simplified sketch of the underlying constrained-bandit idea appears after this list. The demo is available here: https://www.dropbox.com/scl/fi/ogv10le30cck1goxh3voa/ris_bandit_demo.mp4?rlkey=nd8orz1sktmnvdk58udbd5b6v&e=2&dl=0
Robot Navigation: We are extending this framework to apply the above algorithms to robot navigation in cluttered environments. A demo will be available soon.
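As referenced under Wireless Communication above, the following is a simplified sketch of an optimism-based constrained bandit for beam selection. The beam gains, interference levels, and the interference limit are hypothetical numbers, and this stationary sketch omits the time-varying environment handled in the actual algorithm.

```python
# Illustrative optimism-based beam selection under an interference constraint.
# Beam gains, interference levels, and the limit below are made up.
import numpy as np

rng = np.random.default_rng(1)
beam_gain = np.array([0.3, 0.7, 0.5, 0.9])      # hypothetical mean SNR gains
interference = np.array([0.1, 0.4, 0.2, 0.7])   # hypothetical mean interference
limit = 0.5                                      # interference budget

n = len(beam_gain)
counts = np.ones(n)
reward_sum = np.zeros(n)
cost_sum = np.zeros(n)

for t in range(1, 3001):
    bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
    reward_ucb = reward_sum / counts + bonus      # optimistic gain estimate
    cost_lcb = cost_sum / counts - bonus          # optimistic (low) interference
    feasible = cost_lcb <= limit                  # beams that may satisfy the limit
    a = int(np.argmax(np.where(feasible, reward_ucb, -np.inf)))
    r = beam_gain[a] + 0.1 * rng.standard_normal()
    c = interference[a] + 0.1 * rng.standard_normal()
    counts[a] += 1; reward_sum[a] += r; cost_sum[a] += c

print("empirical gains:", (reward_sum / counts).round(2))
print("selection counts:", counts.astype(int))
```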
Robust and Risk-constrained RL
We develop RL frameworks that provide safety guarantees even when the real environment differs from the simulated one. First, we developed a framework for robust constrained MDPs (RCMDPs) with provable feasibility and sub-optimality guarantees even under model mismatch. We also consider risk-constrained MDPs, where the goal is to maximize reward while keeping the chance of costly mistakes below a clear, user-set threshold. Practically, this means our algorithms do not just optimize "on average": they actively plan for unlikely but high-impact events (the crash, the blackout, the missed dose) and stay within safety limits as they learn. Our work makes this usable in the real world: we set transparent safety targets, adapt them to changing conditions, and provide lightweight certificates that show when a policy is safe to deploy. We have applied these ideas to robotics, communication networks, and energy systems, where reliability matters as much as speed. The result is AI that earns trust: strong performance, with safety built in from day one.
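To give a flavor of the robustness side, the minimal sketch below runs value iteration against the worst case over a small, hand-made set of candidate transition models. The models, rewards, and discount factor are synthetic, and the cost constraints and risk measures of the actual RCMDP and risk-constrained formulations are omitted for brevity.

```python
# Minimal robust value-iteration sketch: plan against the worst case over a
# small set of candidate transition models (a stand-in for the uncertainty
# sets used in robust (constrained) MDPs; constraints are omitted here).
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 4, 2, 0.9
reward = rng.uniform(0, 1, size=(n_states, n_actions))   # synthetic rewards

def random_model():
    # Random row-stochastic transition kernel P[s, a, s'].
    P = rng.uniform(size=(n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)

models = [random_model() for _ in range(3)]   # finite uncertainty set

V = np.zeros(n_states)
for _ in range(200):
    # Robust Bellman backup: evaluate each action under its worst-case model.
    Q_worst = np.min([reward + gamma * P @ V for P in models], axis=0)
    V = Q_worst.max(axis=1)

robust_policy = Q_worst.argmax(axis=1)
print("robust values:", V.round(3), "robust policy:", robust_policy)
```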
LLM-Driven Automated Hardware Design
We are developing LLM-assisted code and circuit generation for hardware. The main goal is to automate the hardware design process. We found that state-of-the-art LLMs struggle to produce correct code for hardware. We are developing the first RLHF-based fine-tuning approach for hardware and circuit design.
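As a purely illustrative sketch of the preference-learning step inside an RLHF pipeline, the toy code below fits a Bradley-Terry reward model that scores a "preferred" candidate above a "rejected" one. The feature vectors and preference labels are synthetic placeholders; a real pipeline would score LLM-generated hardware code against design checks and then fine-tune the LLM with that learned reward.

```python
# Toy Bradley-Terry reward model: the preference-modeling step used in RLHF.
# Features and preference labels are synthetic placeholders, not real HDL data.
import numpy as np

rng = np.random.default_rng(3)
dim, n_pairs = 8, 200
w_true = rng.standard_normal(dim)                 # hidden "quality" direction
chosen = rng.standard_normal((n_pairs, dim))
rejected = rng.standard_normal((n_pairs, dim))
# Make the synthetic labels consistent with the hidden quality direction.
swap = chosen @ w_true < rejected @ w_true
chosen[swap], rejected[swap] = rejected[swap].copy(), chosen[swap].copy()

w = np.zeros(dim)                                 # reward-model parameters
lr = 0.1
for _ in range(500):
    margin = (chosen - rejected) @ w
    p = 1.0 / (1.0 + np.exp(-margin))             # P(chosen preferred over rejected)
    grad = ((1.0 - p)[:, None] * (chosen - rejected)).mean(axis=0)
    w += lr * grad                                # ascend the preference log-likelihood

acc = ((chosen - rejected) @ w > 0).mean()
print("preference accuracy on training pairs:", acc)
```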