Reinforcement learning theory

State-of the-art reinforcement learning (RL) algorithms such as AlphaZero use lookahead, which is typically implemented using Monte Carlo Tree Search (MCTS). As the name suggests, lookahead simply means looking ahead several steps when computing the policy to be used. The fact that an H-step lookahead provides an O(alpha^H), where alpha  is the discount factor, approximate solution to the optimal policy is a somewhat trivial and well-known statement. What I have shown in my research is a much stronger result: I have shown that lookahead leads to convergent learning algorithms while the same algorithms may diverge in the absence of lookahead. I have demonstrated these results for three different classes of RL algorithms: modified policy iteration with linear value function approximation, Monte Carlo exploring starts, and policy iteration for zero-sum Markov games. My results have resulted in the resolution of two longstanding open problems. One of these results has been selected as one of four finalists for the INFORMS Applied Probability Society Best Student Paper award. I have also shown that lookahead can be efficiently implemented in the widely studied class of linear MDPs.

Currently, I am working with a team of researchers at NVIDIA Research to develop theoretical underpinnings for reinforcement learning with human feedback (RLHF) with potential applications in autonomous vehicles and healthcare.  

Links: https://arxiv.org/abs/2301.09709, https://arxiv.org/abs/2109.13419, https://arxiv.org/abs/2303.09716, https://arxiv.org/pdf/2210.07338.pdf,  https://arxiv.org/abs/2102.00030

Electricity market design

A restructuring of electricity markets in the US is currently underway because of increased use of renewable resources and other such distributed energy resources (DERs). The present day electricity market structures do not provide sufficient support for the volatility of renewable resources since they do not incorporate reactive power; this has led to power outages and potentially unnecessary costs of power. I first consider two locational marginal pricing schemes derived from an economic dispatch problem with AC power flow equations that define a non-convex feasible set. My research studies properties of the pricing  mechanisms that are relevant to electricity market operations. I extend the results to incorporate radial distribution grids in a distribution locational marginal (DLMP) pricing mechanism, which are particularly relevant in the case of DERs. 

Links: https://arxiv.org/pdf/1910.10673.pdf, https://ieeexplore.ieee.org/document/9087752

Privacy for networked games

Aggregate games are non-cooperative games in which a player’s payoff or cost depends on her own actions and the sum-total of the actions taken by other players. Aggregate games are often potential games and a pure-strategy Nash equilibrium can be guaranteed to exist. In my research, a distributed algorithm to compute an equilibrium in aggregate games where players communicate over a fixed undirected network is presented. The algorithm exploits correlated perturbation to obfuscate information shared over the network. The algorithm does not reveal private information of players to an honest-but-curious adversary who monitors several nodes in the network. In contrast with differential privacy based algorithms, the method does not sacrifice accuracy of equilibrium computation to provide privacy guarantees. 

Links: https://www.sciencedirect.com/science/article/pii/S2405896320315196