Theoretical foundations of RL algorithms

Reinforcement learning (RL) is a general paradigm for learning optimal policies in stochastic control problems from simulations. Combined with powerful function approximation such as neural networks, (deep) reinforcement learning has been recognized as a promising approach to challenging practical problems, including robotics and autonomous driving. However, the theoretical guarantees of RL are less well understood, which prevents the widespread use of this framework for solving real-world problems.

My research focuses on understanding the finite-time performance (sample complexity) of reinforcement learning algorithms, a fundamental problem in this area, and on developing more efficient algorithms under practical constraints. Here are some examples:

1. Using acceleration in reinforcement learning algorithms (videos on the left) [pdf]

2. Finite-time analysis of RL algorithms [pdf], [pdf], [pdf]

3. Two-time-scale stochastic approximation algorithms [pdf], [pdf], [pdf]
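To give a rough flavor of the two-time-scale idea, the toy sketch below (my own illustrative example, not taken from the papers above) runs a fast iterate that tracks the slow iterate, while the slow iterate is driven to a known root using only the noisy fast estimate; the step sizes satisfy beta_k / alpha_k -> 0.

```python
import numpy as np

# Illustrative two-time-scale stochastic approximation on a toy problem.
# Fast iterate x tracks the slow iterate y; the slow iterate y is driven
# toward the root y* = 2 using only the (noisy) fast estimate x.
rng = np.random.default_rng(0)
x, y = 0.0, 0.0
for k in range(20000):
    alpha = 1.0 / (k + 1) ** 0.6    # fast step size
    beta = 1.0 / (k + 1)            # slow step size; beta/alpha -> 0
    x += alpha * ((y - x) + 0.1 * rng.normal())   # fast: x tracks y
    y -= beta * ((x - 2.0) + 0.1 * rng.normal())  # slow: drives x, hence y, to 2
print(x, y)
```

Because the fast time scale sees the slow iterate as nearly frozen, the analysis can treat x as having already converged to its equilibrium for the current y, which is the key structure exploited in finite-time analyses of such schemes.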

Decentralized algorithms for multi-agent RL

In multi-agent reinforcement learning, a group of agents cooperatively solves a common task. This multi-agent framework is inspired by applications where it is impractical or impossible for a single agent to complete a given task. One example is a network of mobile robots with limited resources moving a heavy object from one area to another (video on the bottom left). Another is deploying a group of drones to explore different parts of a large area (video on the top left). In these applications, each agent has limited communication and computation capabilities, necessitating the development of distributed algorithms. My research focuses on distributed/decentralized reinforcement learning algorithms over multi-agent systems with theoretical guarantees. Here are some examples:

1. Decentralized policy gradient approach to multi-task reinforcement learning [pdf]

2. Decentralized stochastic approximation with applications in multi-agent and multi-task learning [pdf]

3. Finite-time performance of distributed temporal difference learning [pdf]
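A generic consensus-based sketch of distributed TD(0) is below; the Markov reward process, mixing matrix, and step sizes are illustrative assumptions, not taken from the paper. Each agent performs a local TD update using its own noisy reward observations and then averages its estimate with its neighbors through a doubly stochastic mixing matrix.

```python
import numpy as np

# Illustrative distributed TD(0): three agents evaluate a common 2-state
# Markov reward process; each sees the same transitions but its own noisy
# rewards, and the agents mix their estimates after every local update.
rng = np.random.default_rng(1)
P = np.array([[0.5, 0.5], [0.5, 0.5]])    # transition matrix
r = np.array([1.0, 0.0])                  # mean rewards per state
gamma = 0.9
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)   # exact value function

W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])         # doubly stochastic mixing matrix
theta = np.zeros((3, 2))                  # each agent's tabular value estimate
s = 0
for k in range(50000):
    s_next = rng.choice(2, p=P[s])
    alpha = 10.0 / (100.0 + k)            # diminishing step size
    for i in range(3):                    # local TD(0) update per agent
        reward = r[s] + 0.1 * rng.normal()            # agent's noisy reward
        delta = reward + gamma * theta[i, s_next] - theta[i, s]
        theta[i, s] += alpha * delta
    theta = W @ theta                     # consensus step with neighbors
    s = s_next
```

The averaging step keeps the local estimates close to one another, so each agent effectively learns from all agents' reward observations while communicating only with its neighbors.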


Decentralized optimization

The last decade has seen a technological revolution driven by the rapid development of low-cost sensors, smart devices, and communication networks. These advances have enabled data-driven decision making in large-scale, complex multi-agent systems; prominent examples include autonomous robotic networks, the Internet of Things (IoT), and intelligent transportation systems. The key challenge in these systems is handling vast quantities of information under the limited computation and communication resources shared among the agents. Among potential approaches, distributed information processing has been recognized as an important way to address this challenge: it is amenable to low-cost implementation and can run in real time.

A notable contribution of my research is a novel two-time-scale decentralized stochastic approximation method, with theoretical guarantees, for solving large-scale optimization problems over a network of agents. The proposed method is shown to be robust to both communication delays and quantization, which are bottlenecks in distributed systems. See the papers below for more details.

1. Decentralized stochastic approximation under communication delays [pdf] and quantization [pdf]; fast rates with adaptive quantization [pdf]

2. Decentralized optimization over cluster networks [pdf]

3. Distributed optimization under the presence of Byzantine malicious agents [pdf]