Program

The full-day tutorial will consist of four blocks.

Each of these blocks is described in detail below.

Morning 1: Introduction and fundamentals of reinforcement learning and MARL

The tutorial starts with an introduction to the program and the speakers. In addition, we will briefly survey the background knowledge of the participants. The remainder of the first block is spent on the underlying concepts of Reinforcement Learning (RL) and Game Theory (GT), leading up to the main challenges of Multi-Agent Reinforcement Learning (MARL). Within the topic of RL, we discuss the psychological inspiration of RL models, Markov decision processes, policy and value iteration (including algorithms such as Learning Automata and Q-learning), how domain knowledge can be incorporated into the learning process through reward shaping, and, finally, some proofs of convergence in the single-agent setting. GT provides tools for the analysis of strategic interaction between multiple agents. We discuss (normal form) games, best response sets, Nash equilibria, and Pareto optimality. With this background in RL and GT in place, we proceed to the main challenges of MARL. In essence, the presence of multiple agents leads to partial observability and a non-stationary environment. Most convergence proofs for the single-agent setting rely on full observability and a stationary environment, so they no longer hold for MARL. This motivates the need for a new framework with which to analyze the learning process in MARL.
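As a minimal illustration of the value-iteration style methods covered in this block, the sketch below shows tabular Q-learning against a generic environment interface. The environment object, its reset/step methods, and the chosen hyperparameters are assumptions made purely for illustration; they are not part of the tutorial material itself.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        """Tabular Q-learning sketch. The environment is assumed to expose
        reset(), step(action) -> (next_state, reward, done) and a list
        env.actions; this interface is hypothetical."""
        Q = defaultdict(float)  # Q[(state, action)], initialised to 0
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # one-step temporal-difference update toward the greedy target
                best_next = max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

The single-agent convergence guarantees mentioned above rely, among other things, on every state-action pair being visited sufficiently often and on a suitably decaying learning rate; once other learning agents make the environment non-stationary, this kind of update no longer comes with the same guarantees.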

Morning 2: Scaling single-agent reinforcement learning to multi-agent settings

This part of the tutorial explains how single-agent reinforcement learning can still be applied in multi-agent systems in which agents only rarely interact with each other. We discuss various algorithms that learn using single-agent techniques whenever appropriate and only switch to multi-agent approaches in situations where agents influence each other. These situations are called sparse interactions. This avoids the requirement of full observability during the entire learning process. Finally, techniques such as reward shaping, which can speed up learning, are also discussed in this multi-agent setting.
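To convey the idea of sparse interactions, the following sketch shows a hypothetical agent that uses a local, single-agent Q-table in most states and only falls back to a joint-state table in a given set of interaction states. Treating this set as known is a simplifying assumption for illustration; the algorithms discussed in this block instead learn where coordination is actually needed.

    def select_action(local_state, other_states, interaction_states,
                      local_Q, joint_Q, actions):
        """Illustrative sketch only: interaction_states, local_Q and
        joint_Q are hypothetical names, not an existing API."""
        if local_state in interaction_states:
            # coordination is needed: condition on the other agents' states too
            key = (local_state, tuple(other_states))
            return max(actions, key=lambda a: joint_Q[(key, a)])
        # no interaction: a plain single-agent lookup suffices
        return max(actions, key=lambda a: local_Q[(local_state, a)])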

Afternoon 1: A taxonomy of multi-agent learning and applications to stochastic games

A number of authors have independently arrived at a new approach to analyzing MARL. They exploit the link between reinforcement learning and continuous dynamical systems derived from the learning process (e.g., replicator dynamics or projection dynamics), which are studied in Evolutionary Game Theory (EGT). The study of these learning dynamics reveals that a number of policy and value iterators resemble each other in their internal learning process, even though the actual implementations may differ substantially. This analysis culminates in a taxonomy of MARL algorithms for single-state environments, which groups algorithms by convergence behavior and information requirements. In the second part of this afternoon session we extend the link between evolutionary game theory and multi-agent reinforcement learning to multi-state games. This more general setting poses additional challenges for providing theoretical guarantees on convergence behavior. At the same time, it bridges the gap between abstract models with theoretical guarantees and practical applications.
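To make the link with evolutionary game theory concrete, the sketch below numerically integrates the standard two-population replicator dynamics on an example Prisoner's Dilemma payoff matrix. The payoff values, initial strategies, and step size are illustrative assumptions; the tutorial derives such dynamics formally from the learning rules themselves.

    import numpy as np

    def replicator_step(x, y, A, B, dt=0.01):
        """One Euler step of the two-population replicator dynamics:
        dx_i/dt = x_i * ((A y)_i - x . A y), and symmetrically for y."""
        fx = A @ y                       # fitness of each row strategy
        fy = B.T @ x                     # fitness of each column strategy
        x = x + dt * x * (fx - x @ fx)   # strategies with above-average payoff grow
        y = y + dt * y * (fy - y @ fy)
        return x / x.sum(), y / y.sum()  # renormalise against numerical drift

    # Example payoffs (cooperate, defect) for a Prisoner's Dilemma, chosen for illustration
    A = np.array([[3.0, 0.0],
                  [5.0, 1.0]])
    B = A.T                              # symmetric game: column payoffs are the transpose
    x = np.array([0.6, 0.4])             # row player's mixed strategy
    y = np.array([0.6, 0.4])             # column player's mixed strategy
    for _ in range(2000):
        x, y = replicator_step(x, y, A, B)
    print(x, y)                          # both strategies drift toward defection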

Afternoon 2: Demonstrations and discussion

During the final part of this tutorial, reinforcement learning algorithms are demonstrated in different multi-agent settings, including abstract games and real-world applications. Each demonstration is discussed with the participants and accompanied by an explanation of the domain-specific challenges and solutions. The outcome of this discussion positions each demonstration within the theoretical framework, using the analytical tools presented earlier in the tutorial.

The CoMo Simulation Applets are available online and give an intuitive impression of reinforcement learning.