Multi-agent reinforcement learning (MARL) is an important and fundamental topic within agent-based research. After giving successful tutorials on this topic at EASSS 2004 (the European Agent Summer School), ECML 2005, ICML 2006, EWRL 2008 and AAMAS 2009/2010, with different collaborators, we now offer participants a thoroughly revised, updated tutorial, focusing on the theoretical as well as practical aspects of MARL. 

Participants will be taught the basics of single-agent reinforcement learning (RL) and the associated theoretical convergence guarantees, related to Markov Decision Processes (MDP). We will then outline how these guarantees are lost in a setting where multiple agents learn, and introduce a framework, based on game theory and evolutionary game theory (EGT), that allows thorough analysis and prediction of the dynamics of multi-agent learning. We also discuss a fundamental question that designers of multi-agent learning algorithms are confronted with: what is it we want the agents to learn? Fairness is shown to be an important consideration here, especially when systems are designed to collaborate with human agents. Finally, the last part of the tutorial will focus on reward-free multi-agent scenarios, in which the agents learn a task by observing other agents perform it. We introduce several social learning mechanisms that have been gathering increasing attention and that may lead to different outcomes than individual RL.

The tutorial is offered in two half-day parts, given on one day (3 May 2011). Participants can register for each separate part (at the cost of a half-day tutorial), or for both parts (at the cost of a full-day tutorial). 

For any inquiries, please contact either the AAMAS Tutorial chair or the corresponding lecturer at steven dot dejong at maastrichtuniversity dot nl.

Target audience

The tutorial is aimed at researchers who face a multi-agent setting and are considering an adaptive solution.

Multi-agent systems are receiving increasing attention from the research community. This can be observed from the accepted full articles at last year's AAMAS (2010), of which about 65% deal with multi-agent systems. Multi-agent systems are inherently complex, if not impossible, to control by design, which explains the community's keen interest in adaptive multi-agent systems. Indeed, within the topic of multi-agent systems, about 40% of the accepted full articles at AAMAS 2010 concern adaptive multi-agent systems.


Part I: Algorithms and Analysis Methods (3 May 2011, Morning)

  1. Introduction to part I (K. Tuyls)
  2. Fundamentals of Reinforcement Learning (RL), Game Theory (GT), and Multi-Agent Reinforcement Learning (MARL). (A. Nowé)
  3. Infinitesimal Gradient Ascent (IGA), Evolutionary Game Theory (EGT), and the mapping between MARL and EGT. (M. Kaisers)

Part II: Learning with and from Other Agents (3 May 2011, Afternoon)

  1. Introduction to part II (K. Tuyls)
  2. Determining what agents need to learn. A hands-on session on (the need for) fairness. Applications of multi-agent learning. (S. de Jong)
  3. Social learning: learning by imitating others'  successful behavior. (F. Melo)

Prior knowledge

The first part of the tutorial assumes no prior knowledge specific to MARL or (E)GT. The second part assumes a basic understanding of multi-agent learning (as provided in the first part); however, all topics are largely self-contained and complementary to each other.

Learning goals

RL is one of the most popular approaches to single-agent learning, because it is explicitly agent-centric, it is founded in psychological models and it provides convergence guarantees under the proper assumptions. RL has been applied to multi-agent settings with promising results. However, the theoretical convergence guarantees based on classical proofs are lost, since common assumptions such as a Markovian environment are violated. A new perspective and theoretical framework for analyzing and improving MARL, including convergence guarantees, are presented in the proposed tutorial. Participants receive the necessary knowledge to apply the analysis to their specific multi-agent setting in order to devise and refine solutions based on MARL (Part I). In addition, participants will develop a crisp definition of learning goals within MARL and will understand the fundamental differences between individual and social learning (Part II).

Details of Part I: Algorithms and Analysis Methods

Part I.1: Introduction

The tutorial starts with an introduction of the program and speakers. In addition, we will quickly survey the background knowledge of participants.

Part I.2: Fundamentals

The remainder of the first block is spent on the underlying concepts of Reinforcement Learning (RL) and Game Theory (GT), leading up to the main challenges of Multi-Agent Reinforcement Learning (MARL). Within the topic of RL, we discuss psychological inspiration of RL models, Markov decision processes, policy and value iteration (including algorithms such as Learning Automata and Q-learning), and some proofs of convergence in the single-agent scenario. GT provides tools for the analysis of strategic interaction between multiple agents. We discuss (normal form) games, best response sets, Nash equilibria, and Pareto optimality. Given the necessary background of RL and GT, we can now proceed to discuss the main challenges of MARL. In essence, the presence of multiple agents leads to partial observability and a non-stationary environment. Since most proofs of convergence of the single-agent scenario are based on full observability and a stationary environment, this implies that they no longer hold for MARL. This motivates the need for a new framework with which to analyze the learning process in MARL.
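As a rough illustration of the single-agent setting described above, the following is a minimal sketch of tabular Q-learning. The chain-shaped MDP, the function name `q_learning`, and all parameter values are hypothetical choices made for this example; they are not taken from the tutorial material.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain MDP.

    States are 0..n_states-1. Action 0 moves left (bounded at state 0),
    action 1 moves right. Entering the last state gives reward 1 and
    ends the episode; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])      # exploit, breaking ties at random
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # The terminal state's values are never updated and stay 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q_table = q_learning()
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy_policy)
```

In this stationary, fully observable environment the learned greedy policy moves right in every non-terminal state. If a second learner were changing the transition or reward structure at the same time, the update target would no longer come from a fixed MDP, which is the difficulty the paragraph above points to.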

Part I.3: A new framework for analyzing and improving MARL

A number of authors have independently arrived at a new approach to analyzing MARL. They exploit the link between RL and a continuous system derived from the learning process. The first example of this approach that we discuss is Infinitesimal Gradient Ascent (IGA). Subsequently, the topic of Evolutionary Game Theory (EGT) is introduced. In contrast to traditional GT, which assumes agents to be able and willing to compute the best response, EGT assumes that agents are subject to the pressure of natural selection to perform well. The concepts of population models, the Replicator Dynamics (RD) and Evolutionarily Stable Strategies (ESS) are explained. The RD describe the evolutionary change in a population as a continuous system and relate to genetic operators such as mutation and selection. The ESS describe states of the population which, when perturbed by a small amount, are re-established by the evolutionary process. The RD were first linked to multi-agent policy iteration, in particular to multi-agent applications of Learning Automata. Recently, a link to value iteration has been established, relating varieties of Q-learning to an extension of the RD. This link revealed that a number of policy and value iterators resemble each other in their internal learning process, while the actual implementations may be substantially different. Finally, we delineate how this framework can be used to analyze and even improve MARL algorithms.
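To make the notion of a continuous system more concrete, here is a minimal sketch of the single-population, selection-only replicator dynamics, dx_i/dt = x_i [(Ax)_i - x·Ax], integrated with plain Euler steps. The Hawk-Dove payoff matrix, the step size, and the function names are assumptions chosen for illustration, not material from the tutorial.

```python
def replicator_step(x, payoff, dt):
    """One Euler step of the selection-only replicator dynamics."""
    n = len(x)
    # Fitness of each pure strategy against the current population mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    average = sum(x[i] * fitness[i] for i in range(n))
    # Strategies that do better than average grow, the others shrink.
    return [x[i] + dt * x[i] * (fitness[i] - average) for i in range(n)]


def simulate(x0, payoff, dt=0.01, steps=5000):
    """Iterate the Euler step from the initial population state x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical Hawk-Dove game with resource value 2 and fight cost 4.
# Rows and columns are ordered (Hawk, Dove); entries are row payoffs.
HAWK_DOVE = [[-1.0, 2.0],
             [0.0, 1.0]]

final = simulate([0.9, 0.1], HAWK_DOVE)
print(final)
```

For this payoff matrix the mixed state with half hawks and half doves is evolutionarily stable, and the trajectory started at 90% hawks settles there. Starting from a different interior point should lead to the same state, which illustrates the "re-established after a small perturbation" reading of an ESS.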

Details of Part II: Learning with and from Other Agents

Part II.1: Introduction

Once again, we start by quickly introducing the program and speakers, and establish the present knowledge of the participants.

Part II.2: What do agents need to learn?

With single-agent learning, we can generally identify the objective of the agent involved, e.g., it needs to learn a rationally optimal solution. With multi-agent learning, determining what is optimal becomes a non-trivial task. The individually rational solution (i.e., a solution an agent learns without taking into account other agents) may not be optimal, since neglecting other agents may have undesirable side-effects. This is most prominently the case in social dilemmas, which are studied in (E)GT by means of games such as the well-known Prisoners' Dilemma and Ultimatum Game. Social dilemmas are shown to be very common in applications of multi-agent systems, especially (but not exclusively) in the presence of human agents. In the tutorial, participants get hands-on experience in learning to play social-dilemma games. We then discuss possible answers to the question "What do agents need to learn?" taken from diverse fields of research, such as welfare economics, behavioral economics, and statistical physics. Fairness is shown to be a central concept here. We give a number of applications for multi-agent systems that consider fairness as part of their optimization procedure. With the information taught in this part of the tutorial, participants will be able to measure and influence the performance of multi-agent systems in ways that reflect human preferences.
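The gap between individually rational and collectively desirable outcomes can be sketched with a small example. The code below enumerates the pure-strategy Nash equilibria and the Pareto-optimal outcomes of a Prisoner's Dilemma. The payoff numbers (5, 3, 1, 0) are a commonly used illustrative choice and are an assumption here, as are the function names.

```python
from itertools import product

ACTIONS = ("C", "D")  # cooperate, defect

# Hypothetical payoffs: (row player's payoff, column player's payoff).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def is_nash(profile):
    """True if neither player gains by deviating unilaterally."""
    a, b = profile
    row_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in ACTIONS)
    col_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok


def is_pareto_optimal(profile):
    """True if no other outcome is at least as good for both players
    and strictly better for at least one of them."""
    u = PAYOFFS[profile]
    for v in PAYOFFS.values():
        weakly_better = all(vi >= ui for vi, ui in zip(v, u))
        strictly_better = any(vi > ui for vi, ui in zip(v, u))
        if weakly_better and strictly_better:
            return False
    return True


profiles = list(product(ACTIONS, repeat=2))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]
print("Nash equilibria:", nash)
print("Pareto-optimal outcomes:", pareto)
```

With these payoffs, mutual defection is the only pure Nash equilibrium, yet it is the one outcome that is not Pareto optimal. This is the sense in which the individually rational solution can be undesirable for everyone involved.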

Part II.3: Social learning

In this last part of the tutorial, we discuss some recent work on social learning in artificial agents. In social learning, a learner uses information provided by an expert to acquire new tasks or improve its own learning. Social learning allows for cultural transfer of knowledge in a fast and reliable manner. There has been a significant amount of work on social learning in computer science, particularly in robotics. Social learning mechanisms offer a natural way for non-technical users to program complex devices (such as robots) by demonstration. In this part of the tutorial, we will discuss several social learning mechanisms that are closely related to RL and can be formalized using the same tools. We pay special attention to inverse RL, which addresses the problem of inferring the task to be executed from an expert's demonstration.


Steven de Jong (corresponding presenter) obtained a Ph.D. in Artificial Intelligence, with a thesis entitled "Fairness in Multi-Agent Systems", at Maastricht University, The Netherlands, in 2009. Afterwards, he worked as a teacher there, as well as a post-doctoral researcher at the Vrije Universiteit Brussel, Belgium. Currently he continues his post-doctoral work in Maastricht. His current research focuses on using human-inspired fairness and social network models to predict the emergence of conflict. He also has a strong interest in swarm robotics. Details about his further research interests and publications may be found on his webpage. Steven lectured in the 2009 and 2010 editions of this tutorial. He has also lectured at his department for years and has been an invited lecturer on many occasions.

Michael Kaisers graduated from Maastricht with a B.Sc. in Knowledge Engineering in 2007 on "Reinforcement Learning in Multi-agent Games", and an M.Sc. in Artificial Intelligence in 2008 on "Games and Learning in Auctions". In both cases, he graduated summa cum laude, completing the three-year bachelor program in two years and complementing his master program with an extra-curricular four-month research visit to Simon Parsons at Brooklyn College, New York City. In a nationwide competition, the Netherlands Organization for Scientific Research (NWO) awarded him a TopTalent 2008 PhD grant for his proposal "Multi-agent Learning in Auctions". The findings of his PhD research have extended and solidified the link between evolutionary game theory and reinforcement learning, in particular considering variations of Q-learning. He strengthened his international research network through a three-month research visit to Michael Littman at Rutgers, State University of New Jersey, and gave presentations at various workshops and conferences. His website is

Francisco Melo is currently a Senior Researcher at the Intelligent Agents and Synthetic Characters Group (GAIPS) of INESC-ID, in Portugal. He completed his PhD in Electrical and Computer Engineering in 2007 at the Instituto Superior Tecnico (IST), in Lisbon, Portugal. In his thesis, he developed and analyzed RL algorithms for cooperative navigation tasks. Prior to his current position with GAIPS/INESC-ID, he held appointments as a Post-doctoral Fellow at the School of Computer Science, Carnegie Mellon University (CMU), and as a short-term researcher at the Vision Lab (VisLab), IST, where he worked on the application of machine learning in general (and RL in particular) to developmental robotics. His current research focuses on theoretical aspects of RL, multi-agent systems and developmental robotics. He has published several papers on general aspects of RL (AAMAS, ICML, COLT, ECC), planning and learning in multi-agent systems (AAMAS, ICRA) and developmental robotics (ICRA, IROS, AISB). Details about his research interests and publications can be found on his webpage,

Ann Nowé graduated from the University of Ghent in 1987, where she studied mathematics with optional courses in computer science. She then became a research assistant at the University of Brussels, where she finished her PhD in 1994 in collaboration with Queen Mary and Westfield College, University of London. Currently, she is a professor at the Vrije Universiteit Brussel, both in the Computer Science Department of the Faculty of Sciences and in the Computer Science group of the Engineering Faculty. Her research interests include Multi-Agent Learning (MAL) and Reinforcement Learning (RL). Within MAL, she focusses on the coordination of agents with limited communication, social agents learning fair policies and the relationship between Learning Automata and Evolutionary Game Theory. Within RL, she primarily looks at conditions for convergence to optimality, the relationship with Dynamic Programming, and the application to non-stationary problems and distributed multi-agent systems. Visit her webpage at

Karl Tuyls works as an Associate Professor in Artificial Intelligence at the department of Knowledge Engineering, Maastricht University (The Netherlands), where he leads a research group on swarm robotics and learning in multi-agent systems (Maastricht Swarmlab). Previously, he held positions at the Vrije Universiteit Brussel (Belgium), Hasselt University (Belgium) and Eindhoven University of Technology (The Netherlands). His main research interests lie at the intersection of Reinforcement Learning, Multi-Agent or Robot Systems and (Evolutionary) Game Theory. He was a (co-)organizer of several events on this topic, such as the European workshop on Multi-agent systems (EUMAS'05), the Belgian-Dutch conference on AI (BNAIC'05 and '09), and workshops on adaptive and learning agents (EGTMAS'03, LAMAS'05, ALAMAS'07, ALAg & ALAMAS'08, ALA'09). In 2000 he was awarded the Information Technology prize in Belgium, and in 2007 he was elected best junior researcher (TOPDOG) of the faculty of humanities and sciences, Maastricht University, the Netherlands. Tuyls is associate editor of two journals and has published in top journals in his research area such as Artificial Intelligence, Journal of Artificial Intelligence Research, Theoretical Biology, Autonomous Agents and Multi-Agent Systems, Journal of Machine Learning Research etc. His webpage is located at