Multi-Agent Learning Tutorial, AAMAS/ICML 2018

by D. Balduzzi, T. Graepel, E. Hughes, M. Jaderberg, J. Perolat, K. Tuyls

The tutorial covers topics in multi-agent learning (MAL). We introduce participants to the basics, assuming elementary knowledge of single-agent reinforcement learning. We review some game-theoretic concepts and then introduce multi-agent learning across several paradigms, highlighting that it is non-stationary and poses a moving-target problem. We continue by introducing the Markov games multi-agent framework, some elementary game-theoretic solution concepts, and links with replicator dynamics from evolutionary game theory. We review some important algorithms from the ‘pre-deep RL’ period, after which we transition to complex systems and delve into deep multi-agent reinforcement learning. We then provide an overview of recent results on sequential social dilemmas (SSDs), and of a newly introduced algorithm (Symplectic Gradient Adjustment, Best Paper Runner-Up award at ICML 2018) that handles the interacting losses arising in games played between deep neural networks, such as GANs. We continue with an introduction to the AlphaGo Zero project, an extension of AlphaGo in which agents learn from scratch via pure self-play, and we conclude with the emergence of complex multi-agent behaviours in the Capture the Flag domain.
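As a taste of the interacting-losses material, here is a minimal numerical sketch of the idea behind Symplectic Gradient Adjustment on a one-dimensional bilinear game (the losses, step size, and adjustment weight below are our illustrative choices, not the tutorial's):

```python
import numpy as np

# Two-player bilinear game: player 1 minimises l1(x, y) = x*y over x,
# player 2 minimises l2(x, y) = -x*y over y (a 1-d GAN-like saddle).
# The simultaneous gradient is xi(x, y) = (dl1/dx, dl2/dy) = (y, -x);
# plain gradient descent on xi rotates around the equilibrium (0, 0).
def xi(w):
    x, y = w
    return np.array([y, -x])

# Game Jacobian of xi (here constant) and its antisymmetric part A.
J = np.array([[0., 1.],
              [-1., 0.]])
A = 0.5 * (J - J.T)

def sga_step(w, lr=0.05, lam=1.0):
    v = xi(w)
    # Descend along the adjusted gradient xi + lam * A^T xi.
    return w - lr * (v + lam * A.T @ v)

w = np.array([1.0, 1.0])
for _ in range(200):
    w = sga_step(w)
# With the adjustment the iterates spiral in toward (0, 0);
# with lam = 0 they rotate and slowly drift away instead.
```

The adjustment term cancels the rotational component of the game dynamics, which is exactly the failure mode of naive simultaneous gradient descent in adversarial games.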

After the tutorial, participants should have a basic understanding of the area, knowledge of early and more recent results, an appreciation of the state of the art, and a sense of how to enter the field.


  • Introduction to multi-agent learning, various paradigms, some game theoretic notions and links between reinforcement learning and evolutionary dynamics

  • Basics in Markov Games and normal form games, and how to learn from batch data in these games

  • From learning in stateless games to deep RL in sequential social dilemmas, with inequity-averse agents

  • Mechanics of interacting losses in differentiable n-player games

  • Multi-agent learning in complex domains: AlphaGo, AlphaZero, Capture The Flag
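The link between reinforcement learning and evolutionary dynamics in the first topic runs through the replicator equation; the sketch below (the payoff matrix, initial mixture, and step size are our illustrative choices) integrates it for Rock-Paper-Scissors:

```python
import numpy as np

# Replicator dynamics for a symmetric normal-form game:
#   dx_i/dt = x_i * ((A x)_i - x^T A x)
# A is the payoff matrix; here, zero-sum Rock-Paper-Scissors.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def replicator_step(x, dt=0.01):
    fitness = A @ x            # payoff of each pure strategy vs the population
    avg = x @ fitness          # population-average payoff
    return x + dt * x * (fitness - avg)   # Euler step of the replicator ODE

x = np.array([0.5, 0.3, 0.2])  # initial population mixture
for _ in range(1000):
    x = replicator_step(x)
# x remains a probability distribution and orbits the mixed
# equilibrium (1/3, 1/3, 1/3) rather than converging to it.
```

Strategies above average grow in share, strategies below it shrink; this is the same average-payoff baseline structure that appears in policy-gradient RL, which is one way the tutorial connects the two fields.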



David Balduzzi received a PhD in algebraic geometry from the University of Chicago in 2006. He subsequently worked on computational neuroscience and machine learning at the University of Wisconsin, MPI for Intelligent Systems, ETH Zurich and DeepMind.

Thore Graepel is a research group lead at Google DeepMind and holds a part-time position as Chair of Machine Learning at University College London. He studied physics at the University of Hamburg, Imperial College London, and Technical University of Berlin, where he also obtained his PhD in machine learning in 2001. He held postdoctoral research positions at ETH Zurich and Royal Holloway College, University of London, before joining Microsoft Research in Cambridge in 2003, where he co-founded the Online Services and Advertising group. Major applications of Thore’s work include Xbox Live’s TrueSkill system for ranking and matchmaking, the AdPredictor framework for click-through rate prediction in Bing, and the Matchbox recommender system which inspired the recommendation engine of Xbox Live Marketplace. More recently, Thore’s work on the predictability of private attributes from digital records of human behaviour has been the subject of intense discussion among privacy experts and the general public. Thore’s current research interests include probabilistic graphical models and inference, reinforcement learning, games, and multi-agent systems. He has published over one hundred peer-reviewed papers, is a named co-inventor on dozens of patents, serves on the editorial boards of JMLR and MLJ, and is a founding editor of the book series Machine Learning & Pattern Recognition at Chapman & Hall/CRC. At DeepMind, Thore has returned to his original passion of understanding and creating intelligence, and recently contributed to creating AlphaGo, the first computer program to defeat a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Edward Hughes studied mathematics at Cambridge University and received a PhD in theoretical physics from Queen Mary University of London in 2017. He currently works on aspects of multi-agent social learning and dynamics in mixed-motivation environments.

Max Jaderberg is a senior research scientist at DeepMind in machine learning. He previously co-founded Vision Factory which was acquired by Google in 2014, and completed his PhD at the Visual Geometry Group, University of Oxford under the supervision of Prof. Andrew Zisserman and Prof. Andrea Vedaldi. His main interests are in artificial intelligence, deep learning, and reinforcement learning.

Julien Perolat received his PhD on reinforcement learning in games from the University of Lille in 2017. He now works as a research scientist at DeepMind.

Karl Tuyls (FBCS) is a research scientist at DeepMind and a professor of Computer Science at the University of Liverpool, UK. Previously, he held positions at the Vrije Universiteit Brussel, Hasselt University, Eindhoven University of Technology, and Maastricht University. At the University of Liverpool he was director of research of the School of Electrical Engineering & Electronics and Computer Science, and he founded and has led the smARTLab robotics laboratory since 2013. Prof. Tuyls has received several awards for his research, among them the Information Technology prize 2000 in Belgium, the best demo award at AAMAS'12, winner of the German Open RoboCup@Work competitions in 2013 and 2014, world champion of the RoboCup@Work competitions in 2013 and 2014, and winner of the RoCKIn@Work competition in 2015. His research has also received substantial attention from national and international press and media. He is a fellow of the British Computer Society (BCS), is on the editorial board of the Journal of Autonomous Agents and Multi-Agent Systems, is editor-in-chief of the Springer briefs series on Intelligent Systems, and is a member of the board of directors of the International Foundation for Autonomous Agents and Multiagent Systems.