Multi-Agent Diagnostics for Robustness via Illuminated Diversity

Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel
International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2024
[Paper] [Code]

Abstract

In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encapsulating a dual nature of overfitting and generalisation challenges. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging the concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, employing a target policy's regret to gauge the vulnerabilities of these settings. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID for generating a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.

MADRID

MADRID incorporates MAP-Elites, a simple but powerful Quality Diversity approach, to systematically explore the vast space of adversarial settings. By discretising the search space, MADRID iteratively performs selection, mutation, and evaluation steps, endlessly refining and expanding the repertoire of high-performing adversarial scenarios within its archive.
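
In pseudocode, this loop looks roughly as follows. This is a minimal, self-contained Python sketch, not the paper's implementation: the level encoding, the mutation operator, and especially estimate_regret (a random stand-in here; in MADRID it requires rollouts of the target and reference policies) are illustrative assumptions.

import random

def mutate_level(level, scale=0.05):
    """Perturb a level's representative features (here: the ball position)."""
    return {
        "ball_x": min(1.0, max(0.0, level["ball_x"] + random.uniform(-scale, scale))),
        "ball_y": min(1.0, max(0.0, level["ball_y"] + random.uniform(-scale, scale))),
        "ref_idx": level["ref_idx"],  # the reference policy stays attached to the level
    }

def descriptor(level, n_bins=16):
    """Map a level to its archive cell: (x bin, y bin, reference policy index)."""
    return (int(level["ball_x"] * (n_bins - 1)),
            int(level["ball_y"] * (n_bins - 1)),
            level["ref_idx"])

def estimate_regret(level):
    """Random stand-in: MADRID instead estimates the gap between the returns
    of the cell's reference policy and the target policy on this level."""
    return random.random()

def madrid_loop(archive, n_iterations=5000):
    for _ in range(n_iterations):
        # Selection: sample a random occupied cell of the archive.
        level, _ = archive[random.choice(list(archive))]
        # Mutation: perturb the selected level.
        candidate = mutate_level(level)
        # Evaluation: score the mutated level by the target policy's regret.
        cell, regret = descriptor(candidate), estimate_regret(candidate)
        # Replacement: keep the candidate only if it beats the cell's incumbent.
        if cell not in archive or regret > archive[cell][1]:
            archive[cell] = (candidate, regret)
    return archive

# Seed the archive with one centre-spot level per reference policy.
seed = {descriptor(lvl): (lvl, 0.0)
        for lvl in ({"ball_x": 0.5, "ball_y": 0.5, "ref_idx": i} for i in range(4))}
print(f"{len(madrid_loop(seed))} occupied cells after search")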

Environment

We evaluate MADRID on one of the most challenging multi-agent domains, namely the fully decentralised 11 vs 11 variation of Google Research Football (GRF). GRF represents a unique combination of characteristics not present in other RL environments, namely multi-agent cooperation (within each team), competition (between the two teams), sparse rewards, large action and observation spaces, and stochastic dynamics.
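
For concreteness, the full 11 vs 11 game can be instantiated through the open-source gfootball package. The sketch below follows its public create_environment API, though the exact scenario and wrapper configuration used in our experiments is an assumption here.

import gfootball.env as football_env

# Full 11 vs 11 match with stochastic dynamics; all 11 left-side players
# are agent-controlled, making the setting fully decentralised.
env = football_env.create_environment(
    env_name="11_vs_11_stochastic",
    representation="simple115v2",             # flat per-player feature vector
    rewards="scoring",                        # sparse reward: goals only
    number_of_left_players_agent_controls=11,
)

obs = env.reset()                             # one observation per controlled player
obs, reward, done, info = env.step([0] * 11)  # one (idle) action per player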

Operating on a discretised grid with an added dimension for reference policies, MADRID archives environment variations characterised by representative features, e.g., the (x, y) coordinates of the ball position in football. During each iteration, MADRID mutates a selected level, computes regret using its associated reference policy, and reincorporates levels with higher regret into the archive, effectively generating a diverse collection of adversarial levels.
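
A hedged sketch of this regret computation, assuming a Gym-style environment constructor env_fn that resets the game to a given level, and policies exposing an act(obs) method (both are assumptions for illustration):

def estimate_regret(env_fn, level, target_policy, reference_policy, n_episodes=5):
    """Approximate regret as the gap between the reference policy's and the
    target policy's average returns when rolled out from `level`."""
    def mean_return(policy):
        total = 0.0
        for _ in range(n_episodes):
            env = env_fn(level)               # start the game from this level's state
            obs, done = env.reset(), False
            while not done:
                obs, reward, done, _ = env.step(policy.act(obs))
                total += reward
        return total / n_episodes

    # Positive regret: the (weaker) reference policy beats the target here.
    return mean_return(reference_policy) - mean_return(target_policy)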

Results

We use MADRID to target TiZero, a recent multi-agent RL approach that learned to "master" the fully decentralised variation of GRF from scratch for the first time, using a hand-crafted curriculum, reward shaping, and self-play. Experimentally, TiZero has shown impressive results, outperforming previous methods by a large margin after expensive training lasting 45 days on a large-scale distributed infrastructure.

Our extensive evaluations reveal diverse settings where TiZero exhibits poor performance and where weaker policies, such as earlier checkpoints of TiZero, outperform it approximately 70% of the time.


We analyse the evolution of MADRID’s archive for a specific reference policy, illustrating its search process over time. Initially, the grid is sparsely filled with low-regret levels. However, as iterations progress, MADRID generates high-regret levels that progressively populate the entire grid. This shows that MADRID can discover high-regret levels anywhere on the football field.

[Archive snapshots after 25, 200, 1000, and 5000 iterations]

Adversarial Examples

Below is a list of adversarial examples found by MADRID in the Google Research Football domain for the state-of-the-art model, TiZero. On the left, TiZero's original policy exhibits suboptimal behaviour. On the right, the reference policy, which is weaker on the full game, exploits TiZero.

Hesitation before shooting

TiZero

TiZero hesitates before shooting, just long enough for the goalkeeper to steal the ball from it.

Reference Policy

The reference policy shoots immediately and scores.

Better defence

TiZero

TiZero's defender does not run to defend against the opponent, instead leaving the attacker plenty of space to shoot and score.

Reference Policy

The reference policy's defender runs towards the striker and seizes the ball before the opponent has time to shoot.

Better passing

TiZero

TiZero does not recognise when it should pass the ball to a better-positioned teammate; instead it runs towards the goal, shoots, and misses.

Reference Policy

The reference policy passes the ball to a better-positioned teammate, who scores.

Confused behaviour

TiZero

TiZero's agent gets confused and runs aimlessly up and down the pitch with the ball.

Reference Policy

The reference policy initiates a fast counterattack, managing to score.

Erroneous team movement

TiZero

The team moves towards the left, tricking the solitary attacker, who does not shoot despite being in a good position.

Reference Policy

The reference policy still manages to get into a good position and shoot.

Better shooting position

TiZero

TiZero's agent shoots from a narrow angle and gets blocked by the goalkeeper.

Reference Policy

The reference policy finds a better shooting angle and scores.

Shooting while running

TiZero

TiZero's agent shoots while sprinting and gets blocked by the goalkeeper.

Reference Policy

The reference policy runs more slowly, shoots more precisely, and scores.

Unforced own goals

TiZero

TiZero's agent inexplicably shoots towards its own goal.

Reference Policy

From the same position, the reference policy manages to counterattack and score.

Offside

TiZero

TiZero passes the ball to a player who is clearly offside.

Reference Policy

The reference policy doesn't pass the ball; instead, it runs towards the goal and scores.

Slow-paced running

TiZero

Despite running at full speed, TiZero fails to score.

Reference Policy

A slow-paced reference policy dribbles through TiZero's defenders, who are not used to such behaviour.

Not realising a goal-scoring opportunity

TiZero

TiZero doesn't realise it can reach the ball before the goalkeeper.

Reference Policy

The reference policy runs towards the ball, reaches it before the goalkeeper, and scores.