Multi-Agent Diagnostics for Robustness via Illuminated Diversity

Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel
International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2024
[Paper] [Code]

Abstract

In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encapsulating a dual nature of overfitting and generalisation challenges. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging the concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, employing a target policy's regret to gauge the vulnerabilities of these settings. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID for generating a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.

MADRID

MADRID incorporates MAP-Elites, a simple but powerful Quality Diversity approach, to systematically explore the vast space of adversarial settings. By discretising the search space, MADRID iteratively performs selection, mutation, and evaluation steps, endlessly refining and expanding the repertoire of high-performing adversarial scenarios within its archive.
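
In pseudocode, this loop looks roughly as follows. This is a minimal, self-contained Python sketch, not the paper's implementation: the level encoding, the mutation operator, and especially estimate_regret (a random stand-in here; in MADRID it requires rollouts of the target and reference policies) are illustrative assumptions.

import random

def mutate_level(level, scale=0.05):
    """Perturb a level's representative features (here: the ball position)."""
    return {
        "ball_x": min(1.0, max(0.0, level["ball_x"] + random.uniform(-scale, scale))),
        "ball_y": min(1.0, max(0.0, level["ball_y"] + random.uniform(-scale, scale))),
        "ref_idx": level["ref_idx"],  # the reference policy stays attached to the level
    }

def descriptor(level, n_bins=16):
    """Map a level to its archive cell: (x bin, y bin, reference policy index)."""
    return (int(level["ball_x"] * (n_bins - 1)),
            int(level["ball_y"] * (n_bins - 1)),
            level["ref_idx"])

def estimate_regret(level):
    """Random stand-in: MADRID instead estimates the gap between the returns
    of the cell's reference policy and the target policy on this level."""
    return random.random()

def madrid_loop(archive, n_iterations=5000):
    for _ in range(n_iterations):
        # Selection: sample a random occupied cell of the archive.
        level, _ = archive[random.choice(list(archive))]
        # Mutation: perturb the selected level.
        candidate = mutate_level(level)
        # Evaluation: score the mutated level by the target policy's regret.
        cell, regret = descriptor(candidate), estimate_regret(candidate)
        # Replacement: keep the candidate only if it beats the cell's incumbent.
        if cell not in archive or regret > archive[cell][1]:
            archive[cell] = (candidate, regret)
    return archive

# Seed the archive with one centre-spot level per reference policy.
seed = {descriptor(lvl): (lvl, 0.0)
        for lvl in ({"ball_x": 0.5, "ball_y": 0.5, "ref_idx": i} for i in range(4))}
print(f"{len(madrid_loop(seed))} occupied cells after search")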

Environment

We evaluate MADRID on one of the most challenging multi-agent domains, namely the fully decentralised 11 vs 11 variation of Google Research Football (GRF). GRF represents a unique combination of characteristics not present in other RL environments, namely multi-agent cooperation (within each team), competition (between the two teams), sparse rewards, large action and observation spaces, and stochastic dynamics.
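
For concreteness, the full 11 vs 11 game can be instantiated through the open-source gfootball package. The sketch below follows its public create_environment API, though the exact scenario and wrapper configuration used in our experiments is an assumption here.

import gfootball.env as football_env

# Full 11 vs 11 match with stochastic dynamics; all 11 left-side players
# are agent-controlled, making the setting fully decentralised.
env = football_env.create_environment(
    env_name="11_vs_11_stochastic",
    representation="simple115v2",             # flat per-player feature vector
    rewards="scoring",                        # sparse reward: goals only
    number_of_left_players_agent_controls=11,
)

obs = env.reset()                             # one observation per controlled player
obs, reward, done, info = env.step([0] * 11)  # one (idle) action per player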

Operating on a discretised grid with an added dimension for reference policies, MADRID archives environment variations characterised by representative features, e.g., the (x, y) coordinates of the ball position in football. During each iteration, MADRID mutates a selected level, computes regret using its associated reference policy, and reincorporates levels with higher regret into the archive, effectively generating a diverse collection of adversarial levels.
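
A hedged sketch of this regret computation, assuming a Gym-style environment constructor env_fn that resets the game to a given level, and policies exposing an act(obs) method (both are assumptions for illustration):

def estimate_regret(env_fn, level, target_policy, reference_policy, n_episodes=5):
    """Approximate regret as the gap between the reference policy's and the
    target policy's average returns when rolled out from `level`."""
    def mean_return(policy):
        total = 0.0
        for _ in range(n_episodes):
            env = env_fn(level)               # start the game from this level's state
            obs, done = env.reset(), False
            while not done:
                obs, reward, done, _ = env.step(policy.act(obs))
                total += reward
        return total / n_episodes

    # Positive regret: the (weaker) reference policy beats the target here.
    return mean_return(reference_policy) - mean_return(target_policy)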

Results

We use MADRID to target TiZero, a recent multi-agent RL approach that learned to "master" the fully decentralised variation of GRF from scratch for the first time, using a hand-crafted curriculum, reward shaping, and self-play. Experimentally, TiZero has shown impressive results, outperforming previous methods by a large margin after expensive training lasting 45 days on a large-scale distributed infrastructure.

Our extensive evaluations reveal diverse settings where TiZero exhibits poor performance and where weaker policies, such as earlier checkpoints of TiZero, outperform it approximately 70% of the time.


We analyse the evolution of MADRID’s archive for a specific reference policy, illustrating its search process over time. Initially, the grid is sparsely filled with low-regret levels. However, as iterations progress, MADRID generates high-regret levels that progressively populate the entire grid. This shows that MADRID can discover high-regret levels anywhere on the football field.

[Archive snapshots after 25, 200, 1000, and 5000 iterations]

Adversarial Examples

Below is a list of adversarial examples found by MADRID in the Google Research Football domain for the state-of-the-art model, TiZero. On the left, TiZero's original policy exhibits suboptimal behaviour. On the right, the reference policy, which is weaker on the full game, exploits TiZero.

Hesitation before shooting

TiZero

TiZero hesitates before shooting, just long enough for the goalkeeper to steal the ball from it.

Reference Policy

The reference policy shoots immediately and scores.

Better defence

TiZero

TiZero's defender does not run to defend against the opponent, instead leaving the attacker plenty of space to shoot and score.

Reference Policy

The reference policy's defender runs towards the striker and seizes the ball before the opponent has time to shoot.

Better passing

TiZero

TiZero does not recognise when it should pass the ball to a better-positioned teammate; instead it runs towards the goal, shoots, and misses.

Reference Policy

The reference policy passes the ball to a better-positioned teammate, who scores.

Confused behaviour

TiZero

TiZero's agent gets confused and runs aimlessly up and down the pitch with the ball.

Reference Policy

The reference policy initiates a fast counterattack, managing to score.

Erroneous team movement

TiZero

The team moves towards the left, tricking the solitary attacker, who does not shoot despite being in a good position.

Reference Policy

The reference policy still manages to get into a good position and shoot.

Better shooting position

TiZero

TiZero's agent shoots from a narrow angle and gets blocked by the goalkeeper.

Reference Policy

The reference policy finds a better shooting angle and scores.

Shooting while running

TiZero

TiZero's agent shoots while sprinting and gets blocked by the goalkeeper.

Reference Policy

The reference policy runs more slowly, shoots more precisely, and scores.

Unforced own goals

TiZero

TiZero's agent inexplicably shoots towards its own goal.

Reference Policy

From the same position, the reference policy manages to counterattack and score.

Offside

TiZero

TiZero passes the ball to a player who is clearly offside.

Reference Policy

The reference policy doesn't pass the ball; instead, it runs towards the goal and scores.

Slow-paced running

TiZero

Despite running at full speed, TiZero fails to score.

Reference Policy

A slow-paced reference policy dribbles through TiZero's defenders, who are not used to such behaviour.

Not realising a goal-scoring opportunity

TiZero

TiZero doesn't realise it can reach the ball before the goalkeeper.

Reference Policy

The reference policy runs towards the ball, reaches it before the goalkeeper, and scores.