Multi-Agent Diagnostics for Robustness via Illuminated Diversity
Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel
International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2024
[Paper] [Code]
Abstract
In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encapsulating a dual nature of overfitting and generalisation challenges. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging the concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, employing a target policy's regret to gauge the vulnerabilities of these settings. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID for generating a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.
MADRID
MADRID incorporates MAP-Elites, a simple but powerful Quality Diversity approach, to systematically explore the vast space of adversarial settings. By discretising the search space, MADRID iteratively performs selection, mutation, and evaluation steps, continually refining and expanding the repertoire of high-performing adversarial scenarios within its archive.
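To make this loop concrete, below is a minimal Python sketch of the selection-mutation-evaluation cycle. The helper names (`mutate_level`, `estimate_regret`, `descriptor`) and the exact archive layout are illustrative assumptions, not the released implementation.

```python
import random

def madrid_map_elites(seed_levels, reference_policies, target_policy,
                      num_iterations=5000):
    # Archive maps (grid cell, reference policy index) -> (level, regret).
    archive = {}

    # Seed the archive: evaluate each initial level under every reference.
    for level in seed_levels:
        for ref_idx, ref_policy in enumerate(reference_policies):
            regret = estimate_regret(level, target_policy, ref_policy)
            archive[(descriptor(level), ref_idx)] = (level, regret)

    for _ in range(num_iterations):
        # Selection: pick a random occupied cell of the archive.
        (_cell, ref_idx), (parent, _) = random.choice(list(archive.items()))

        # Mutation: perturb the parent level (e.g. nudge ball/player positions).
        child = mutate_level(parent)

        # Evaluation: score the child with its associated reference policy.
        regret = estimate_regret(child, target_policy,
                                 reference_policies[ref_idx])

        # Keep the child only if it beats the incumbent in its cell.
        key = (descriptor(child), ref_idx)
        if key not in archive or regret > archive[key][1]:
            archive[key] = (child, regret)

    return archive
```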
Environment
We evaluate MADRID on one of the most challenging multi-agent domains: the fully decentralised 11 vs 11 variation of Google Research Football (GRF). GRF combines characteristics not found together in other RL environments: multi-agent cooperation (within each team), competition (between the two teams), sparse rewards, large action and observation spaces, and stochastic dynamics.
Operating on a discretised grid with an added dimension for reference policies, MADRID archives environment variations characterised by representative features, e.g., the (x, y) coordinates of the ball position in football. During each iteration, MADRID mutates a selected level, computes regret using its associated reference policy, and reincorporates levels with higher regret into the archive, effectively generating a diverse collection of adversarial levels.
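As a rough illustration of the evaluation step, the sketch below fills in the `estimate_regret` helper assumed above, approximating a level's regret as the gap between the reference policy's return and the target policy's return on that level. The `make_grf_env` constructor and the gym-style `policy.act` interface are assumptions for illustration, not the actual MADRID codebase.

```python
def rollout_return(level, policy, num_episodes=8):
    """Average episodic return of `policy` on `level`
    (the opponent is assumed to be part of the environment)."""
    total = 0.0
    for _ in range(num_episodes):
        env = make_grf_env(level)        # hypothetical level-to-env constructor
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            action = policy.act(obs)     # assumed policy interface
            obs, reward, done, _info = env.step(action)
            episode_return += reward
        total += episode_return
    return total / num_episodes


def estimate_regret(level, target_policy, reference_policy):
    # Positive regret means the reference policy, despite being weaker on
    # the full game, outperforms the target policy on this level, which
    # flags the level as adversarial for the target.
    return (rollout_return(level, reference_policy)
            - rollout_return(level, target_policy))
```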
Results
We use MADRID to target TiZero, a recent multi-agent RL approach that was the first to learn to "master" the fully decentralised variation of GRF from scratch, using a hand-crafted curriculum, reward shaping, and self-play. Experimentally, TiZero has shown impressive results, outperforming previous methods by a large margin after an expensive 45-day training run on a large-scale distributed infrastructure.
Our extensive evaluations reveal diverse settings where TiZero performs poorly and where weaker policies, such as earlier checkpoints of TiZero, outperform it approximately 70% of the time.
We analyse the evolution of MADRID's archive for a specific reference policy, illustrating its search process over time. Initially, the grid is sparsely filled with low-regret levels. As iterations progress, however, MADRID generates high-regret levels that progressively populate the entire grid. This shows that MADRID can discover high-regret levels anywhere on the football field.
Archive snapshots after 25, 200, 1000, and 5000 iterations.
Adversarial Examples
Below is a list of adversarial examples found by MADRID in the Google Research Football domain for the state-of-the-art model, TiZero. On the left, TiZero's original policy exhibits suboptimal behaviour. On the right, the reference policy, which is weaker on the full game, exploits TiZero.
Hesitation before shooting
TiZero
TiZero hesitates before shooting just long enough for the goalkeeper to steal the ball.
Reference Policy
The reference policy shoots immediately and scores.
Better defence
TiZero
TiZero's defender does not run to defend against the opponent, leaving the attacker plenty of space to shoot and score.
Reference Policy
The reference policy's defender runs towards the striker and seizes the ball before the opponent has time to shoot.
Better passing
TiZero
TiZero does not recognise when it should pass the ball to a better-positioned teammate; instead, it runs towards the goal, shoots, and misses.
Reference Policy
The reference policy passes the ball to a better-positioned teammate, who scores.
Confused behaviour
TiZero
TiZero's agent gets confused and runs aimlessly up and down the pitch with the ball.
Reference Policy
The reference policy initiates a fast counterattack, managing to score.
Erroneous team movement
TiZero
The team moves towards the left, tricking the solitary attacker, who does not shoot despite being in a good position.
Reference Policy
The reference policy still manages to get into a good position and shoot.
Better shooting position
TiZero
TiZero's agent shoots from a narrow angle and gets blocked by the goalkeeper.
Reference Policy
The reference policy finds a better shooting angle and scores.
Shooting while running
TiZero
TiZero's agent shoots while sprinting and gets blocked by the goalkeeper.
Reference Policy
The reference policy runs more slowly, shoots more precisely, and scores.
Unforced own goals
TiZero
TiZero's agent inexplicably shoots towards its own goal.
Reference Policy
From the same position, the reference policy manages to counterattack and score.
Offside
TiZero
TiZero passes the ball to a player who is clearly offside.
Reference Policy
The reference policy doesn't pass the ball; instead, it runs towards the goal and scores.
Slow-paced running
TiZero
TiZero's fast-running agent fails to score.
Reference Policy
The reference policy, running at a slower pace, manages to dribble through TiZero's defenders, who are not used to such behaviour.
Not realising a goal-scoring opportunity
TiZero
TiZero doesn't realise it can reach the ball before the goalkeeper.
Reference Policy
The reference policy runs towards the ball, reaching it before the goalkeeper, and scores.