Table 1: Win rates on unseen maps of the policy learned on corridor, where 6 ally Zealots face 24 enemy Zerglings. We do not train policies on the new maps.
Figure 1: Map corridor, where 6 ally Zealots face 24 enemy Zerglings.
A bonus of our method is that we can transfer the learned policies to tasks with new actions and/or different numbers of agents. The learned policy still wins 50% of the games on unseen maps with three times as many agents (Table 1).
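This kind of transfer is possible because the policies condition on learned action representations rather than on fixed action indices, so actions unseen during training can be scored once their representations are available. Below is a minimal sketch of that idea; the class name, layer sizes, and random representations are illustrative assumptions, not RODE's exact architecture.

```python
import torch
import torch.nn as nn

class ActionRepresentationPolicy(nn.Module):
    """Minimal sketch: Q-values as dot products between a state
    embedding and learned action representations. Because actions
    enter only through their representations, an enlarged action set
    can be scored without changing any network parameters.
    All names and sizes here are illustrative, not RODE's exact code."""

    def __init__(self, obs_dim: int, repr_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, repr_dim),
        )

    def forward(self, obs: torch.Tensor, action_reprs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); action_reprs: (num_actions, repr_dim)
        h = self.encoder(obs)            # (batch, repr_dim)
        return h @ action_reprs.t()      # (batch, num_actions)

# Transfer: the same trained network scores a larger action set
# (e.g., more attackable enemies on a bigger map) by passing in
# more action representations.
policy = ActionRepresentationPolicy(obs_dim=32, repr_dim=16)
obs = torch.randn(4, 32)
train_actions = torch.randn(10, 16)   # action set seen during training
q_train = policy(obs, train_actions)  # shape (4, 10)
new_actions = torch.randn(30, 16)     # enlarged action set on a new map
q_new = policy(obs, new_actions)      # shape (4, 30), same parameters
```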
Video 1: Game replays on an unseen map where 18 ally Zealots face 36 enemy Zerglings. The policy is trained on corridor, where 6 ally Zealots face 24 enemy Zerglings, and is never trained on this unseen map.
Figure 2: Map corridor, where 6 Zealots face 24 Zerglings.
Figure 3: Performance of RODE on corridor.
Video 2: Dynamics of role selection learned by RODE. Role 0 and Role 2 motivate agents to explore the state space in particular directions, which helps them learn two important strategies on this map: 1) agents first move to the map edges (Role 0) to avoid being surrounded by enemies, and 2) agents alternate between attacking and retreating (Role 2) to lure and kill part of the enemy force, gaining a numerical advantage.
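The alternation between roles can be pictured as a simple hierarchical control loop: every few timesteps a role selector re-picks a role, and the agent then acts only within that role's restricted action space. The sketch below illustrates this loop under stated assumptions; the role indices, action names, random placeholders for the learned selector and policies, and the dummy environment are all hypothetical, and in RODE the restricted action spaces come from clustering learned action representations.

```python
import random

# Hypothetical restricted action spaces per role (illustrative only).
ROLE_ACTION_SPACES = {
    0: ["move_north", "move_south"],        # e.g., move toward the map edges
    2: ["attack_nearest", "move_retreat"],  # e.g., alternate attack/retreat
}

def select_role(state):
    # Placeholder for the learned role selector; here, a random choice.
    return random.choice(list(ROLE_ACTION_SPACES))

def role_policy(role, state):
    # Placeholder for the learned role policy, restricted to the
    # current role's action space.
    return random.choice(ROLE_ACTION_SPACES[role])

def run_episode(env_step, state, horizon=40, role_interval=5):
    """Every `role_interval` steps the selector re-picks a role; between
    selections the agent acts only within the current role's actions."""
    role = None
    for t in range(horizon):
        if t % role_interval == 0:
            role = select_role(state)
        action = role_policy(role, state)
        state = env_step(state, action)
    return state

# Dummy environment step so the sketch runs end to end.
final_state = run_episode(lambda s, a: s, state={}, horizon=10)
```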
Strategy learned by RODE.