Table 1: Win rates on unseen maps of the policy learned on corridor, where 6 ally Zealots face 24 enemy Zerglings. We do not train policies on the new maps.
Figure 1: Map corridor, where 6 ally Zealots face 24 enemy Zerglings.
A bonus of our method is that we can transfer the learned policies to tasks with new actions and/or different numbers of agents. The learned policy still wins 50% of the games on unseen maps with three times as many agents (Table 1).
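This kind of transfer is possible because the policies condition on learned action representations rather than on fixed action indices, so actions unseen during training can be scored once their representations are available. Below is a minimal sketch of that idea; the class name, layer sizes, and random representations are illustrative assumptions, not RODE's exact architecture.

```python
import torch
import torch.nn as nn

class ActionRepresentationPolicy(nn.Module):
    """Minimal sketch: Q-values as dot products between a state
    embedding and learned action representations. Because actions
    enter only through their representations, an enlarged action set
    can be scored without changing any network parameters.
    All names and sizes here are illustrative, not RODE's exact code."""

    def __init__(self, obs_dim: int, repr_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, repr_dim),
        )

    def forward(self, obs: torch.Tensor, action_reprs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); action_reprs: (num_actions, repr_dim)
        h = self.encoder(obs)            # (batch, repr_dim)
        return h @ action_reprs.t()      # (batch, num_actions)

# Transfer: the same trained network scores a larger action set
# (e.g., more attackable enemies on a bigger map) by passing in
# more action representations.
policy = ActionRepresentationPolicy(obs_dim=32, repr_dim=16)
obs = torch.randn(4, 32)
train_actions = torch.randn(10, 16)   # action set seen during training
q_train = policy(obs, train_actions)  # shape (4, 10)
new_actions = torch.randn(30, 16)     # enlarged action set on a new map
q_new = policy(obs, new_actions)      # shape (4, 30), same parameters
```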
Video 1: Game replays on an unseen map where 18 ally Zealots face 36 enemy Zerglings. The policy is trained on corridor, where 6 ally Zealots face 24 enemy Zerglings, and is never trained on this unseen map.
Figure 2: Map corridor, where 6 Zealots face 24 Zerglings.
Figure 3: Performance of RODE on corridor.
Video 2: Dynamics of role selection learned by RODE. Role 0 and Role 2 motivate agents to explore the state space in particular directions, which helps them learn two important strategies on this map: 1) agents first move to the map edges (Role 0) to avoid being surrounded by enemies, and 2) agents alternate between attacking and retreating (Role 2) to lure and kill part of the enemy force, gaining a numerical advantage.
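The alternation between roles can be pictured as a simple hierarchical control loop: every few timesteps a role selector re-picks a role, and the agent then acts only within that role's restricted action space. The sketch below illustrates this loop under stated assumptions; the role indices, action names, random placeholders for the learned selector and policies, and the dummy environment are all hypothetical, and in RODE the restricted action spaces come from clustering learned action representations.

```python
import random

# Hypothetical restricted action spaces per role (illustrative only).
ROLE_ACTION_SPACES = {
    0: ["move_north", "move_south"],        # e.g., move toward the map edges
    2: ["attack_nearest", "move_retreat"],  # e.g., alternate attack/retreat
}

def select_role(state):
    # Placeholder for the learned role selector; here, a random choice.
    return random.choice(list(ROLE_ACTION_SPACES))

def role_policy(role, state):
    # Placeholder for the learned role policy, restricted to the
    # current role's action space.
    return random.choice(ROLE_ACTION_SPACES[role])

def run_episode(env_step, state, horizon=40, role_interval=5):
    """Every `role_interval` steps the selector re-picks a role; between
    selections the agent acts only within the current role's actions."""
    role = None
    for t in range(horizon):
        if t % role_interval == 0:
            role = select_role(state)
        action = role_policy(role, state)
        state = env_step(state, action)
    return state

# Dummy environment step so the sketch runs end to end.
final_state = run_episode(lambda s, a: s, state={}, horizon=10)
```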
Strategy learned by RODE.