Ablation study regarding L_I (the versatility and identifiability loss) and L_D (the specialization loss). Win rates at the end of training are shown, averaged over 8 random seeds.
Both L_I and L_D contribute to the significant performance gain, but the specialization loss L_D is more critical.
27 Marines (left, controlled by ROMA) vs. 30 Marines.
6 Stalkers & 4 Zealots (left, controlled by ROMA) vs. 10 Banelings & 30 Zerglings.
8 Marines (left, controlled by ROMA) vs. 9 Marines.
1 Medivac, 2 Marauders, and 7 Marines (left, controlled by ROMA) vs. 1 Medivac, 3 Marauders, and 8 Marines.
6 Zerglings & 4 Banelings (lower left and upper right, controlled by ROMA) vs. 6 Zerglings & 4 Banelings.
10 Zerglings & 5 Banelings (lower, controlled by ROMA) vs. 2 Stalkers & 3 Zealots.
2 Stalkers & 3 Zealots (left, controlled by ROMA) vs. 2 Stalkers & 3 Zealots.
10 Marines (left, controlled by ROMA) vs. 11 Marines.
5 Marines (left, controlled by ROMA) vs. 6 Marines.
Role emergence and evolution during training on the map 10m_vs_11m. Means of the role distributions at the first time step are shown, without any dimensionality reduction. At this time step, ROMA learns to allocate roles according to agents' relative positions so that agents can quickly form an offensive arc using specialized policies.
Role emergence and evolution during training on the map MMM2. Means of the role distributions at the first time step are shown, without any dimensionality reduction. At this time step, ROMA learns to allocate roles according to agents' types.