To evaluate MA-Trace, we use the StarCraft Multi-Agent Challenge (SMAC), a standard benchmark for multi-agent algorithms. The challenge consists of 14 micromanagement tasks based on the popular real-time strategy game StarCraft II. Every task is a small arena in which two teams of units fight against each other, one controlled by the player and the other by a built-in AI. The goal in every task is to defeat all the enemy units.
Every unit belongs to one of three races: Protoss, Terran, or Zerg. Additionally, units are divided into several classes with unique characteristics, such as speed, shooting range, and firepower. In each step, a unit can move or attack an enemy within its shooting range. A unit is considered defeated once its health drops to 0, after which it can no longer act in the game.
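To make the "move or attack" choice concrete, below is a minimal sketch of the per-unit discrete action layout conventionally used in SMAC. The indices and the helper function are illustrative assumptions for exposition, not definitions taken from the benchmark code or from our method.

```python
# Illustrative sketch of a SMAC-style per-unit discrete action space.
# The exact indices are an assumption about the common convention.
NO_OP = 0          # typically only available to defeated units
STOP = 1
MOVE_NORTH, MOVE_SOUTH, MOVE_EAST, MOVE_WEST = 2, 3, 4, 5
ATTACK_OFFSET = 6  # action (ATTACK_OFFSET + k) means "attack the k-th enemy",
                   # legal only if that enemy is alive and within shooting range

def describe_action(action_id: int) -> str:
    """Human-readable label for a discrete action id (hypothetical helper)."""
    names = {
        NO_OP: "no-op", STOP: "stop",
        MOVE_NORTH: "move north", MOVE_SOUTH: "move south",
        MOVE_EAST: "move east", MOVE_WEST: "move west",
    }
    if action_id in names:
        return names[action_id]
    return f"attack enemy #{action_id - ATTACK_OFFSET}"
```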
SMAC provides a variety of tasks. In some of the easier ones, both teams control identical forces, so it suffices to coordinate better than the built-in AI to win. In the more challenging tasks, however, the computer starts with a stronger squad. Its advantage can be minor, as in the 10 marines vs 11 marines task, or substantial. In particularly hard scenarios, such as corridor, it is unreasonable for the player to engage in an open fight, so it is essential to develop a long-term strategy that secures some advantage. For the hardest tasks, the authors of the challenge propose micro-tricks they consider sufficient to win consistently.
For evaluation, we use all the standard tasks available in version 4.10.
Each unit in the game has a limited sight range, which makes the environment partially observable. The observations received by individual agents contain information about all visible units (including themselves): their health, energy, position, class, and other relevant features. All units beyond the sight range are marked as dead, so the observations do not distinguish defeated units from invisible ones. Even the aggregated observations of all agents may not provide full information, as enemy units can be hidden beyond the sight range of every ally. Therefore, to facilitate centralized training, SMAC additionally provides access to the full state of the environment. It is meant to be used only during training, not during decentralized execution. Note, however, that once the armies engage in a direct fight, most units see all the enemies, which reduces the need for the full state.
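The distinction between local observations and the full state can be illustrated with the reference smac Python package. The sketch below, which assumes that package and one of its standard maps (here "3m"), queries both the per-agent observations and the global state and steps the environment with random legal actions; it is an illustration of the interface, not part of our training code.

```python
# Minimal sketch using the reference `smac` package (assumed installed).
from smac.env import StarCraft2Env
import numpy as np

env = StarCraft2Env(map_name="3m")      # "3m" is one of the standard maps
n_agents = env.get_env_info()["n_agents"]

env.reset()
local_obs = env.get_obs()      # list of per-agent observation vectors (partial)
global_state = env.get_state()  # full state, intended for centralized training only

# During decentralized execution each agent may use only its own entry of
# `local_obs`; `global_state` must not leak into the policies.
actions = []
for agent_id in range(n_agents):
    avail = env.get_avail_agent_actions(agent_id)   # mask of currently legal actions
    actions.append(np.random.choice(np.nonzero(avail)[0]))

reward, terminated, info = env.step(actions)        # shared team reward
env.close()
```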
Figure: Sight range (cyan circle) and shooting range (red circle) of a Marine unit.
Though the ultimate goal is always to win the fight, learning from binary rewards is prohibitively hard. Therefore, SMAC provides dense rewards to enable training. The team receives points for every point of damage dealt, a bonus for defeating an enemy unit, and a further bonus for winning. This scheme is intended to be used by all algorithms without any task-specific or algorithm-specific tuning. As the results show, such a reward indeed leads to successful training. However, in special cases it might reinforce suboptimal behaviors; see our discussion of 3s_vs_5z.
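The following schematic sketch shows the shape of such a dense team reward. The function and the bonus constants are illustrative assumptions for exposition; they are not the exact values or code used by SMAC.

```python
# Schematic sketch of a dense, shaped team reward (constants are assumptions).
KILL_BONUS = 10.0   # hypothetical bonus for defeating one enemy unit
WIN_BONUS = 200.0   # hypothetical bonus for winning the episode

def shaped_team_reward(damage_dealt: float,
                       enemies_defeated: int,
                       battle_won: bool) -> float:
    """One-step team reward: damage dealt + kill bonuses + win bonus."""
    reward = damage_dealt
    reward += KILL_BONUS * enemies_defeated
    if battle_won:
        reward += WIN_BONUS
    return reward
```

Because the reward rewards damage and kills rather than winning directly, it can occasionally favor locally profitable but globally suboptimal behavior, which is the issue discussed for 3s_vs_5z.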