Action Semantics Network

Considering the Effects of Actions in Multiagent Systems

ASN Structure


We propose a novel network structure, named Action Semantics Network (ASN), to characterize such action semantics for more efficient multiagent coordination. Specifically, each agent's action set can be naturally divided into two types: actions that affect environmental information or the agent's private properties, and actions that directly influence other agents. Therefore, if an agent's action is to attack (or communicate with) one of the other agents, the value of performing this action should depend explicitly on the agent's perception of its environment and of the agent being attacked (or communicated with); any additional information is irrelevant and may add noise. We refer to the property that different actions may have different impacts on other agents and should be evaluated differently as action semantics between agents.

As analyzed above, the value estimation of actions with different semantics can be improved by relying only on the relevant information. Instead of mixing all agents' information together and feeding it into the network, ASN separates the agent's information according to the action semantics. In this way, ASN provides a more accurate estimate of each action's value and significantly improves the performance of existing DRL algorithms.
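To make the separation concrete, below is a minimal PyTorch sketch of this idea. It is illustrative only, not the paper's exact architecture: the observation split (env_obs vs. per-agent slices), the module names, and the dot-product scoring of agent-directed actions are all assumptions.

import torch
import torch.nn as nn

class ASNSketch(nn.Module):
    """Illustrative ASN-style value head (names and split are assumptions)."""

    def __init__(self, env_dim, agent_dim, n_self_actions, hidden=64):
        super().__init__()
        # Branch for actions that only affect the environment or the
        # agent's own properties: scored from the environment features alone.
        self.self_branch = nn.Sequential(
            nn.Linear(env_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_self_actions),
        )
        # Shared pairwise branch for agent-directed actions (attack j,
        # communicate with j, ...): embeds the agent's own view and its
        # view of one other agent, then scores the pair with a dot product.
        self.own_embed = nn.Sequential(nn.Linear(env_dim, hidden), nn.ReLU())
        self.other_embed = nn.Sequential(nn.Linear(agent_dim, hidden), nn.ReLU())

    def forward(self, env_obs, agent_obs):
        # env_obs: (batch, env_dim) -- environment/self part of the observation
        # agent_obs: (batch, n_agents, agent_dim) -- one slice per other agent
        q_self = self.self_branch(env_obs)              # (batch, n_self_actions)
        e_own = self.own_embed(env_obs).unsqueeze(1)    # (batch, 1, hidden)
        e_other = self.other_embed(agent_obs)           # (batch, n_agents, hidden)
        q_directed = (e_own * e_other).sum(dim=-1)      # (batch, n_agents)
        # One Q-value per action: self-related actions first, then directed ones.
        return torch.cat([q_self, q_directed], dim=-1)

The key design choice is that the Q-value of "attack agent j" is computed only from the agent's view of j (plus its own features), so irrelevant agents never enter that estimate.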

StarCraft II: SMAC

The cyan and red circles respectively border the sight and shooting range of the agent.

The full games of StarCraft: BroodWar and StarCraft II have been used as RL environments for some time, due to the many interesting challenges inherent to the games. DeepMind’s AlphaStar has recently shown a very impressive level of play on one StarCraft II matchup using a centralised controller. In contrast, SMAC is not intended as an environment to train agents for use in full StarCraft II gameplay. Instead, by introducing strict decentralisation and local partial observability, we use the StarCraft II game engine to build a new set of rich multi-agent problems.


Case Study: 15 Marines vs. 15 Marines

The StarCraft II 15m map contains two groups, each consisting of 15 Marines. At each step, each agent observes the local game state and selects one of the following actions: move north, south, east, or west; attack one of its enemies; stop; or the null action. Agents on the same side receive the same joint reward at each time step, equal to the total damage dealt to enemy units. Agents also receive a joint reward of 10 points for each opponent killed and 200 points for killing all opponents. The game ends when all agents on one side die or the time exceeds a fixed limit.
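For reference, interacting with SMAC follows the loop below (essentially the smac package's usage example). The "15m" map from this case study is assumed to be registered locally; the stock SMAC maps (e.g., "8m", "25m") work identically. The policy here is just random selection among valid actions.

import numpy as np
from smac.env import StarCraft2Env

# "15m" is the map used in this case study; swap in a stock map
# such as "8m" if it is not available in your SMAC installation.
env = StarCraft2Env(map_name="15m")
n_agents = env.get_env_info()["n_agents"]

for episode in range(5):
    env.reset()
    terminated, episode_reward = False, 0.0
    while not terminated:
        actions = []
        for agent_id in range(n_agents):
            # Mask of currently valid actions (no-op, stop, moves, attacks).
            avail = env.get_avail_agent_actions(agent_id)
            actions.append(np.random.choice(np.nonzero(avail)[0]))
        # All agents on the controlled side share this joint reward.
        reward, terminated, info = env.step(actions)
        episode_reward += reward
    print("episode reward:", episode_reward)
env.close()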

Learning Curve

Why?

ASN-QMIX quickly reaches an average win rate of approximately 80%, while vanilla QMIX fails, with an average win rate of only approximately 20%. Intuitively, ASN enables an agent to explicitly account for the information of more other agents as the agent space grows. For an agent using the vanilla network, however, it is more difficult to identify each action's influence on other agents from a larger amount of mixed information, which results in lower average win rates than ASN.

An interesting observation for vanilla QMIX is that its agents learn to run away to avoid all being killed, which can be seen as reaching a suboptimal solution.

Neural MMO

Neural MMO (also available on GitHub) is a massively multiagent environment that defines combat systems for a large number of agents.

A simple Neural MMO scene with two groups of agents on a 10×10 tile grid. Each group contains 3 agents, each of which starts on an arbitrary tile with 100 health points (HP). At each step, each agent loses one HP, observes the local game state, and decides on an action: either move one tile (up, right, left, down, or stop) or attack using one of three options: "Melee" (attack distance 2, damage 5), "Range" (attack distance 4, damage 2), or "Mage" (attack distance 10, damage 1). Any action with an invalid effect (e.g., attacking an opponent beyond the attack range or moving beyond the grid border) leaves the agent standing still, and an agent receives a penalty of -0.1 if its attack fails. The game ends when all agents in one group die or the time exceeds a fixed limit, and agents belonging to the same group receive a joint reward equal to the difference between the total HP of their own group and that of the opposing side.
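These rules can be summarized in a few lines of Python. All names below are illustrative stand-ins for the rules just described, not the Neural MMO API; whether the joint reward is given per step or at episode end is also an assumption here.

# Attack options as (max distance, damage), per the scene description above.
ATTACKS = {"Melee": (2, 5), "Range": (4, 2), "Mage": (10, 1)}

def apply_attack(target_hp, option, distance):
    """Return (new target HP, attacker's penalty) for one attack attempt."""
    max_dist, damage = ATTACKS[option]
    if distance > max_dist:      # invalid attack: the agent stands still
        return target_hp, -0.1   # ... and takes the failed-attack penalty
    return target_hp - damage, 0.0

def joint_reward(own_hps, enemy_hps):
    """Joint reward shared by a group: its total HP minus the enemy's."""
    return sum(own_hps) - sum(enemy_hps)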

ACKTR Learning Curve

Learning curves of the vanilla, attention, and ASN networks (figure panels: MMO_PPO_Vanilla, MMO_PPO_Attention, MMO_PPO_ASN).

When the distance between two agents is less than or equal to 2, the best attack option is "Melee", since it causes the maximum damage among the three attacks. Similarly, "Range" is best when the distance is larger than 2 and at most 4; "Mage" is the only reachable option when the distance is larger than 4 and at most 10. As the distance between the agent and its opponent changes, ASN consistently assigns a higher probability to the best attack option (i.e., the attack that causes the most damage) and therefore causes the highest damage on average among all methods.
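Under the damage and range numbers above, the distance-to-best-attack rule that ASN is implicitly learning reduces to a simple lookup; a minimal sketch:

def best_attack(distance):
    # Damage ranking: Melee (5) > Range (2) > Mage (1), so always pick
    # the highest-damage option whose attack distance still reaches.
    if distance <= 2:
        return "Melee"
    if distance <= 4:
        return "Range"
    if distance <= 10:
        return "Mage"
    return None  # the opponent is out of reach of every attack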