Action Semantics Network

Considering the Effects of Actions in Multiagent Systems

ASN Structure

ASN:

Each agent's action set can be naturally divided into two types: actions that only affect environmental information or the agent's own private properties, and actions that directly influence other agents (i.e., their private properties). We refer to this property, that different actions may have different impacts on other agents, as action semantics. Intuitively, the value of performing actions of different types should be estimated separately, explicitly considering different information. We propose a novel network structure, named Action Semantics Network (ASN), that exploits such action semantics to improve an agent's policy/Q network design toward more efficient multiagent coordination. Instead of feeding an agent's whole observation into a single network, ASN consists of several sub-modules, each of which takes as input only the part of the observation that is relevant to the semantics of its actions. In this way, ASN avoids the negative influence of irrelevant information and provides a more accurate estimate of the value of each action. Moreover, ASN is general and can be incorporated into existing deep MARL frameworks to improve the performance of existing DRL algorithms.
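The sketch below shows one way such a structure could be wired up in PyTorch: a branch for environment/self-directed actions that reads the agent's own features, and a pairwise sub-module that scores each agent-directed action from the agent's features together with the observed features of that specific opponent. This is a minimal sketch under assumed module names and sizes (ASNQNet, obs_self_dim, obs_other_dim, hidden), not the authors' implementation.

```python
# Minimal ASN-style Q-network sketch (PyTorch). Names and sizes are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class ASNQNet(nn.Module):
    def __init__(self, obs_self_dim, obs_other_dim, n_self_actions, hidden=64):
        super().__init__()
        # Sub-module for actions that only affect the environment / the agent itself.
        self.self_branch = nn.Sequential(
            nn.Linear(obs_self_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_self_actions),
        )
        # Pairwise sub-module: scores the action directed at agent j from the
        # agent's own features and the observed features of agent j.
        self.embed_self = nn.Sequential(nn.Linear(obs_self_dim, hidden), nn.ReLU())
        self.embed_other = nn.Sequential(nn.Linear(obs_other_dim, hidden), nn.ReLU())

    def forward(self, obs_self, obs_others):
        # obs_self: (batch, obs_self_dim); obs_others: (batch, n_others, obs_other_dim)
        q_self = self.self_branch(obs_self)              # (batch, n_self_actions)
        e_self = self.embed_self(obs_self).unsqueeze(1)  # (batch, 1, hidden)
        e_others = self.embed_other(obs_others)          # (batch, n_others, hidden)
        # Q-value of the action directed at each other agent: inner product of embeddings.
        q_others = (e_self * e_others).sum(dim=-1)       # (batch, n_others)
        return torch.cat([q_self, q_others], dim=-1)     # one Q-value per action
```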

StarCraft II (SMAC)

The cyan and red circles indicate the agent's sight range and shooting range, respectively.

The full games of StarCraft: BroodWar and StarCraft II have been used as RL environments for some time, due to the many interesting challenges inherent to the games. DeepMind’s AlphaStar has recently shown a very impressive level of play on one StarCraft II matchup using a centralised controller. In contrast, SMAC is not intended as an environment to train agents for use in full StarCraft II gameplay. Instead, by introducing strict decentralisation and local partial observability, we use the StarCraft II game engine to build a new set of rich multi-agent problems.

Case Study: 15 Marines vs. 15 Marines

A StarCraft II 15m map contains two groups, each of which includes 15 marines. At each step, each agent observes the local game state and selects one of the following actions: move north, south, east, or west; attack one of its enemies; stop; or the null action (no-op). Agents belonging to the same side receive the same joint reward at each time step, equal to the total damage dealt to enemy units. Agents also receive a joint reward of 10 points for killing each opponent, and 200 points for killing all opponents. The game ends when all agents on one side die or the episode exceeds a fixed time limit.
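As a concrete illustration of the reward scheme just described, the hedged sketch below computes the shared per-step reward from the damage dealt and the kills in that step; the constants follow the text, while the joint_reward function itself is an illustrative assumption rather than SMAC's actual code.

```python
# Illustrative per-step joint reward for the 15m scenario (assumed helper, not SMAC code).
KILL_REWARD = 10    # reward for killing one opponent unit
WIN_REWARD = 200    # reward for killing all opponents

def joint_reward(damage_dealt, enemies_killed_this_step, all_enemies_dead):
    """All agents on the same side receive this shared reward each step."""
    r = damage_dealt                              # total damage dealt to enemy units
    r += KILL_REWARD * enemies_killed_this_step   # per-kill bonus
    if all_enemies_dead:
        r += WIN_REWARD                           # bonus for eliminating the whole enemy team
    return r
```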

Why?

ASN-QMIX quickly reaches an average win rate of approximately 80%, while vanilla-QMIX performs poorly, with an average win rate of only about 20%. Intuitively, as the number of agents grows, ASN still allows an agent to explicitly reason about each other agent's information. For an agent using the vanilla network, however, it is harder to identify the influence of its actions on other agents from a larger amount of mixed information, which results in lower average win rates than ASN.

An interesting observation for vanilla-QMIX is that its agents tend to run away to avoid being killed, which can be seen as converging to a suboptimal solution.

Neural MMO

Neural MMO (also available on GitHub) is a massively multiagent environment that defines combat systems for a large number of agents.

A simple Neural MMO scene contains two groups of agents on a 10×10 tile map. Each group contains 3 agents, each of which starts at a random tile with HP=100. At each step, each agent loses one unit of HP, observes the local game state (detailed in the appendix), and selects an action: either move one tile (up, down, left, right, or stay) or attack using one of three attack options: 'Melee', with attack distance 2 and damage 5; 'Range', with attack distance 4 and damage 2; or 'Mage', with attack distance 10 and damage 1. Any action with an invalid effect (e.g., attacking an opponent beyond the attack range or moving beyond the grid border) leaves the agent standing still. An agent receives a penalty of -0.1 if its attack fails. The game ends when all agents in one group die, and agents belonging to the same group receive a joint reward equal to the difference between the total HP of their own group and that of the opposing group.
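The following sketch encodes the attack rules stated above (attack distances, damage values, and the -0.1 penalty for a failed attack); the resolve_attack helper and its signature are illustrative assumptions, not the environment's API.

```python
# Illustrative attack resolution for the Neural MMO setting described above.
ATTACK_OPTIONS = {
    "Melee": {"distance": 2,  "damage": 5},
    "Range": {"distance": 4,  "damage": 2},
    "Mage":  {"distance": 10, "damage": 1},
}
FAIL_PENALTY = -0.1  # penalty when an attack fails

def resolve_attack(option, dist_to_target):
    """Return (damage_dealt, reward_penalty) for an attack attempt."""
    spec = ATTACK_OPTIONS[option]
    if dist_to_target > spec["distance"]:
        # Target out of range: the attack fails and the agent stands still.
        return 0, FAIL_PENALTY
    return spec["damage"], 0.0
```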

ACKTR Learning Curves

Panels: MMO_Vanilla, MMO_Attention, MMO_Entity_Attention, MMO_ASN_M, MMO_ASN_M1.

When the distance between two agents is less than or equal to 2, the best attack option is "Melee", since it deals the maximum damage among the three attacks. Similarly, "Range" is the best option when the distance is larger than 2 and at most 4, and "Mage" is the best option when the distance is larger than 4 and at most 10 (beyond 10, no attack can reach the opponent). Both ASN-M1 (which shares the first neural network layer across the multiple actions directed at one enemy) and ASN-M (which does not share the first layer) cause higher total damage than the other methods, and the ASN-M1 agent causes the highest total damage on average. This indicates that ASN effectively extracts the action semantics between agents: as the distance between the agent and its opponent changes, it is more likely to select the best attack option (i.e., the attack that causes the most damage), and thus causes the highest damage on average among all methods.
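To make the distance-based rule explicit, here is a small, hedged sketch that picks the attack which reaches the opponent and deals the highest damage, using the distances and damage values from the environment description; best_attack_option is an illustrative helper, not project code.

```python
# Illustrative "best attack option" rule for the Neural MMO combat setting.
def best_attack_option(distance):
    """Return the attack that reaches the target and deals the most damage."""
    options = [("Melee", 2, 5), ("Range", 4, 2), ("Mage", 10, 1)]
    in_range = [(name, dmg) for name, max_dist, dmg in options if distance <= max_dist]
    if not in_range:
        return None                          # no attack can reach the opponent
    return max(in_range, key=lambda x: x[1])[0]

assert best_attack_option(2) == "Melee"      # close range: highest damage
assert best_attack_option(3) == "Range"      # mid range
assert best_attack_option(7) == "Mage"       # long range
assert best_attack_option(12) is None        # out of reach for every option
```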