Ammar Fayad & Majd Ibrahim
Our agents are the ones wearing yellow.
GRF Full Game:
GFootball full game after 72M steps of training
1) Super Hard Counter Attack:
2) Run to Score with Keeper:
3) 3 vs 1 with Keeper:
4) Corner:
SMAC:
6h_vs_8z:
corridor:
3s5z_vs_3s6z:
MMM2:
27m_vs_30m:
All experiments were performed on a high-performance computing system with a SLURM \citep{yoo2003slurm} job scheduler. Each compute node has two NVIDIA Volta V100 GPUs and a dual-socket Intel Xeon Gold 6248 processor with 20 cores per socket.
In this paper, we base our algorithm on QMIX \citep{rashid2018qmix}. Each agent has a neural network that approximates its local utility. The local utility network consists of three layers: a fully-connected layer, followed by a GRU with a 64-dimensional hidden state, followed by another fully-connected layer that also takes the individual strategy as input and outputs an estimated local value for each action. This output is added to the output of a similar shared network, with one difference: the shared network takes all the relational strategies as input to its second MLP layer. The utilities are fed into a mixing network that estimates the global action value. The mixing network has a 32-dimensional hidden layer with ReLU activation. The parameters of the mixing network are generated by a hypernetwork conditioned on the global state; this hypernetwork has a fully-connected hidden layer of 32 dimensions. These settings are the same as in QMIX.
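As a concrete illustration, below is a minimal PyTorch sketch of one agent's utility network. The class name, argument names, and the strategy dimensionality are our own placeholders rather than the released implementation; the shared network, which consumes all relational strategies, would have the same shape.

```python
import torch
import torch.nn as nn

class AgentUtilityNet(nn.Module):
    """Sketch of one agent's utility network: FC -> GRU -> FC.
    Names and dimensions other than the 64-dim GRU are assumptions."""

    def __init__(self, obs_dim, strategy_dim, n_actions, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)  # 64-dimensional GRU
        # The second fully-connected layer also takes the individual
        # strategy as input and outputs one utility per action.
        self.fc2 = nn.Linear(hidden_dim + strategy_dim, n_actions)

    def forward(self, obs, strategy, h_prev):
        x = torch.relu(self.fc1(obs))
        h = self.rnn(x, h_prev)  # recurrent hidden state across timesteps
        q_local = self.fc2(torch.cat([h, strategy], dim=-1))
        # q_local is later added to the shared network's output and fed
        # to the mixing network.
        return q_local, h
```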
The optimization is conducted with RMSprop, using a learning rate of $5 \times 10^{-4}$, $\alpha = 0.99$, and no momentum or weight decay. We run 8 parallel environments to collect samples. Batches of $32$ episodes are sampled from the replay buffer, and the whole framework is trained end-to-end on fully unrolled episodes.
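For concreteness, these settings map onto PyTorch's built-in RMSprop as below; `params` is a placeholder for the trainable parameters of the utility, shared, and mixing networks.

```python
import torch

# `params` stands in for all trainable parameters of the framework.
optimizer = torch.optim.RMSprop(
    params,
    lr=5e-4,          # learning rate 5 x 10^-4
    alpha=0.99,       # RMSprop smoothing constant
    momentum=0.0,     # no momentum
    weight_decay=0.0, # no weight decay
)
```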
The neural networks were implemented with PyTorch v1.8.1+cu102, and the graph neural networks, including the graph attention networks and graph convolutional networks, were defined with DGL \citep{wang2019deep} v0.6.1.
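The snippet below is a minimal sketch of defining such layers with DGL's built-in `GATConv` and `GraphConv` modules over a fully connected agent graph; the graph construction, feature sizes, and head count are illustrative assumptions, not the paper's exact configuration.

```python
import dgl
import torch
from dgl.nn import GATConv, GraphConv

n_agents, feat_dim = 5, 64  # assumed sizes for illustration

# Fully connected directed graph over the agents (no self-loops).
src, dst = zip(*[(i, j) for i in range(n_agents)
                 for j in range(n_agents) if i != j])
g = dgl.graph((torch.tensor(src), torch.tensor(dst)))

gat = GATConv(feat_dim, feat_dim, num_heads=4)  # graph attention layer
gcn = GraphConv(feat_dim, feat_dim)             # graph convolution layer

h = torch.randn(n_agents, feat_dim)   # per-agent input features
h_att = gat(g, h)                     # -> (n_agents, num_heads, feat_dim)
h_out = gcn(g, h_att.mean(dim=1))     # average heads, then convolve
```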