Ammar Fayad & Majd Ibrahim
Our agents are the ones wearing yellow.
GRF Full Game:
GFootball full game after 72M steps of training
1) Super Hard Counter Attack:
2) Run to Score with Keeper:
3) 3 vs 1 with Keeper:
4) Corner:
SMAC:
6h_vs_8z:
corridor:
3s5z_vs_3s6z:
MMM2:
27m_vs_30m:
All experiments were performed on a high-performance computing system with a SLURM \citep{yoo2003slurm} job scheduler. Each compute node has two NVIDIA Volta V100 GPUs and a dual-socket Intel Xeon Gold 6248 processor with 20 cores per socket.
In this paper, we base our algorithm on QMIX \citep{rashid2018qmix}. Each agent has a neural network that approximates its local utility. The local utility network consists of three layers: a fully-connected layer, followed by a GRU with a 64-dimensional hidden state, followed by another fully-connected layer that also takes the individual strategy as input and outputs an estimated local value for each action. This output is added to the output of a similar shared network, with one difference: the shared network takes all the relational strategies as input to its second MLP layer. The utilities are fed into a mixing network that estimates the global action value. The mixing network has a 32-dimensional hidden layer with ReLU activation. The parameters of the mixing network are generated by a hypernetwork conditioned on the global state; this hypernetwork has a fully-connected hidden layer of 32 dimensions. These settings are the same as in QMIX.
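As a concrete illustration, below is a minimal PyTorch sketch of one agent's utility network. The class name, argument names, and the strategy dimensionality are our own placeholders rather than the released implementation; the shared network, which consumes all relational strategies, would have the same shape.

```python
import torch
import torch.nn as nn

class AgentUtilityNet(nn.Module):
    """Sketch of one agent's utility network: FC -> GRU -> FC.
    Names and dimensions other than the 64-dim GRU are assumptions."""

    def __init__(self, obs_dim, strategy_dim, n_actions, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)  # 64-dimensional GRU
        # The second fully-connected layer also takes the individual
        # strategy as input and outputs one utility per action.
        self.fc2 = nn.Linear(hidden_dim + strategy_dim, n_actions)

    def forward(self, obs, strategy, h_prev):
        x = torch.relu(self.fc1(obs))
        h = self.rnn(x, h_prev)  # recurrent hidden state across timesteps
        q_local = self.fc2(torch.cat([h, strategy], dim=-1))
        # q_local is later added to the shared network's output and fed
        # to the mixing network.
        return q_local, h
```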
The optimization is conducted with RMSprop, using a learning rate of $5 \times 10^{-4}$, $\alpha = 0.99$, and no momentum or weight decay. We run 8 parallel environments to collect samples. Batches of $32$ episodes are sampled from the replay buffer, and the whole framework is trained end-to-end on fully unrolled episodes.
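For concreteness, these settings map onto PyTorch's built-in RMSprop as below; `params` is a placeholder for the trainable parameters of the utility, shared, and mixing networks.

```python
import torch

# `params` stands in for all trainable parameters of the framework.
optimizer = torch.optim.RMSprop(
    params,
    lr=5e-4,          # learning rate 5 x 10^-4
    alpha=0.99,       # RMSprop smoothing constant
    momentum=0.0,     # no momentum
    weight_decay=0.0, # no weight decay
)
```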
The neural networks were implemented with PyTorch v1.8.1+cu102, and the graph neural networks, including the graph attention networks and graph convolutional networks, were defined with DGL \citep{wang2019deep} v0.6.1.
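The snippet below is a minimal sketch of defining such layers with DGL's built-in `GATConv` and `GraphConv` modules over a fully connected agent graph; the graph construction, feature sizes, and head count are illustrative assumptions, not the paper's exact configuration.

```python
import dgl
import torch
from dgl.nn import GATConv, GraphConv

n_agents, feat_dim = 5, 64  # assumed sizes for illustration

# Fully connected directed graph over the agents (no self-loops).
src, dst = zip(*[(i, j) for i in range(n_agents)
                 for j in range(n_agents) if i != j])
g = dgl.graph((torch.tensor(src), torch.tensor(dst)))

gat = GATConv(feat_dim, feat_dim, num_heads=4)  # graph attention layer
gcn = GraphConv(feat_dim, feat_dim)             # graph convolution layer

h = torch.randn(n_agents, feat_dim)   # per-agent input features
h_att = gat(g, h)                     # -> (n_agents, num_heads, feat_dim)
h_out = gcn(g, h_att.mean(dim=1))     # average heads, then convolve
```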