Visualization on GRF
We visualize the Z=20 learned joint policies in Google Research Football (GRF). During this procedure, agent-1, agent-2, and agent-3 are controlled by the learned policies, while the opponents are the built-in AI. We show GIFs of 10 episodes here to give an overall view of the diversity across policies and the stability within each single policy. The videos at the original resolution have been uploaded to YouTube.
All the joint policies are learned by SPD in a single unsupervised reinforcement learning (URL) training process, without any external reward from the environment.
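The recording procedure above can be sketched as a standard rollout loop: for each of the 10 episodes, query the learned joint policy for all three controlled agents, step the environment, and collect rendered frames for a GIF. The sketch below is a minimal, self-contained illustration of that loop; `DummyGRFEnv`, `random_policy`, and `record_episodes` are hypothetical stand-ins (an actual run would use gfootball's `create_environment` with `number_of_left_players_agent_controls=3` and the trained SPD policies).

```python
# Minimal sketch of the visualization rollout loop, assuming a Gym-style
# env interface. All names here are illustrative stand-ins, not the
# actual SPD or gfootball code.
import random

class DummyGRFEnv:
    """Stand-in for a GRF env controlling 3 left-team players."""
    def __init__(self, episode_len=30):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 115  # placeholder observation vector

    def step(self, actions):
        self.t += 1
        obs = [0.0] * 115
        done = self.t >= self.episode_len
        return obs, 0.0, done, {}

    def render_frame(self):
        # A real env would return an RGB array to be written into a GIF.
        return f"frame-{self.t}"

def random_policy(obs, n_agents=3, n_actions=19):
    # Placeholder for a learned joint policy (e.g., conditioned on the
    # latent z selecting one of the Z=20 synergy patterns).
    return [random.randrange(n_actions) for _ in range(n_agents)]

def record_episodes(env, policy, n_episodes=10):
    """Roll out n_episodes and collect rendered frames, one GIF each."""
    episodes = []
    for _ in range(n_episodes):
        obs, done, frames = env.reset(), False, []
        while not done:
            obs, _, done, _ = env.step(policy(obs))
            frames.append(env.render_frame())
        episodes.append(frames)
    return episodes

episodes = record_episodes(DummyGRFEnv(), random_policy)
print(len(episodes), len(episodes[0]))
```

Each inner list of frames would then be written out as one GIF (e.g., with an image library), giving the 10 clips shown per synergy pattern below.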
As we can see, some synergy patterns (SP) seem to be useless. For instance, in SP-1 the agent with the ball is repeatedly tackled by the opponent, and in SP-2 an agent tries to tackle the ball from its own teammate.
Meanwhile, the other policies show meaningful skills, e.g., adhesive ball control (SP-3, SP-4, and others), crisscross runs (SP-5), running off the ball to maintain formation with teammates (SP-9), and even body swerves (SP-12). These results support the hypothesis that useful skills tend to be learned when the policies maintain a high discrepancy from the meaningless ones, and they demonstrate the effectiveness of the synergy-pattern discrepancy proposed in our method.
Synergy Pattern 1
Synergy Pattern 2
Synergy Pattern 3
Synergy Pattern 4
Synergy Pattern 5
Synergy Pattern 6
Synergy Pattern 7
Synergy Pattern 8
Synergy Pattern 9
Synergy Pattern 10
Synergy Pattern 11
Synergy Pattern 12
Synergy Pattern 13
Synergy Pattern 14
Synergy Pattern 15
Synergy Pattern 16
Synergy Pattern 17
Synergy Pattern 18
Synergy Pattern 19
Synergy Pattern 20