Visualization on GRF
We visualize the Z=20 learned joint policies in Google Research Football (GRF). During this procedure, agent-1, agent-2, and agent-3 are controlled by the learned policies, while the opponents are the built-in AI. We show GIFs of 10 episodes here to give an overall view of the diversity across policies and the stability within each single policy. The videos at the original resolution have been uploaded to YouTube.
All the joint policies are learned by SPD in a single unsupervised reinforcement learning (URL) training process, without any external reward from the environment.
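The recording procedure above can be sketched as a standard rollout loop: for each of the 10 episodes, query the learned joint policy for all three controlled agents, step the environment, and collect rendered frames for a GIF. The sketch below is a minimal, self-contained illustration of that loop; `DummyGRFEnv`, `random_policy`, and `record_episodes` are hypothetical stand-ins (an actual run would use gfootball's `create_environment` with `number_of_left_players_agent_controls=3` and the trained SPD policies).

```python
# Minimal sketch of the visualization rollout loop, assuming a Gym-style
# env interface. All names here are illustrative stand-ins, not the
# actual SPD or gfootball code.
import random

class DummyGRFEnv:
    """Stand-in for a GRF env controlling 3 left-team players."""
    def __init__(self, episode_len=30):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 115  # placeholder observation vector

    def step(self, actions):
        self.t += 1
        obs = [0.0] * 115
        done = self.t >= self.episode_len
        return obs, 0.0, done, {}

    def render_frame(self):
        # A real env would return an RGB array to be written into a GIF.
        return f"frame-{self.t}"

def random_policy(obs, n_agents=3, n_actions=19):
    # Placeholder for a learned joint policy (e.g., conditioned on the
    # latent z selecting one of the Z=20 synergy patterns).
    return [random.randrange(n_actions) for _ in range(n_agents)]

def record_episodes(env, policy, n_episodes=10):
    """Roll out n_episodes and collect rendered frames, one GIF each."""
    episodes = []
    for _ in range(n_episodes):
        obs, done, frames = env.reset(), False, []
        while not done:
            obs, _, done, _ = env.step(policy(obs))
            frames.append(env.render_frame())
        episodes.append(frames)
    return episodes

episodes = record_episodes(DummyGRFEnv(), random_policy)
print(len(episodes), len(episodes[0]))
```

Each inner list of frames would then be written out as one GIF (e.g., with an image library), giving the 10 clips shown per synergy pattern below.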
As we can see, some synergy patterns (SP) seem to be useless. For instance, in SP-1 the agent with the ball is repeatedly tackled by the opponent, and in SP-2 an agent tries to tackle the ball from its own teammate.
Meanwhile, the other policies show meaningful skills, e.g., adhesive ball control (SP-3, SP-4, and others), crisscross runs (SP-5), running off the ball to maintain formation with teammates (SP-9), and even body swerves (SP-12). These results support the hypothesis that useful skills tend to be learned when the policies maintain a high discrepancy from the meaningless ones, and they demonstrate the effectiveness of the synergy-pattern discrepancy proposed in our method.
Synergy Pattern 1
Synergy Pattern 2
Synergy Pattern 3
Synergy Pattern 4
Synergy Pattern 5
Synergy Pattern 6
Synergy Pattern 7
Synergy Pattern 8
Synergy Pattern 9
Synergy Pattern 10
Synergy Pattern 11
Synergy Pattern 12
Synergy Pattern 13
Synergy Pattern 14
Synergy Pattern 15
Synergy Pattern 16
Synergy Pattern 17
Synergy Pattern 18
Synergy Pattern 19
Synergy Pattern 20