In the following experiment we trained, via self-play, a single neural network with 11 "heads" that controls both teams. Each head outputs the action for one player on the team.
Throughout the text, the neural network (the whole entity that controls a team) is referred to as the "agent", while a single footballer on such a team is referred to as a "player".
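The multi-head control scheme can be sketched as follows. This is a minimal illustration, not the actual architecture: the observation size, hidden width, action count, and the shared-trunk layout are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PLAYERS = 11   # one head per controlled player
N_ACTIONS = 19   # assumption: size of the environment's discrete action set
OBS_DIM = 115    # assumption: flattened team observation size
HIDDEN = 64      # assumption: trunk width

# Shared trunk: one set of weights processes the team observation.
W_trunk = rng.normal(0, 0.1, size=(OBS_DIM, HIDDEN))

# One output head per player, each mapping trunk features to action logits.
W_heads = rng.normal(0, 0.1, size=(N_PLAYERS, HIDDEN, N_ACTIONS))

def act(obs: np.ndarray) -> np.ndarray:
    """Return one discrete action per player for a single team observation."""
    features = np.tanh(obs @ W_trunk)                    # shared representation
    logits = np.einsum("h,pha->pa", features, W_heads)   # per-head action logits
    return logits.argmax(axis=-1)                        # greedy action per head

obs = rng.normal(size=OBS_DIM)
actions = act(obs)
print(actions.shape)  # (11,) -- one action for each of the 11 players
```

Because the same network is run once per team, self-play here simply means feeding it the observations of both sides in turn.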
The gif shows how our agent behaved in the game. Green markers represent the players of one team, yellow markers the players of the other. Both teams were controlled by the same neural network. The top left corner shows information about the game state, e.g. the current frame or the action chosen by the network.
The agent learned to take advantage of the offside rule: its players run towards the opposite side of the pitch to create an offside trap.
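The logic the agent exploits can be sketched with a simplified offside check. The coordinates and the function below are illustrative assumptions; the real rule has exceptions (own half, throw-ins, goal kicks) that are omitted here.

```python
def is_offside(receiver_x: float, ball_x: float, defender_xs: list) -> bool:
    """Simplified offside check for a team attacking toward larger x.

    A receiver is in an offside position if, at the moment of the pass,
    they are ahead of the ball and ahead of the second-last defender
    (the last defender is usually the goalkeeper).
    """
    second_last_defender = sorted(defender_xs)[-2]
    return receiver_x > ball_x and receiver_x > second_last_defender

# The "phalanx": every defender pushes up the pitch, so the defensive line
# moves forward and almost any forward pass becomes an offside.
defenders_deep = [0.9, 0.5, 0.45, 0.4]    # conventional back line (GK at 0.9)
defenders_pushed = [0.9, 0.1, 0.08, 0.05] # offside-trap line, pushed far up

print(is_offside(receiver_x=0.3, ball_x=0.0, defender_xs=defenders_deep))    # False
print(is_offside(receiver_x=0.3, ball_x=0.0, defender_xs=defenders_pushed))  # True
```

With the pushed-up line, the same receiver position that was legal before now triggers an offside, which is exactly what makes passing against the "phalanx" so risky.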
The gif shows what happens when both teams attempt an "offside trap". Both are controlled by the same 11-head network. An extreme case is shown here: all players run towards the opposite side of the pitch, so that any attempt by the enemy team to pass the ball results in an offside. We called this strategy the "phalanx".
As a result of the learned "phalanx" strategy, the agent became reluctant to touch the ball. Since recapturing the ball is often followed by an offside, the agent behaves as if touching the ball were a bad move that leads to the enemy team gaining control of it.
Sometimes the agent does not intercept the ball when its players are on the opponent's side of the pitch, because it expects an offside call. This may be caused by the fact that the neural network only sees the last 4 frames of the game: after four frames the agent no longer knows whether a player was in an offside position when the pass was made. Otherwise the agent could have taken the ball and scored.
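The 4-frame memory window can be sketched with a standard frame stack. This is a generic illustration of the limitation described above, not our actual input pipeline.

```python
from collections import deque

class FrameStack:
    """Keeps only the k most recent observations.

    Anything older than k frames -- e.g. whether a teammate was in an
    offside position at the moment a pass was played -- is simply absent
    from the network's input.
    """
    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        self.frames.append(frame)

    def observation(self):
        return list(self.frames)

stack = FrameStack(k=4)
for t in range(10):        # frames 0..9 of a game
    stack.push({"frame": t})

print(stack.observation())  # only frames 6..9 survive
```

If the pass happened at frame 5, by frame 10 the stacked observation contains no trace of it, so the agent cannot tell a legal interception from an offside one.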
We believe that relying on offsides is an immature strategy that will be surpassed by more refined behaviors given more training time and a better learning algorithm.
The environment allows setting up a game with a custom number of players, e.g. 5v5. Our next step is to train such a network and let it compete with others at https://research-football.dev/multiagent/