In the GFootball environment the base reward, called scoring, gives +1 for scoring a goal and -1 for conceding one. There is an additional built-in reward called checkpoints, which gives partial rewards for approaching the opponent's goal with the ball. Using scoring alone makes training really hard because the reward is sparse: the agent rarely receives any signal at all. While we were easily able to train agents that beat the built-in AI using checkpoints, without it we could not even beat random players. In this blog post we want to show one possible way to train an agent using only the scoring reward.
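The reward set is chosen when the environment is created. Below is a minimal sketch using GFootball's create_environment; the 5_vs_5 scenario name is used here only as an example.

```python
import gfootball.env as football_env

# Sparse reward only: +1 for scoring a goal, -1 for conceding one.
env_sparse = football_env.create_environment(
    env_name='5_vs_5', rewards='scoring')

# Scoring plus the built-in shaping reward for moving the ball
# closer to the opponent's goal.
env_shaped = football_env.create_environment(
    env_name='5_vs_5', rewards='scoring,checkpoints')
```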
Our idea was to create a curriculum of scenarios of increasing difficulty, where the mean distance of our players from the opponent's goal grows with each scenario. The opponent was controlled by the built-in AI. Its difficulty is parametrized by a number from 0 to 1, where 0 is the easiest and 1 is the hardest setting; it was drawn uniformly from [0, 1] for every game.
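One way to drive such a curriculum is to switch the environment's scenario between successive stages during training. The sketch below is only an illustration: the stage names are placeholders, and the win-rate promotion rule is an assumption made for the sketch, not the exact criterion we used.

```python
import numpy as np
import gfootball.env as football_env

# Placeholder stage names: each stage would be a custom scenario file placed
# under gfootball/scenarios/ and referenced by its file name.
STAGES = ['curriculum_4_vs_1', 'curriculum_3_vs_3', 'curriculum_4_vs_2',
          'curriculum_4_vs_3', 'curriculum_4_vs_3_far',
          'curriculum_4_vs_3_random_ball']

def make_env(stage_idx):
    # Only the sparse scoring reward; we control the 4 outfield players.
    return football_env.create_environment(
        env_name=STAGES[stage_idx],
        rewards='scoring',
        number_of_left_players_agent_controls=4)

def should_advance(recent_scored, threshold=0.8):
    # Assumed promotion rule for the sketch: move to the next stage once the
    # agent has scored in at least `threshold` of the recent episodes.
    return np.mean(recent_scored) >= threshold
```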
All the scenarios were 5 vs 5, so in each of them we controlled 4 players (excluding the goalkeeper). For each scenario there is a diagram of an example starting position.
Each game lasted at most 400 frames, where 10 frames of game time correspond to 1 second of real time. The game was interrupted on scoring a goal, on a possession change, or when the ball went out of play.
At first the players' starting positions were not randomized, but this led to overfitting: the agent could only play these specific setups and performed poorly when it faced the built-in AI in a regular game. That is why each player's position is now randomly drawn from either a uniform or a Gaussian distribution.
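To make this concrete, here is roughly what one of the scenario files can look like. The file name, the nominal coordinates, and the noise scale are illustrative assumptions; the builder calls follow GFootball's scenario API, and build_scenario is re-run on every reset, so the randomization below is drawn anew each game.

```python
# Sketch of a curriculum scenario file, e.g. gfootball/scenarios/curriculum_4_vs_1.py
# (file name, coordinates and noise scale are illustrative, not our exact values).
from . import *
import random

def build_scenario(builder):
    builder.config().game_duration = 400                    # 400 frames, about 40 s
    builder.config().deterministic = False
    builder.config().end_episode_on_score = True
    builder.config().end_episode_on_possession_change = True
    builder.config().end_episode_on_out_of_play = True
    # Opponent AI strength drawn uniformly from [0, 1] every game.
    builder.config().right_team_difficulty = random.uniform(0.0, 1.0)

    def jitter(x, y, sigma=0.05):
        # Gaussian noise around a nominal starting position.
        return x + random.gauss(0.0, sigma), y + random.gauss(0.0, sigma)

    # Four attackers near the opponent's goal; the ball spawns next to
    # a randomly chosen one of them.
    attackers = [jitter(0.6, -0.2), jitter(0.6, 0.2),
                 jitter(0.7, -0.1), jitter(0.7, 0.1)]
    bx, by = random.choice(attackers)
    builder.SetBallPosition(bx + 0.02, by)

    builder.SetTeam(Team.e_Left)
    builder.AddPlayer(-1.0, 0.0, e_PlayerRole_GK)           # our goalkeeper; we control the 4 attackers
    for x, y in attackers:
        builder.AddPlayer(x, y, e_PlayerRole_CF)

    builder.SetTeam(Team.e_Right)                           # positions in the right team's mirrored frame
    builder.AddPlayer(-1.0, 0.0, e_PlayerRole_GK)
    builder.AddPlayer(*jitter(-0.75, 0.0), e_PlayerRole_CB)  # the single active defender
    builder.AddPlayer(0.9, 0.3, e_PlayerRole_CM)             # remaining opponents placed far from the play
    builder.AddPlayer(0.9, 0.0, e_PlayerRole_CM)
    builder.AddPlayer(0.9, -0.3, e_PlayerRole_CM)
```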
A scenario's name usually encodes how many players actively take part in the action and the mean distance of the starting positions, measured from the halfway line towards the opponent's goal.
These are the scenarios included in the curriculum:
Four of our players start next to the opponent's goal. There is one defender on the opponent's side. At the beginning of the scenario the ball spawns next to a randomly chosen one of our players. The three remaining opponent players are so far away that they cannot influence the play.
Three of our players start next to the opponent's goal. Two of them are covered by three opposing players, and one of these two has the ball. The third is in a great position to shoot, but to succeed the other two must pass him the ball. One player from each team is so far away that it cannot influence the play.
Four of our players are relatively close to the opponent's goal. There are two defenders on the opposite side. The ball spawns next to a randomly chosen one of our players. The remaining two opponent players are so far away that they cannot influence the play.
Four of our players are relatively close to the opponent's goal. There are three defenders on the opposite side. The ball spawns next to a randomly chosen one of our players. The remaining opponent player is so far away that it cannot influence the play.
Our agent struggled to score in almost every case, because the opponent players often spawned right next to ours and our players did not have much space to outplay them. This is why in the next scenario the players spawn farther away from the goal.
Very similar to the scenario above; the only difference is that the players spawn farther away from the goal.
The players are again farther away from the goal. The position of the ball is also randomized: it is no longer spawned next to one of our players; instead it is drawn from a uniform distribution, so before the action can begin the players must first reach the ball.
The agent could not overcome this scenario: the randomized ball position was too difficult for it, since it had never seen such a setup before. To prepare the agent for random ball positions, the next few scenarios also spawn the ball at a position drawn from a uniform distribution.
Same as the 4v1 scenario, but the ball's starting position is randomized.
Same as the 4v2 scenario, but the ball's starting position is randomized.
Four of our players are on the opponent's side of the pitch. There are three defenders on the opposite side. The ball's starting position is randomized. The remaining opponent player is so far away that it cannot influence the play.
Evaluation against the easy built-in AI: a full match lasting 5 minutes. The agent wins by a huge margin of approximately 13 goals.
Evaluation against the hard built-in AI: a full match lasting 5 minutes. The agent wins by a huge margin of approximately 13 goals.
The training took roughly 2 billion steps in total. However, we believe this number can be reduced, since a subset of the presented scenarios is enough for training.
Of course, there are plenty of other ways to train successfully on scoring alone. We also tried another, simpler curriculum, in which we changed the number of players our agent played against; it led to successful training as well.