In the GFootball environment the base reward, called scoring, gives +1 for scoring a goal and -1 for conceding one. There is an additional built-in reward called checkpoints, which gives partial rewards for approaching the opponent's goal with the ball. Using scoring alone makes training really hard because the reward is sparse: the agent rarely receives any signal at all. While we were easily able to train agents that beat the built-in AI using checkpoints, without it we could not even beat random players. In this blog post we want to show one possible way to train an agent using only the scoring reward.
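The reward set is chosen when the environment is created. Below is a minimal sketch using GFootball's create_environment; the 5_vs_5 scenario name is used here only as an example.

```python
import gfootball.env as football_env

# Sparse reward only: +1 for scoring a goal, -1 for conceding one.
env_sparse = football_env.create_environment(
    env_name='5_vs_5', rewards='scoring')

# Scoring plus the built-in shaping reward for moving the ball
# closer to the opponent's goal.
env_shaped = football_env.create_environment(
    env_name='5_vs_5', rewards='scoring,checkpoints')
```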
Our idea was to create a curriculum of scenarios of increasing difficulty, where the mean distance of our players from the opponent's goal grows with each scenario. The opponent was controlled by the built-in AI. Its difficulty is parametrized by a number from 0 to 1, where 0 is the easiest and 1 is the hardest setting; it was drawn uniformly from [0, 1] for every game.
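One way to drive such a curriculum is to switch the environment's scenario between successive stages during training. The sketch below is only an illustration: the stage names are placeholders, and the win-rate promotion rule is an assumption made for the sketch, not the exact criterion we used.

```python
import numpy as np
import gfootball.env as football_env

# Placeholder stage names: each stage would be a custom scenario file placed
# under gfootball/scenarios/ and referenced by its file name.
STAGES = ['curriculum_4_vs_1', 'curriculum_3_vs_3', 'curriculum_4_vs_2',
          'curriculum_4_vs_3', 'curriculum_4_vs_3_far',
          'curriculum_4_vs_3_random_ball']

def make_env(stage_idx):
    # Only the sparse scoring reward; we control the 4 outfield players.
    return football_env.create_environment(
        env_name=STAGES[stage_idx],
        rewards='scoring',
        number_of_left_players_agent_controls=4)

def should_advance(recent_scored, threshold=0.8):
    # Assumed promotion rule for the sketch: move to the next stage once the
    # agent has scored in at least `threshold` of the recent episodes.
    return np.mean(recent_scored) >= threshold
```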
All the scenarios were 5 vs 5, so in each of them we controlled 4 players (excluding the goalkeeper). For each scenario there is a diagram of an example starting position.
Each game lasted at most 400 frames, where 10 frames of game time correspond to 1 second of real time. The game was interrupted on scoring a goal, on a possession change, or when the ball went out of play.
At first the players' starting positions were not randomized, but this led to overfitting: the agent could only play these specific setups and performed poorly when it faced the built-in AI in a regular game. That is why each player's position is now randomly drawn from either a uniform or a Gaussian distribution.
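To make this concrete, here is roughly what one of the scenario files can look like. The file name, the nominal coordinates, and the noise scale are illustrative assumptions; the builder calls follow GFootball's scenario API, and build_scenario is re-run on every reset, so the randomization below is drawn anew each game.

```python
# Sketch of a curriculum scenario file, e.g. gfootball/scenarios/curriculum_4_vs_1.py
# (file name, coordinates and noise scale are illustrative, not our exact values).
from . import *
import random

def build_scenario(builder):
    builder.config().game_duration = 400                    # 400 frames, about 40 s
    builder.config().deterministic = False
    builder.config().end_episode_on_score = True
    builder.config().end_episode_on_possession_change = True
    builder.config().end_episode_on_out_of_play = True
    # Opponent AI strength drawn uniformly from [0, 1] every game.
    builder.config().right_team_difficulty = random.uniform(0.0, 1.0)

    def jitter(x, y, sigma=0.05):
        # Gaussian noise around a nominal starting position.
        return x + random.gauss(0.0, sigma), y + random.gauss(0.0, sigma)

    # Four attackers near the opponent's goal; the ball spawns next to
    # a randomly chosen one of them.
    attackers = [jitter(0.6, -0.2), jitter(0.6, 0.2),
                 jitter(0.7, -0.1), jitter(0.7, 0.1)]
    bx, by = random.choice(attackers)
    builder.SetBallPosition(bx + 0.02, by)

    builder.SetTeam(Team.e_Left)
    builder.AddPlayer(-1.0, 0.0, e_PlayerRole_GK)           # our goalkeeper; we control the 4 attackers
    for x, y in attackers:
        builder.AddPlayer(x, y, e_PlayerRole_CF)

    builder.SetTeam(Team.e_Right)                           # positions in the right team's mirrored frame
    builder.AddPlayer(-1.0, 0.0, e_PlayerRole_GK)
    builder.AddPlayer(*jitter(-0.75, 0.0), e_PlayerRole_CB)  # the single active defender
    builder.AddPlayer(0.9, 0.3, e_PlayerRole_CM)             # remaining opponents placed far from the play
    builder.AddPlayer(0.9, 0.0, e_PlayerRole_CM)
    builder.AddPlayer(0.9, -0.3, e_PlayerRole_CM)
```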
A scenario's name usually encodes how many players actively take part in the action and the mean distance of the starting positions, measured from the halfway line towards the opponent's goal.
These are the scenarios included in the curriculum:
Four of our players start next to the opponent's goal. There is one defender on the opponent's side. At the beginning of the scenario the ball spawns next to a randomly chosen one of our players. The three remaining opponent players are so far away that they cannot influence the play.
Three of our players start next to the opponent's goal. Two of them are covered by three opposing players, and one of these two has the ball. The third is in a great position to shoot, but to succeed the other two must pass him the ball. One player from each team is so far away that it cannot influence the play.
Four of our players are relatively close to the opponent's goal. There are two defenders on the opposite side. The ball spawns next to a randomly chosen one of our players. The remaining two opponent players are so far away that they cannot influence the play.
Four of our players are relatively close to the opponent's goal. There are three defenders on the opposite side. The ball spawns next to a randomly chosen one of our players. The remaining opponent player is so far away that it cannot influence the play.
Our agent struggled to score in almost every case, because the opponent players often spawned right next to ours and our players did not have much space to outplay them. This is why in the next scenario the players spawn farther away from the goal.
Very similar to the scenario above; the only difference is that the players spawn farther away from the goal.
The players are again farther away from the goal. The position of the ball is also randomized: it is no longer spawned next to one of our players; instead it is drawn from a uniform distribution, so before the action can begin the players must first reach the ball.
The agent could not overcome this scenario: the randomized ball position was too difficult for it, since it had never seen such a setup before. To prepare the agent for random ball positions, the next few scenarios also spawn the ball at a position drawn from a uniform distribution.
Same as the 4v1 scenario, but the ball's starting position is randomized.
Same as the 4v2 scenario, but the ball's starting position is randomized.
Four of our players are on the opponent's side of the pitch. There are three defenders on the opposite side. The ball's starting position is randomized. The remaining opponent player is so far away that it cannot influence the play.
Evaluation against the easy built-in AI: a full match lasting 5 minutes. The agent wins by a huge margin of approximately 13 goals.
Evaluation against the hard built-in AI: a full match lasting 5 minutes. The agent wins by a huge margin of approximately 13 goals.
The training took roughly 2 billion steps in total. However, we believe this number can be reduced, since a subset of the presented scenarios is enough for training.
Of course, there are plenty of other ways to train successfully on scoring alone. We also tried another, simpler curriculum, in which we changed the number of players our agent played against; it led to successful training as well.