The Football Project is a scientific collaboration between Google's research divisions, academic faculty, and eight undergraduate Computer Science students at the University of Warsaw. The students work in two parallel teams on a joint project: Google Research Football, a reinforcement learning environment recently published by Google.
In June 2019, a team of researchers at Google revealed a new reinforcement learning environment: a football simulator, called Google Research Football. They briefly summarised their motivation in a blog post: “The game of football is particularly challenging for RL, as it requires a natural balance between short-term control, learned concepts, such as passing, and high level strategy”.
The Google Research Football environment is highly customizable: it offers different types of observations, a regulated level of stochasticity, variable difficulty, and even a set of “academy” scenarios. It integrates well with classical reinforcement learning frameworks such as OpenAI Baselines and even allows training multi-agent strategies. The environment gives researchers a great deal of freedom, and the authors set an example by including several benchmarks and sanity checks in the introductory paper.
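As a rough illustration, here is a minimal sketch of how such a customized environment can be created with the public gfootball package. The scenario name, observation representation, and reward options shown are illustrative, and the exact keyword arguments may differ between gfootball versions:

```python
# A minimal sketch of creating a customized Google Research Football
# environment; values shown are illustrative, not a recommended setup.
import gfootball.env as football_env

env = football_env.create_environment(
    env_name='academy_empty_goal_close',   # one of the "academy" scenarios
    representation='simple115',            # float-vector observations
    rewards='scoring,checkpoints',         # shaped reward instead of raw score
    render=False,
)

# The environment follows the classic gym interface.
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```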
This research collaboration aims to develop the Google Research Football environment further and to answer open questions about float-based observation representations and multi-agent learning. The Floats team focuses on nonstandard observations built from floating-point features, reward shaping techniques, and nonstandard training tricks. The Multi-Agent team focuses on training multiple agents and exploring various curricula to create collaborative strategies, with many independent agents playing in the same match.
The Floats team, working in a single-agent setting, has been focusing on reward shaping and on improving training with an observation called Simple115 (and its modifications). This observation encodes the absolute positions and movement directions of all players and the ball, together with additional game-state information, as a vector of 115 floating-point values. The team has also applied a number of improvements, including dynamic adaptation of game difficulty and self-play with opponent swapping, which has led to some surprising results.
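To make the idea of reward shaping concrete, below is a hypothetical gym wrapper that adds a small bonus for moving the ball towards the opponent goal. It is not the team's actual shaping scheme, and it assumes a Simple115-style observation in which indices 88:91 hold the ball's (x, y, z) position, with x growing towards the opponent goal:

```python
# A hypothetical reward-shaping wrapper, sketched only to illustrate the
# technique; the observation layout assumed here (ball x at obs[88]) is an
# assumption about the Simple115 encoding, not a documented guarantee.
import gym


class BallProgressReward(gym.Wrapper):
    """Adds a small bonus proportional to forward ball progress."""

    def __init__(self, env, bonus_scale=0.05):
        super().__init__(env)
        self.bonus_scale = bonus_scale
        self._last_ball_x = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._last_ball_x = obs[88]
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        ball_x = obs[88]
        # Shaping term: reward forward ball movement, penalize backward moves.
        reward += self.bonus_scale * (ball_x - self._last_ball_x)
        self._last_ball_x = ball_x
        return obs, reward, done, info
```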
The Multi-Agent team's goal is to train a set of agents, starting from a random policy, into a cooperating team. The means to this end is self-play against current and previous versions of the same policy. During this process our agents often discover suboptimal strategies, some of which are quite interesting. We are also improving tools that often lack full support for the multi-agent setup.
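As a sketch of the opponent-swapping mechanism, the snippet below keeps a pool of past policy snapshots and samples an opponent for each match. It is a minimal illustration under the assumption that policies can be snapshotted and loaded by path; it is not tied to any particular framework:

```python
# A minimal sketch of self-play with opponent swapping; snapshot paths are
# hypothetical placeholders for however policies are actually serialized.
import random


class OpponentPool:
    """Keeps past policy snapshots and samples opponents for self-play."""

    def __init__(self, max_size=10):
        self.snapshots = []
        self.max_size = max_size

    def add(self, snapshot_path):
        self.snapshots.append(snapshot_path)
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest snapshot

    def sample(self, current_path, p_current=0.5):
        # Play mostly against the newest policy, sometimes against older
        # versions, which discourages overfitting to a single opponent.
        if not self.snapshots or random.random() < p_current:
            return current_path
        return random.choice(self.snapshots)
```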