In this tutorial we show how to train GFootball agents using SEED RL (Scalable and Efficient Deep-RL), a scalable reinforcement learning agent that allows training at millions of frames per second. An open-source implementation can be found in this repository. We will show how to run an experiment on AI Platform as well as how to modify the SEED repository to allow a multi-agent setup.
apt-get install git - install git
Having docker installed is necessary. If you don't have it installed, follow these steps:
usermod -aG docker $USER - add your user to the docker group
Gcloud is used to upload jobs to AI Platform. You can install and configure it using these commands:
gcloud auth login - authenticate with your account
gcloud config set project <project-name> - configure the project name
gcloud auth configure-docker - allow docker to be used with gcloud
Clone the SEED RL repo:
git clone https://github.com/google-research/seed_rl.git
cd seed_rl
Now it's time to actually run an experiment using AI Platform:
Note that running the command below will create numerous machines on Google Cloud Platform. In this configuration it will incur a cost of about 170 USD per day of training (as of March 2020). Details about the machines assigned to particular jobs can be found here: https://console.cloud.google.com/ai-platform/jobs.
gcp/train_football_checkpoints.sh - starts training with a single-agent setup on SMM observations.
The above script first builds a docker image that contains code for both the learner (master) and the workers. Briefly, workers execute multiple environment instances and send observations along with rewards to the learner, which performs inference and sends back the actions. The chart below shows an overview of the architecture; more details on the algorithm can be found in the SEED paper.
Source: SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
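To make the division of labour concrete, here is a minimal sketch of the actor/learner split. Everything below is illustrative: LearnerStub and actor_loop are hypothetical names, and in the real system the inference call is a remote, batched gRPC request served from an accelerator, not a local object.

import numpy as np

class LearnerStub:
  """Hypothetical stand-in for SEED's central learner. In the real
  system this is a remote (gRPC) call: the learner batches
  observations from many actors and runs inference centrally."""

  def inference(self, env_id, observation, reward, done):
    # Placeholder policy: pick a random action out of GFootball's
    # 19 discrete actions (default action set).
    return np.random.randint(0, 19)

def actor_loop(env, learner, num_steps=1000):
  """Simplified actor: steps the environment and defers every
  policy decision to the central learner."""
  observation = env.reset()
  reward, done = 0.0, False
  for _ in range(num_steps):
    action = learner.inference(0, observation, reward, done)
    observation, reward, done, _ = env.step(action)
    if done:
      observation = env.reset()
      reward, done = 0.0, False

Because actors never hold model parameters, scaling up means simply adding more cheap CPU workers around the single accelerated learner.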
The image is then pushed to your GCP registry. After that the script prepares configurations for running experiments on AI Platform. It specifies the types of machines and their number, as well as hyperparameters like the learning rate or entropy cost.
By default the training starts with 25 CPU-only workers for running environments and one learner with a GPU.
If you wish to change, e.g., the number of workers used in the experiment, you can modify the script (seed_rl/gcp/train_football_checkpoints.sh#L24). Of course other parameters can be changed as well:
export CONFIG=football
export ENVIRONMENT=football
export AGENT=vtrace
export WORKERS=50
export ACTORS_PER_WORKER=8
With this configuration, 50 workers run 8 actors each, i.e. 400 environment instances in parallel. The first time you run the experiment the docker image must be built, which can take some time; subsequent runs reuse the already created docker layers. You can check whether your job was successfully uploaded by logging into the AI Platform console (the job is named SEED_<timestamp>). Setting up machines might take a few minutes. After that, data from the experiment should appear in your bucket. The Google Cloud Platform bucket is named seed_rl by default.
GFootball offers several scenarios. By default SEED trains on the full 11 vs 11 game against difficult bots (11_vs_11_hard_stochastic). GFootball also provides several academy scenarios which pose challenges of increasing difficulty.
For example, in the academy_3_vs_1_with_keeper scenario we control 3 players who start near the opponent's goal, which is guarded by a goalkeeper and one defender.
To change the scenario you have to change the game parameter in seed_rl/gcp/train_football_scoring.sh#L45:
- parameterName: game
  type: CATEGORICAL
  categoricalValues:
  - academy_3_vs_1_with_keeper
You can save the changed file as train_football_3vs1_with_keeper.sh and use it to start a training.
The code in the SEED repository does not support a multi-agent setup out of the box. If you wish to use a multi-agent setup, you need to modify the SEED repo.
Example modifications: https://github.com/Zuuja/seed_rl/tree/multiagent
The list below highlights the code you need to look at to get a multi-agent setup running.
Pass the number_of_left_players_agent_controls argument to the gym.make call here: seed_rl/football/env.py
To use the default network you need to provide it with exactly one observation and one scalar reward. The GFootball environment outputs a separate observation and reward for each of the controlled players. You can join them by creating wrappers like these:
class SampleMultiAgentRewardWrapper(gym.RewardWrapper):
  def __init__(self, env):
    super(SampleMultiAgentRewardWrapper, self).__init__(env)

  def reward(self, reward):
    # Collapse per-player rewards into one scalar for the network.
    return numpy.max(reward)


# Beware that this wrapper is probably not the best (information
# about players and ball is duplicated)
class SampleMultiAgentObservationWrapper(gym.ObservationWrapper):
  def __init__(self, env):
    super(SampleMultiAgentObservationWrapper, self).__init__(env)
    # Merge the per-player axis into the channel axis:
    # [players, X, Y, L] -> [X, Y, players * L].
    observation_shape = env.observation_space.shape[1:-1] + \
        (env.observation_space.shape[0] *
         env.observation_space.shape[-1],)
    self.observation_space = gym.spaces.Box(
        low=0, high=255, shape=observation_shape, dtype=numpy.uint8)

  def observation(self, observation):
    return numpy.concatenate(observation, axis=-1)

You need to apply the wrappers when the environment is created (in seed_rl/football/env.py):
flags.DEFINE_integer('controlled_agents', 1,
                     'Number of controlled left agents')

...

def create_environment(_):
  """Returns a gym Football environment."""
  logging.info('Creating environment: %s', FLAGS.game)
  assert FLAGS.num_action_repeats == 1, 'Only action repeat of 1 is supported.'
  channel_dimensions = {
      'default': (96, 72),
      'medium': (120, 90),
      'large': (144, 108),
  }[FLAGS.smm_size]
  env = gym.make(
      'gfootball:GFootball-%s-SMM-v0' % FLAGS.game,
      stacked=True,
      rewards=FLAGS.reward_experiment,
      channel_dimensions=channel_dimensions,
      number_of_left_players_agent_controls=FLAGS.controlled_agents)
  # Beware that the football network expects one scalar reward
  # and an observation of shape [X, Y, L]
  env = SampleMultiAgentRewardWrapper(env)
  env = SampleMultiAgentObservationWrapper(env)
  # Beware that PackedBitsObservation expects that the observation
  # consists of 255s and 0s
  return observation.PackedBitsObservation(env)

Above we added the flag for the number of controlled players. You need to add it to the starting script (for example gcp/train_football_3vs1_with_keeper.sh). To control 3 players you need to add:
- parameterName: game
  type: CATEGORICAL
  categoricalValues:
  - academy_3_vs_1_with_keeper
- parameterName: controlled_agents
  type: INTEGER
  minValue: 3
  maxValue: 3
You can run an experiment in the same manner as in the single-agent setup, i.e. via the gcp/train_football_3vs1_with_keeper.sh script created above.
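Before launching a cloud run, it can help to sanity-check the wrappers locally. A minimal sketch, assuming gfootball is installed and the wrapper classes from the snippet above are in scope; the scenario is just an example, and the printed shapes depend on your stacking and SMM settings:

import gym

# Create a multi-agent env with 3 controlled left players.
env = gym.make(
    'gfootball:GFootball-academy_3_vs_1_with_keeper-SMM-v0',
    stacked=True,
    number_of_left_players_agent_controls=3)
print(env.observation_space.shape)  # e.g. (3, 72, 96, 16): one entry per player

env = SampleMultiAgentRewardWrapper(env)
env = SampleMultiAgentObservationWrapper(env)
print(env.observation_space.shape)  # e.g. (72, 96, 48): players merged into channels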
The described modifications are present here: https://github.com/Zuuja/seed_rl/tree/multiagent
Screenshot from tensorboard after 10.5 hours of training on the 3 vs 1 with keeper scenario:
The GFootball environment can provide plenty of information about the current game state, e.g. the position, velocity, tiredness factor, and current action of each player; the position, velocity, and rotation of the ball; and more. The environment provides three different representations: super mini-map (SMM), simple115 (floats), and rendered game frames (pixels). Each way of encoding the data has its pros and cons, e.g. one can take little space whilst another can enable faster learning.
By default SEED uses the SMM representation, where observations consist of several 72 by 96 planes of byte data. Each plane focuses on a different aspect of the game, e.g. the positions of the right/left team or the position of the ball.
Another way of representing the game state is simple115, a.k.a. floats. In this representation the whole game state is encoded in 115 floats. More detailed information about the possible game representations can be found in the GFootball documentation: gfootball/doc/observation.md.
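If you want to inspect the representations yourself, a quick sketch (assuming gfootball is installed locally; the scenario name is just an example) compares the observation spaces of the two encodings:

import gym

# SMM: stacked planes of byte data.
smm_env = gym.make('gfootball:GFootball-academy_3_vs_1_with_keeper-SMM-v0')
print(smm_env.observation_space)  # planes of 72x96 uint8 data

# simple115: the whole state packed into 115 floats.
floats_env = gym.make(
    'gfootball:GFootball-academy_3_vs_1_with_keeper-simple115-v0')
print(floats_env.observation_space)  # a flat vector of 115 floats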
To change the representation you need to modify the same locations as when changing to the multi-agent setup: the network definition and creation, the env creation, and the packed bits observation wrapper.
For example, if you want to adopt the float representation instead of the mini-map, you can use another network (see for example seed_rl/agents/vtrace/networks.py). Since floats don't need compression you should remove the PackedBitsObservation wrapper (at seed_rl/football/env.py#L49):
return observation.PackedBitsObservation(env)
You also need to change the gym.make command to change the observation format (seed_rl/football/env.py#L45).
For example, to use the simple115 observation (the floats observation), change this line:
'gfootball:GFootball-%s-SMM-v0' % FLAGS.game,
to:
'gfootball:GFootball-%s-simple115-v0' % FLAGS.game,
An example of the needed modifications: https://github.com/Zuuja/seed_rl/tree/floats
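As noted above, PackedBitsObservation is removed for floats. For intuition on why SMM benefits from bit packing while floats do not: SMM planes contain only the values 0 and 255, so each pixel carries a single bit and eight pixels fit in one byte. An illustrative numpy sketch (not SEED's actual implementation):

import numpy as np

# A fake 72x96 SMM plane containing only 0s and 255s.
plane = (np.random.rand(72, 96) > 0.5).astype(np.uint8) * 255
# Pack 8 pixels per byte: an 8x smaller payload to ship over the network.
packed = np.packbits(plane.flatten() // 255)
restored = np.unpackbits(packed)[:72 * 96].reshape(72, 96) * 255
assert (restored == plane).all()
print(plane.nbytes, packed.nbytes)  # 6912 vs 864 bytes

A vector of 115 arbitrary floats has no such two-valued structure, so this trick does not apply.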
Screenshot from tensorboard after 9 hours of training on the 3 vs 1 with keeper scenario:
You can also run SEED locally to, for example, test your changes before starting training in the cloud. This section assumes that you are using:
a) You can check tensorflow requirements here
b) You can check your graphics card compute capability here
To run SEED locally you need to:
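The exact steps depend on your setup, but at the time of writing the SEED repository ships a run_local.sh helper script. A hedged example invocation (arguments follow the upstream README; verify them against your checkout):
./run_local.sh football vtrace 4 - runs the football config with the vtrace agent and 4 actors on your local machine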
It is possible to view experiment results via tensorboard, a machine learning experiment visualisation tool:
gcloud auth application-default login - authenticate so that tensorboard has access to your gcloud bucket
tensorboard --logdir=gs://<bucket_name> - starts tensorboard; it can be accessed from the browser at http://localhost:6006/ (6006 is the default port)
The default bucket name used by SEED is seed_rl. Another option is to use tensorboard.dev, which allows users to upload data to the cloud so others can see the results online.
unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
If you run the script using sudo (i.e. as a different user) it is possible that you receive the error above. Running without sudo might help.
The gcloud compute regions list command lists the accessible regions alongside the available resources.
If you don't want your bucket to be named seed_rl (the default name), you can modify the seed_rl/gcp/setup.sh file. You need to change every occurrence of the text seed_rl to your bucket name.
seed_rl/gcp/setup.sh: line 23: seed_rl/gcp/../docker/push.sh: Permission denied
When running an experiment the error above can appear, which means the seed_rl/docker/push.sh script might not have execution rights. chmod +x docker/push.sh should help.
We observed that on some machines there are problems with docker accessing gcloud. This command might be helpful in such scenarios:
gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://gcr.io