Getting Started with Pyquaticus for MCTF

This page gives an overview of how to train agents via deep RL within the Pyquaticus framework to play the MCTF game. For testing the performance of trained agents, see the Submit Your Entry page.

Training Agents to Play MCTF

Sample code for training two agents to play MCTF as a team is provided in Pyquaticus' rl_test/competition_train_example.py. It uses the multi-agent reinforcement learning (MARL) library RLlib. If you are unfamiliar with multi-agent training through RLlib, we recommend reading the RLlib documentation.

Policy Mapping to Agent IDs

Below is a code snippet from the rl_test/competition_train_example.py file. Starting at line 86 in the code repository, we first define a dictionary mapping policy names to policies, and then we define the policy mapping function. This policy mapping function is used by the RLlib training algorithm to ensure that each agent is correctly mapped to a learning policy or the intended static policy during the training phase.
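A minimal sketch of this setup is shown below, assuming RLlib's PolicySpec API; the policy names and agent ids are illustrative, not necessarily those used in competition_train_example.py.

```python
from ray.rllib.policy.policy import PolicySpec

# Illustrative policy dictionary: an empty PolicySpec inherits the
# algorithm's defaults, yielding a learning policy. The names and agent
# ids below are assumptions; see competition_train_example.py for the
# competition setup.
policies = {
    "agent-0-policy": PolicySpec(),  # learning policy for the first teammate
    "agent-1-policy": PolicySpec(),  # learning policy for the second teammate
}

def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    # Route each environment agent to its policy by agent id.
    return "agent-0-policy" if agent_id == "agent_0" else "agent-1-policy"
```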

Training Algorithm: Rollout Workers and GPUs

Below is an example of using the PPO algorithm to train the learning policies established in the 'Policy Mapping to Agent IDs' section above. The code snippet is from the competition_train_example.py file.
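A minimal sketch of such a setup, assuming RLlib's PPOConfig API; the environment name, worker and GPU counts, and policy names are illustrative rather than the competition values. It reuses the policies dictionary and policy_mapping_fn from the sketch in the previous section.

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Illustrative PPO configuration; see competition_train_example.py for
# the competition version. "pyquaticus" assumes the environment was
# registered under that name (e.g., via ray.tune.register_env).
ppo_config = (
    PPOConfig()
    .environment(env="pyquaticus")
    .rollouts(num_rollout_workers=1)  # scale to your CPU budget
    .resources(num_gpus=0)            # set > 0 if a GPU is available
    .multi_agent(
        policies=policies,
        policy_mapping_fn=policy_mapping_fn,
        policies_to_train=["agent-0-policy", "agent-1-policy"],
    )
)

algo = ppo_config.build()
for _ in range(10):
    result = algo.train()
```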

You can modify the rollout worker and GPU settings (line 102 in the repository) to adapt the training run to your available computing resources.

At line 140 in the repository, you can modify the names of the policies you are training to match the policies you set up in the 'Policy Mapping to Agent IDs' section above.

Reward Function Design

A critical component of training successful agents is the design of the reward function. This is particularly important in multi-agent scenarios, where rewards must be assigned to two agents that may be completing tasks cooperatively. To assist with this, we have provided a few examples in the rewards.py file found in the utils folder (pyquaticus/envs/utils/rewards.py). Below is the complete list of reward function parameters along with descriptions:

You can also take a look at the params and prev_params arguments that get passed into the reward function. These two variables are used to determine the game state and assign rewards to your agents during training. Below we have included an example of a sparse reward function:
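The sketch below illustrates the sparse-reward idea: give a non-zero reward only on discrete scoring events, detected by comparing the current game state against the previous one. The function signature and the params/prev_params key names are assumptions for illustration; check pyquaticus/envs/utils/rewards.py for the actual fields.

```python
def sparse(agent_id, params, prev_params):
    """Illustrative sparse reward: non-zero only on scoring events.

    The signature and the params/prev_params keys used here are
    assumptions; see pyquaticus/envs/utils/rewards.py for the real ones.
    """
    reward = 0.0
    # Our team captured the opponent's flag this step.
    if params["team_flag_capture"] and not prev_params["team_flag_capture"]:
        reward += 1.0
    # The opponent captured our flag this step.
    if params["opponent_flag_capture"] and not prev_params["opponent_flag_capture"]:
        reward -= 1.0
    return reward
```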

Useful Configuration Parameters

Changing the values in the configuration file can help speed up training; below is a list of configuration values, each with a brief description.
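As one example of how such overrides are typically applied, the sketch below builds an environment from a modified copy of the standard configuration. The import paths and key names (config_dict_std, sim_speedup_factor, max_time) are assumptions based on a typical Pyquaticus setup; check the configuration file for the actual fields.

```python
from pyquaticus import pyquaticus_v0
from pyquaticus.config import config_dict_std

# Start from the standard config and override values to speed up training.
# The key names below are assumptions; see the config file for the real ones.
config = dict(config_dict_std)
config["sim_speedup_factor"] = 4  # step the simulation faster than real time
config["max_time"] = 240          # shorter episodes -> faster iterations

env = pyquaticus_v0.PyQuaticusEnv(config_dict=config, team_size=2)
```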