Getting Started with Pyquaticus for MCTF
This page gives an overview of how to train agents via deep RL within the Pyquaticus framework to play the MCTF game. To test the performance of trained agents, check the Submit Your Entry page.
Training Agents to Play MCTF
A sample script for training two agents to play MCTF as a team is provided in Pyquaticus' rl_test/competition_train_example.py. It uses the multi-agent reinforcement learning (MARL) library RLlib. If you are unfamiliar with multi-agent training through RLlib, we recommend reading the RLlib documentation here.
Before running the program, make sure your virtual environment is activated.
Run 'python competition_train_example.py'
The training script trains a policy for each of the two agents and saves both models as checkpoint files into a folder named ray_tests/ in the same folder where the training script is run.
The saved policies are located at ray_tests/<checkpoint_num>/policies/<policy-name>. The frequency at which policies are saved is defined on line 112 of competition_train_example.py. More information about the policy names is given in the 'Policy Mapping to Agent Ids' section below.
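If you are using a recent version of Ray, a single trained policy can later be restored from such a checkpoint with RLlib's Policy.from_checkpoint. A minimal sketch is shown below; the checkpoint number and policy name in the path are placeholders for the values produced by your own run.

    from ray.rllib.policy.policy import Policy

    # Placeholder path: substitute the checkpoint number and policy name
    # produced by your own training run.
    checkpoint_path = "ray_tests/checkpoint_000100/policies/agent-0-policy"
    restored_policy = Policy.from_checkpoint(checkpoint_path)

    # The restored policy can then map observations to actions, e.g.:
    # action, _, _ = restored_policy.compute_single_action(obs)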
Policy Mapping to Agent Ids
In the rl_test/competition_train_example.py file, starting at line 86 of the code repository, we first define a dictionary mapping policy names to agent IDs, and then we define the policy mapping function. This policy mapping function is used by the RLlib training algorithm to ensure that each agent is correctly mapped to a learning policy or to the intended static policy during the training phase.
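A rough sketch of the idea is shown below. The policy names, agent IDs, and choice of static opponent policies are illustrative and may not match the repository's exact code; consult competition_train_example.py for the real mapping.

    # Illustrative mapping from policy name to the agent ID it controls.
    POLICY_TO_AGENT = {
        "agent-0-policy": 0,       # learning policy, first teammate
        "agent-1-policy": 1,       # learning policy, second teammate
        "easy-defend-policy": 2,   # static opponent policy
        "easy-attack-policy": 3,   # static opponent policy
    }

    # Invert the mapping so the function below can look up a policy by agent ID.
    AGENT_TO_POLICY = {agent_id: name for name, agent_id in POLICY_TO_AGENT.items()}

    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        """Tell RLlib which policy controls the given agent during training."""
        return AGENT_TO_POLICY[agent_id]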
Training Algorithm: Rollout Workers and GPUs
The competition_train_example.py file uses the PPO algorithm to train the learning policies established in the 'Policy Mapping to Agent Ids' section above.
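A rough sketch of such a configuration is shown below. The worker count, GPU count, and policy names are illustrative; the environment registration assumes pyquaticus exposes pyquaticus_v0.PyQuaticusEnv with a team_size argument, and the sketch reuses the policy_mapping_fn from the previous section. The repository's actual code may be organized differently.

    import ray
    from ray.tune.registry import register_env
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.policy.policy import PolicySpec
    from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
    from pyquaticus import pyquaticus_v0

    # Register the Pyquaticus environment under a name RLlib can look up.
    register_env(
        "pyquaticus",
        lambda cfg: ParallelPettingZooEnv(pyquaticus_v0.PyQuaticusEnv(team_size=2)),
    )

    ray.init()

    ppo_config = (
        PPOConfig()
        .environment(env="pyquaticus")
        .rollouts(num_rollout_workers=4)  # adjust to the CPU cores you can spare
        .resources(num_gpus=0)            # adjust to your available GPUs
        .multi_agent(
            policies={
                "agent-0-policy": PolicySpec(),
                "agent-1-policy": PolicySpec(),
                # In the real script the opponents would be fixed heuristic
                # policies, e.g. PolicySpec(policy_class=SomeHeuristicPolicy).
                "easy-defend-policy": PolicySpec(),
                "easy-attack-policy": PolicySpec(),
            },
            # policy_mapping_fn as sketched in the previous section.
            policy_mapping_fn=policy_mapping_fn,
            # Only the two team policies are updated; the opponents stay fixed.
            policies_to_train=["agent-0-policy", "agent-1-policy"],
        )
    )

    algo = ppo_config.build()
    for i in range(1000):
        algo.train()
        if i % 100 == 0:  # checkpoint frequency (cf. line 112 of the repository script)
            algo.save("ray_tests")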
You can modify the resource settings (line 102 in the repository) to match your available computing resources.
On line 140 in the repository, you can modify the names of the policies you are training based on the policies you set up in the 'Policy Mapping to Agent Ids' section above.
Reward Function Design
A critical component of training successful agents is the design of the reward function. This is particularly important in multiagent scenarios, where we have to assign rewards to two agents who might be cooperatively completing tasks. To assist with this, we have provided a few examples in the rewards.py file found in the utils folder (pyquaticus/envs/utils/rewards.py). Below is the complete list of reward function parameters along with descriptions:
You can also take a look at the parameters params and prev_params that get passed into the reward function. These two variables are used to determine the game state and assign rewards to your agents during training. An example of a sparse reward function is sketched below.
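The sketch assumes that params and prev_params behave like dictionaries and that keys such as team_flag_capture and opponent_flag_capture exist; the actual key names and the reward function signature used in rewards.py may differ, so check that file before copying this.

    def sparse(params, prev_params):
        """Sketch of a sparse reward: score only discrete game events.

        params describes the game state after the current step and
        prev_params the state before it, so comparing the two detects
        events that happened on this step.
        """
        reward = 0.0

        # Assumed key: our team captured the opponent's flag this step.
        if params["team_flag_capture"] and not prev_params["team_flag_capture"]:
            reward += 1.0

        # Assumed key: the opponent captured our flag this step.
        if params["opponent_flag_capture"] and not prev_params["opponent_flag_capture"]:
            reward -= 1.0

        return reward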
Useful Configuration Parameters
Changing the values in the configuration file can help speed up training; below is a list of configuration values along with brief descriptions, followed by a short example of overriding them in code.
world_size: [160.0, 80.0]
The size of the game field (recommended not to change this)
type: list (of floats x, y)
pixel_size: 10
Pixels/meter
type: int
agent_radius: 2.0
Size of agent's radius in meters
type: float
flag_keepout: 5.0
Minimum distance, in meters, an agent can get to the flag centers without being repelled automatically.
type: float
max_speed: 1.5
Max speed an agent can travel (Competition Speed is 1.5m/s)
type: float
own_side_accel: 1.0
Percentage of max acceleration that can be used on your side of scrimmage (Recommended not to change this value)
type: float [0.0, 1.0]
opp_side_accel: 1.0
Percentage of max acceleration that can be used on the opponent's side of scrimmage (Recommended not to change this value)
type: float [0.0, 1.0]
wall_bounce: 0.5
Percentage of current speed (x or y) at which an agent is repelled from a wall (vertical or horizontal)
type: float
tau
Max dt (seconds) for updating the simulation (1/10 will be used for the competition)
type: float
sim_speedup_factor: 1
Simulation speed multiplier
type: int (>= 1)
max_time: 240.0
Maximum time (seconds) per episode
type: float
max_score: 1
Maximum score per episode (until a winner is declared)
type: int
max_screen_size: (x, y)
Screen size in pixels (width by height)
type: list (int)
random_init: False
Randomly initialize agents' positions for ctf mode (within fairness constraints)
type: boolean
save_traj: False
Save trajectory as a pickle (useful for behavior cloning or imitation learning)
type: boolean
render_fps: 30
Frames per second when rendering with 'human' passed in
type: int
normalize: True
Flag for normalizing the observation space
type: boolean
tagging_cooldown: 10.0
Cooldown time (seconds) before an agent is able to tag again
type: float
speed_factor: 20.0
Multiplicative factor for desired_speed -> desired thrust (Will be set to 20.0 for competition)
type: float
thrust_map: [[-100, 0, 20, 40, 60, 80, 100], [-2, 0, 1, 2, 3, 5, 5]]
Piecewise linear mapping from desired_thrust to speed
type: list
max_thrust: 70
Limit on vehicle thrust
type: int
competition setting: 70.0
max_rudder: 100
Limit on vehicle rudder actuation
type: int
competition setting: 100
turn_loss: 0.85
type: float
competition setting: 0.85
max_acc: 1
Maximum acceleration (m/s^2)
type: float
competition setting: 1
max_dec: 1
Maximum deceleration (m/s^2)
type: float
competition setting: 1
suppress_numpy_warnings: True
Option to stop numpy from printing warnings to the console
type: boolean
competition setting: True
teleport_on_tag: False
Option for the agent when tagged, either out of bounds or by opponent, to teleport home or not (Setting this to True when training can speed up training times)
type: boolean
competition setting: False
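As mentioned above, here is a short sketch of overriding configuration values in code. It assumes pyquaticus exposes a standard configuration dictionary as pyquaticus.config.config_dict_std and that PyQuaticusEnv accepts a config_dict argument; check the repository if the names differ.

    import copy
    from pyquaticus import pyquaticus_v0
    from pyquaticus.config import config_dict_std

    # Start from the standard configuration and override selected values.
    config_dict = copy.deepcopy(config_dict_std)
    config_dict["sim_speedup_factor"] = 4   # run the simulation faster during training
    config_dict["teleport_on_tag"] = True   # can shorten training time, per the note above
    config_dict["max_time"] = 240.0         # episode length in seconds

    env = pyquaticus_v0.PyQuaticusEnv(team_size=2, config_dict=config_dict)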