Meta-MAPG

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

2-Agent HalfCheetah benchmark (de Witt et al., 2020)

Two agents are coupled within a single robot and control it together: the red agent controls the three joints of the back leg, and the blue agent controls the three joints of the front leg.
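The coupling described above can be illustrated with a minimal sketch, in which each agent outputs three joint torques and the two are concatenated into one six-dimensional action for the robot. The function name `join_actions`, the back-leg-first ordering, and the example torque values are assumptions made for illustration; they are not taken from the benchmark's code.

```python
from typing import List, Sequence

JOINTS_PER_AGENT = 3  # from the text: each agent controls three joints


def join_actions(back_leg: Sequence[float], front_leg: Sequence[float]) -> List[float]:
    """Combine the two agents' torques into a single robot action.

    Assumption: back-leg joints come first, then front-leg joints.
    """
    if len(back_leg) != JOINTS_PER_AGENT or len(front_leg) != JOINTS_PER_AGENT:
        raise ValueError("each agent must supply exactly three joint torques")
    return list(back_leg) + list(front_leg)


# Hypothetical torques chosen by the red (back leg) and blue (front leg) agents.
red_action = [0.1, -0.2, 0.3]
blue_action = [0.0, 0.5, -0.5]
robot_action = join_actions(red_action, blue_action)
print(robot_action)
```

Because the robot receives only the combined action, neither agent can move it effectively alone, which is what makes the task cooperative.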

Methods compared: Meta-MAPG, Meta-PG, LOLA-DiCE, REINFORCE

Summary:

First, unlike in mixed-incentive and competitive settings, influencing peer learning does not help much in cooperative settings, and Meta-MAPG performs similarly to Meta-PG.
Second, Meta-PG and Meta-MAPG outperform the other approaches, LOLA-DiCE and REINFORCE, achieving higher rewards when interacting with a new teammate.
