Trust-region-based policy optimization methods exploit the local Riemannian geometry of the parameter space to derive more efficient policy updates. Trust region competitive policy optimization (TRCoPO), the CoPO generalization of TRPO, likewise exploits the local geometry of the competitive objective to derive more efficient parameter updates.
TRCoPO optimizes a surrogate game objective within a trust region. It updates the agents' parameters simultaneously by computing the Nash equilibrium of a bilinear approximation to the surrogate objective (in contrast to the linear approximation used in off-the-shelf trust region methods) within a defined trust region in the parameter space. Let us now derive this trust region and the agents' game objective.
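Before the derivation, the following display is a minimal sketch of the kind of update this describes, for a two-player zero-sum instance; the symbols ($f$ for the surrogate game objective, $\theta$ and $\phi$ for the two agents' parameters, $D$ for the trust-region divergence, $\delta$ for its radius) are illustrative assumptions rather than notation fixed by the text above. The joint update $(\delta\theta^{\ast}, \delta\phi^{\ast})$ is a Nash equilibrium of the constrained bilinear game
\begin{align*}
\max_{\delta\theta}\,\min_{\delta\phi}\;\;
& \nabla_{\theta} f(\theta_k,\phi_k)^{\top}\delta\theta
\;+\; \nabla_{\phi} f(\theta_k,\phi_k)^{\top}\delta\phi
\;+\; \delta\theta^{\top}\,\nabla^{2}_{\theta\phi} f(\theta_k,\phi_k)\,\delta\phi\\
\text{s.t.}\;\;
& D\big((\theta_k+\delta\theta,\;\phi_k+\delta\phi)\,\big\|\,(\theta_k,\phi_k)\big)\;\le\;\delta,
\end{align*}
where $(\theta_k,\phi_k)$ is the current iterate. Dropping the mixed term $\nabla^{2}_{\theta\phi} f$ recovers the linear approximation of off-the-shelf trust region methods, which decouples the problem into two single-agent subproblems, each solved with the opponent held fixed; the bilinear term is what allows the update to account for the opponent's simultaneous step.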