Competitive Policy Optimization

A novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates

This page contains experiment videos comparing:

Gradient Descent Ascent (GDA) with Competitive Policy Gradient (CoPG)

Trust Region Gradient Descent Ascent (TRGDA) with Trust Region Competitive Policy Optimization (TRCoPO)
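To give some intuition for why the baselines and the competitive methods can behave so differently, here is a minimal sketch on a toy problem. It is not the CoPG policy-gradient algorithm used in the videos: it assumes the scalar bilinear zero-sum game f(x, y) = x * y (x minimizes, y maximizes) and compares plain GDA with the competitive gradient descent update that CoPG is motivated by. The function names, step size, and starting point are illustrative choices, not taken from the experiments.

```python
import math


def gda_step(x, y, lr):
    # Simultaneous gradient descent ascent on f(x, y) = x * y:
    # x descends along df/dx = y, y ascends along df/dy = x.
    return x - lr * y, y + lr * x


def cgd_step(x, y, lr):
    # Competitive gradient descent on the same game (assumed toy setting).
    # Each player solves a local bilinear game, so the update also uses the
    # mixed derivative d2f/dxdy = 1 and anticipates the opponent's move.
    scale = 1.0 + lr * lr
    dx = -lr * (y + lr * x) / scale
    dy = lr * (x - lr * y) / scale
    return x + dx, y + dy


def run(step, x0=1.0, y0=1.0, lr=0.2, iters=100):
    # Iterate one update rule and return the final distance to the
    # equilibrium of the game, which is (0, 0).
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


gda_dist = run(gda_step)
cgd_dist = run(cgd_step)
print(f"GDA distance to equilibrium: {gda_dist:.4f}")
print(f"CGD distance to equilibrium: {cgd_dist:.4f}")
```

On this game each GDA step multiplies the distance to the equilibrium by sqrt(1 + lr^2), so the iterates spiral outward, while each competitive step divides it by the same factor, so they spiral inward. Policy-gradient versions add sampling noise and policy parameterization on top of this, so the toy game should be read only as an illustration of the underlying dynamics.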

Experiment: Car Racing

GAIL Case Study

Experiment: Rock Paper Scissors

GDA vs GDA, CoPG vs CoPG

TRGDA vs TRGDA, TRCoPO vs TRCoPO

Experiment: Markov Soccer

GDA vs GDA, CoPG vs CoPG

TRGDA vs TRGDA, TRCoPO vs TRCoPO

Experiment: Matching Pennies

GDA vs GDA, CoPG vs CoPG

TRGDA vs TRGDA, TRCoPO vs TRCoPO