Trust-region-based policy optimization methods exploit the local Riemannian geometry of the parameter space to derive more efficient policy updates. Trust region competitive policy optimization (TRCoPO), the CoPO generalization of TRPO, likewise exploits the local geometry of the competitive objective to derive more efficient parameter updates.
TRCoPO optimizes a surrogate game objective within a trust region. It updates the agents' parameters simultaneously by computing the Nash equilibrium of a bilinear approximation to the surrogate objective (in contrast to the linear approximation used in off-the-shelf trust region methods) within a defined trust region in the parameter space. Let us now derive this trust region and the agents' game objective.
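Before the derivation, the following display is a minimal sketch of the kind of update this describes, for a two-player zero-sum instance; the symbols ($f$ for the surrogate game objective, $\theta$ and $\phi$ for the two agents' parameters, $D$ for the trust-region divergence, $\delta$ for its radius) are illustrative assumptions rather than notation fixed by the text above. The joint update $(\delta\theta^{\ast}, \delta\phi^{\ast})$ is a Nash equilibrium of the constrained bilinear game
\begin{align*}
\max_{\delta\theta}\,\min_{\delta\phi}\;\;
& \nabla_{\theta} f(\theta_k,\phi_k)^{\top}\delta\theta
\;+\; \nabla_{\phi} f(\theta_k,\phi_k)^{\top}\delta\phi
\;+\; \delta\theta^{\top}\,\nabla^{2}_{\theta\phi} f(\theta_k,\phi_k)\,\delta\phi\\
\text{s.t.}\;\;
& D\big((\theta_k+\delta\theta,\;\phi_k+\delta\phi)\,\big\|\,(\theta_k,\phi_k)\big)\;\le\;\delta,
\end{align*}
where $(\theta_k,\phi_k)$ is the current iterate. Dropping the mixed term $\nabla^{2}_{\theta\phi} f$ recovers the linear approximation of off-the-shelf trust region methods, which decouples the problem into two single-agent subproblems, each solved with the opponent held fixed; the bilinear term is what allows the update to account for the opponent's simultaneous step.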