In this work, we select the implicit-map regularization hyperparameter $\lambda$ via a grid search for the first and second environments. In general, the neural-network objective is highly non-convex and the Hessian inverse can be ill-conditioned. A larger regularization prevents the gradient from exploding and yields smoother learning dynamics, as observed both in our experiments and in other Stackelberg learning applications~\citep{fiez2020implicit, zheng2021stackelberg}. Choosing the regularization optimally, or even adaptively, so as to trade off between the Stackelberg and standard gradient updates is a direction for future work.
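To make concrete where $\lambda$ enters, the sketch below computes a regularized implicit-map (total-derivative) leader gradient in the style of \citet{fiez2020implicit} on a toy quadratic game. The costs \texttt{f1}, \texttt{f2} and the helper \texttt{leader\_stackelberg\_grad} are illustrative placeholders, not the losses or code used in our experiments; the point is only how $\lambda$ regularizes the linear solve against the follower Hessian.

```python
import jax
import jax.numpy as jnp

# Toy leader/follower costs for illustration only (not the paper's objectives).
def f1(x, y):  # leader cost
    return jnp.sum((x - y) ** 2) + 0.1 * jnp.sum(x ** 2)

def f2(x, y):  # follower cost
    return jnp.sum((y - 2.0 * x) ** 2)

def leader_stackelberg_grad(x, y, lam):
    """Regularized implicit (total-derivative) gradient for the leader:

        D_x f1 = grad_x f1 - (grad_yx f2)^T (grad_yy f2 + lam * I)^{-1} grad_y f1

    A larger lam keeps the linear solve well conditioned; as lam grows, the
    implicit correction shrinks and the update approaches the plain
    individual-gradient update for the leader.
    """
    gx = jax.grad(f1, argnums=0)(x, y)
    gy = jax.grad(f1, argnums=1)(x, y)
    H_yy = jax.hessian(f2, argnums=1)(x, y)                         # d^2 f2 / dy dy
    H_yx = jax.jacobian(jax.grad(f2, argnums=1), argnums=0)(x, y)   # d^2 f2 / dy dx
    # Solve the regularized system rather than forming an explicit inverse.
    v = jnp.linalg.solve(H_yy + lam * jnp.eye(y.shape[0]), gy)
    return gx - H_yx.T @ v

x = jnp.array([1.0, -0.5])
y = jnp.array([0.3, 0.7])
for lam in (0.1, 1.0, 10.0):  # grid-search candidates for the regularizer
    print(lam, leader_stackelberg_grad(x, y, lam))
```

Sweeping the candidate values of $\lambda$ as in the loop above mirrors the grid search described here: small values stay close to the full Stackelberg update but risk an ill-conditioned solve, while large values behave like standard gradient learning.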
Computational Complexity:
Competitive-Cartpoles:
Hopper:
Approximated Stackelberg Update:
Experimental Result:
Total Derivative Stackelberg Update vs. Approximated Stackelberg Update:
The Fencing Game: