In this work, we select the implicit-map regularization hyperparameter $\lambda$ via a grid search for the first and second environments. In general, the neural-network objective is highly non-convex and the Hessian inverse can be ill-conditioned. A larger regularization prevents the gradient from exploding and yields smoother learning dynamics, as observed both in our experiments and in other Stackelberg learning applications~\citep{fiez2020implicit, zheng2021stackelberg}. Choosing the regularization optimally, or even adaptively, so as to trade off between the Stackelberg and standard gradient updates is a direction for future work.
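To make concrete where $\lambda$ enters, the sketch below computes a regularized implicit-map (total-derivative) leader gradient in the style of \citet{fiez2020implicit} on a toy quadratic game. The costs \texttt{f1}, \texttt{f2} and the helper \texttt{leader\_stackelberg\_grad} are illustrative placeholders, not the losses or code used in our experiments; the point is only how $\lambda$ regularizes the linear solve against the follower Hessian.

```python
import jax
import jax.numpy as jnp

# Toy leader/follower costs for illustration only (not the paper's objectives).
def f1(x, y):  # leader cost
    return jnp.sum((x - y) ** 2) + 0.1 * jnp.sum(x ** 2)

def f2(x, y):  # follower cost
    return jnp.sum((y - 2.0 * x) ** 2)

def leader_stackelberg_grad(x, y, lam):
    """Regularized implicit (total-derivative) gradient for the leader:

        D_x f1 = grad_x f1 - (grad_yx f2)^T (grad_yy f2 + lam * I)^{-1} grad_y f1

    A larger lam keeps the linear solve well conditioned; as lam grows, the
    implicit correction shrinks and the update approaches the plain
    individual-gradient update for the leader.
    """
    gx = jax.grad(f1, argnums=0)(x, y)
    gy = jax.grad(f1, argnums=1)(x, y)
    H_yy = jax.hessian(f2, argnums=1)(x, y)                         # d^2 f2 / dy dy
    H_yx = jax.jacobian(jax.grad(f2, argnums=1), argnums=0)(x, y)   # d^2 f2 / dy dx
    # Solve the regularized system rather than forming an explicit inverse.
    v = jnp.linalg.solve(H_yy + lam * jnp.eye(y.shape[0]), gy)
    return gx - H_yx.T @ v

x = jnp.array([1.0, -0.5])
y = jnp.array([0.3, 0.7])
for lam in (0.1, 1.0, 10.0):  # grid-search candidates for the regularizer
    print(lam, leader_stackelberg_grad(x, y, lam))
```

Sweeping the candidate values of $\lambda$ as in the loop above mirrors the grid search described here: small values stay close to the full Stackelberg update but risk an ill-conditioned solve, while large values behave like standard gradient learning.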
Computational Complexity:
Competitive-Cartpoles:
Hopper:
Approximated Stackelberg Update:
Experimental Result:
Total Derivative Stackelberg Update vs. Approximated Stackelberg Update:
The Fencing Game: