RL Theoretical Foundation

[2] L. Liang*, H. Yang. On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization. [pdf]

[1] S. Han, S. Su, S. He, S. Han, H. Yang, S. Zou, F. Miao*. What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning? Transactions on Machine Learning Research, 2024. [pdf] [doi]