A Stochastic Approximation Method, H. Robbins and S. Monro, The Annals of Mathematical Statistics, (1951)
On Stochastic Approximation, A. Dvoretzky, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, (1956)
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, T. Jaakkola, M. I. Jordan, and S. P. Singh, NeurIPS, (1993)
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, Machine Learning, (2000)
On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning, C. Wang and K. Ross, arXiv preprint, (2020)
On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts, J. Liu, Automatica, (2021)