References and materials
[1] Huyên Pham. Continuous-time Stochastic Control and Optimization with Financial Applications.
[2] Goran Peskir, Albert Shiryaev. Optimal Stopping and Free-Boundary Problems.
[3] Jiongmin Yong and Xun Yu Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations.
[4] Lecture notes by Ramon van Handel: https://web.math.princeton.edu/~rvan/acm217/ACM217.pdf
[5] Philip Protter. Stochastic Integration and Differential Equations.
[6] Nizar Touzi. Optimal Stochastic Control, Stochastic Target Problems and Backward SDE.
[7] Dellacherie, C., and Meyer, P.A. (1978). Probabilities and Potential (Vol. 29). In North-Holland Mathematics Studies. Hermann, Paris.
[8] Dellacherie, C., and Meyer, P.A. (1982). Probabilities and Potential B: Theory of Martingales (Vol. 72). In North-Holland Mathematics Studies. Hermann, Paris.
[9] Durrett, R. (2019). Probability: Theory and Examples (5th ed.). Cambridge: Cambridge University Press.
[10] Lecture notes on viscosity solutions by Jeff Calder: https://www-users.cse.umn.edu/~jwcalder/viscosity_solutions.pdf
[11] Wang, Haoran, Thaleia Zariphopoulou, and Xun Yu Zhou. "Reinforcement learning in continuous time and space: A stochastic control approach." Journal of Machine Learning Research 21, no. 198 (2020): 1-34.
[12] Jia, Yanwei, and Xun Yu Zhou. "Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach." Journal of Machine Learning Research 23, no. 154 (2022): 1-55.
[13] Jia, Yanwei, and Xun Yu Zhou. "Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms." Journal of Machine Learning Research 23, no. 275 (2022): 1-50.
[14] Giegrich, Michael, Christoph Reisinger, and Yufei Zhang. "Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems." SIAM Journal on Control and Optimization 62, no. 2 (2024): 1060-1092.
[15] Jia, Yanwei, and Xun Yu Zhou. "q-Learning in continuous time." Journal of Machine Learning Research 24, no. 161 (2023): 1-61.
[16] Tang, Wenpin, and Xun Yu Zhou. "Regret of exploratory policy improvement and $q$-learning." arXiv preprint arXiv:2411.01302 (2024).
[17] Divol, V., Niles-Weed, J., & Pooladian, A. A. (2025). Tight stability bounds for entropic Brenier maps. International Mathematics Research Notices, 2025(7), rnaf078.