Stuff to read:
- Optimal control track: 
Vamvoudakis, K. G., & Lewis, F. L. (2010). Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5), 878-888.
Lewis, F. L., Vrabie, D., & Syrmos, V. L. (2012). Optimal control. John Wiley & Sons.
- Folks who like MDP: 
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction (Vol. 1, No. 1, pp. 9-11). Cambridge: MIT press.