MDPs and RL

ECE586RS: MDPs and Reinforcement Learning

Term: Fall 2017

Prerequisites: ECE 534 (Random Processes)

Instructor: Prof. R. Srikant, rsrikant@illinois.edu

Lecturer: Dr. Dimitris Katselis, katselis@illinois.edu

Prof. Srikant’s Office Hours: 2:15-3:30 pm Tue, 107 CSL

Dr. Katselis' Office Hours: 1:00-2:00 pm Mon, Room 3034 ECEB

Lectures: 11:00 am-12:20 pm Mon/Wed in Room 3081 ECEB

Fall Break: Nov. 18-Nov. 26

Last Day of Instruction: Dec. 13

Outline (Time Permitting): Markov Chains, MDPs, Discounted Cost Problems; Value Iteration, Policy Iteration and LP Formulation; Q-Learning and Stochastic Approximation; Neural Networks, Backpropagation and Applications to RL; Linear Function Approximations, Weighted Norm Contractions and Stochastic Approximation; Reinforcement Learning for Stochastic Shortest Path and Average Cost Problems; SARSA; TD(lambda); Reinforcement Learning for Dynamic Games; POMDPs

Grading (Link to compass):

Homework (80%): Collaboration is allowed, but solutions have to be written individually. Directly copying solutions will be viewed very unfavorably. Homework is due in class on the dates mentioned in the problem sets. Late homework will not be accepted.
Project (10%): Must be a software-oriented Reinforcement Learning project. Examples: computer game-playing programs, scheduling algorithms for data centers, financial engineering applications, robotic navigation and control, etc. We will not help with the selection of the topic or the details of the project. The project can be on a topic related to your research, but it cannot directly be a part of your thesis work or other research that is being conducted in collaboration with your advisor. Projects must be done individually, so no two projects can be identical. However, you are welcome to discuss ideas, code, etc. with others. You can also use the Internet to search for ideas for your project. Needless to say, you cannot submit someone else's code as your project. This would be viewed as a serious case of plagiarism. You can use standard libraries in TensorFlow, PyTorch, etc., but your project has to explore something well beyond standard tutorial material for these libraries. By Nov. 1, you have to email a 1-2 page outline of what you plan to do in the project: please email your outline to both of us. You will get zero points for the project if the outline is not submitted by Nov. 1. We will provide feedback on the outline if we feel that your proposed project is not adequate. You are welcome to come to one of our office hours ahead of this deadline to discuss your ideas for the project and to check whether they are substantial enough for a project in this course. The final submission for your project should consist of (i) a report of at most 10 pages, describing your project and the experimental results you obtained from your code, and (ii) the code you developed as part of your project. These should be emailed to both of us by Dec. 13.
Final exam (10%): 1:30-4:30 pm Dec. 19, Rooms: 2013 and 3081 ECEB, http://registrar.illinois.edu/spring2017schedulingguidelinespublic. Three 8.5"x11" sheets (i.e., six pages) of handwritten notes will be allowed.

References:

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I, 3rd Edition, 2005.
D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming, 4th Edition, 2012.
S. M. Ross, Applied Probability Models with Optimization Applications, Dover, 1992.
R. Weber, Optimization and Control, 2016. Course notes available on Prof. Weber's website at Cambridge University.
