"Reinforcement Learning for Processing Networks" seminar
We meet in 261 Rhodes on Wednesdays, 4:45pm–6:45pm.
Previous presentations:
Title: Information relaxation methods for MDP: Theory and application
Discussion leader: Yilun Chen
Date: 04/11/2018
References:
- David Brown and Martin Haugh (2017). Information relaxation bounds for infinite horizon Markov decision processes. Operations Research, 65(5):1355-1379. Link
- Daniel Adelman and Adam Mersereau (2008). Relaxations of Weakly Coupled Stochastic Dynamic Programs. Operations Research, 56(3):712-727. Link
- Denis Belomestny (2013). Solving optimal stopping problems via empirical dual optimization. The Annals of Applied Probability, 23(5):1988-2019. Link
- Vijay V. Desai, Vivek F. Farias and Ciamac C. Moallemi (2012). Pathwise Optimization for Optimal Stopping Problems. Management Science, 58(12):2292-2308. Link
- P. Ansell, K. Glazebrook, J. Niño-Mora and M. O'Keeffe (2003). Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57(1):21-39. Link
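To illustrate the core idea in the Brown–Haugh and Desai et al. papers above: for an optimal stopping problem, dropping the non-anticipativity constraint entirely (the "perfect information" relaxation, with no penalty term) already yields a valid upper bound, since a stopper who sees the whole path first can only do better. A minimal Monte Carlo sketch, where `simulate_path` and `payoff` are hypothetical user-supplied callbacks rather than code from the papers:

```python
import random

def perfect_info_upper_bound(simulate_path, payoff, horizon,
                             n_paths=1000, gamma=0.95, seed=0):
    """Monte Carlo estimate of the perfect-information upper bound
    E[ max_t gamma^t * payoff(X_t) ] for an optimal stopping problem.

    simulate_path(rng, horizon) -> list of states X_0, ..., X_{horizon-1}
    payoff(x)                   -> reward collected if we stop in state x
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        path = simulate_path(rng, horizon)
        # The relaxed stopper picks the best time in hindsight on each path.
        total += max(gamma**t * payoff(x) for t, x in enumerate(path))
    return total / n_paths
```

Adding a dual penalty that is zero in expectation for non-anticipating policies (the main construction in these papers) tightens this bound without breaking its validity.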
Title: Stochastic Approximation Techniques in Reinforcement Learning
Discussion leader: Mark Gluzman
Date: 03/28/2018
References:
- J. Abounadi, D. P. Bertsekas, and V. Borkar (2002). Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms. SIAM Journal on Control and Optimization, 41(1):1–22. Link
- Even-Dar, E., & Mansour, Y. (2003). Learning rates for Q-learning. Journal of Machine Learning Research, 5, 1–25. Link
- Second chapter of V. S. Borkar (2008) Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press. Link
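Tabular Q-learning, the algorithm analyzed in the Abounadi et al. and Even-Dar & Mansour papers above, fits in a few lines. A minimal sketch with epsilon-greedy exploration and a diminishing (Robbins–Monro) step size; `step` and `reward` are hypothetical environment callbacks, not code from the references:

```python
import random

def q_learning(n_states, n_actions, step, reward, gamma=0.9,
               episodes=500, horizon=50, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    step(s, a)   -> next state (environment transition, assumed given)
    reward(s, a) -> immediate reward (assumed given)
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for k in range(episodes):
        s = 0
        alpha = 1.0 / (k + 1)          # diminishing step size
        for _ in range(horizon):
            if rng.random() < 0.1:     # explore with probability 0.1
                a = rng.randrange(n_actions)
            else:                      # otherwise act greedily
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2 = step(s, a)
            target = reward(s, a) + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

The stochastic-approximation viewpoint in the references treats this update as a noisy fixed-point iteration for the Bellman optimality operator, which is where the step-size conditions come from.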
Title: Near-Optimal Control of Queueing Systems via Approximate One-Step Policy Improvement
Discussion leader: Jefferson Huang
Date: 03/21/2018
References:
- Bhulai, S. (2017). Value Function Approximation in Complex Queueing Systems. In Markov Decision Processes in Practice (pp. 33-62). Springer. Link
- James, T., Glazebrook, K., & Lin, K. (2016). Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28(2), 251-264. Link
- Brown, D. B., & Haugh, M. B. (2017). Information relaxation bounds for infinite horizon Markov decision processes. Operations Research, 65(5), 1355-1379. Link
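The "one-step policy improvement" in the title means acting greedily with respect to an (approximate) value function of a base policy; Bhulai's chapter does this with closed-form value functions of simple queues. A minimal sketch for a finite MDP, where `P`, `R`, and `V` are hypothetical arrays rather than a model from the references:

```python
def one_step_improvement(P, R, V, gamma=0.9):
    """One-step policy improvement: act greedily w.r.t. a value function V
    obtained (exactly or approximately) for some base policy.

    P[a][s][s2] = transition probability from s to s2 under action a
    R[a][s]     = expected one-step reward for action a in state s
    Returns the improved policy as a list mapping state -> action.
    """
    n_states, n_actions = len(V), len(P)
    policy = []
    for s in range(n_states):
        best = max(range(n_actions),
                   key=lambda a: R[a][s] + gamma * sum(
                       P[a][s][s2] * V[s2] for s2 in range(n_states)))
        policy.append(best)
    return policy
```

By the policy improvement theorem, the greedy policy is at least as good as the base policy when V is exact; the papers above study how much of this improvement survives when V is only approximate.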
Title: Diffusion approximations for performance analysis and optimal control: the Stein method/generator expansion framework
Discussion leader: Anton Braverman
Date: 03/14/2018
References:
- A. Braverman (2017). Stein's method for steady-state diffusion approximations (PhD thesis, Cornell University), https://arxiv.org/abs/1704.08398. Link
Topic: Finite-Sample Analyses for Temporal Difference Learning, and Recent Developments in the Theory of Reinforcement Learning
Discussion leader: Massey Cashore
Date: 03/07/2018
References:
- G. Dalal et al. (2017). Finite Sample Analyses for TD(0) with Function Approximation. https://arxiv.org/abs/1704.01161v4
- S. Tu and B. Recht (2017). Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator. https://arxiv.org/abs/1712.08642
- N. Jiang et al. (2016). Contextual Decision Processes with Low Bellman Rank are PAC-Learnable. https://arxiv.org/abs/1610.09512
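The algorithm analyzed in the Dalal et al. paper above is TD(0) with linear function approximation. A minimal sketch of one sample path of the update; `features`, `step`, and `reward` are hypothetical callbacks, and the step-size schedule is one illustrative choice, not the one from the paper:

```python
def td0_linear(features, step, reward, n_steps=5000, gamma=0.9):
    """TD(0) with linear function approximation: V(s) ~ theta . phi(s).

    features(s) -> feature vector phi(s) as a list of floats
    step(s)     -> next state under the policy being evaluated
    reward(s)   -> one-step reward in state s
    """
    s = 0                                  # assumed initial state
    theta = [0.0] * len(features(s))
    for t in range(n_steps):
        phi, s2 = features(s), step(s)
        phi2 = features(s2)
        v = sum(w * f for w, f in zip(theta, phi))
        v2 = sum(w * f for w, f in zip(theta, phi2))
        delta = reward(s) + gamma * v2 - v      # TD error
        alpha = (t + 1) ** -0.6                 # slowly diminishing step size
        for i in range(len(theta)):
            theta[i] += alpha * delta * phi[i]
        s = s2
    return theta
```

Finite-sample analyses of this recursion bound how far `theta` is from the TD fixed point after a given number of steps, as a function of the step-size schedule.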
Topic: Mastering the game of Go without human knowledge.
Discussion leader: Aurora Feng
Date: 02/28/2018
References:
- D. Silver et al. (2017). Mastering the game of Go without human knowledge. Nature, 550, pp. 354–359. Link; unformatted (full) version from deepmind.com: Link
- D. Silver et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, pp. 484–489. Link
- Introduction to Monte Carlo tree search: Link
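At the heart of Monte Carlo tree search is a selection rule balancing exploitation and exploration. A minimal sketch of the classic UCB1 rule (AlphaGo Zero uses a network-guided variant, PUCT, rather than this exact formula):

```python
import math

def ucb1_select(children_stats, c=1.4):
    """UCB1 selection step of Monte Carlo tree search: pick the child
    maximizing average value plus an exploration bonus.

    children_stats: list of (total_value, visit_count) pairs, one per child.
    Returns the index of the selected child.
    """
    total_visits = sum(n for _, n in children_stats)
    def score(stat):
        w, n = stat
        if n == 0:
            return float('inf')     # always try unvisited children first
        return w / n + c * math.sqrt(math.log(total_visits) / n)
    return max(range(len(children_stats)), key=lambda i: score(children_stats[i]))
```

The exploration constant `c` trades off revisiting high-value children against sampling under-explored ones; AlphaGo Zero replaces the log-based bonus with one weighted by the policy network's prior.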
Topic: Positive Harris Recurrence of Semimartingale Reflecting Brownian Motion.
Discussion leader: Chang Cao
Date: 02/21/2018
References:
Topic: Diffusion Approximation For Queue Length
Discussion leader: Xiangyu Zhang
Date: 02/07/2018
References:
Topic: “Exact” LSTD learning for queueing systems
Discussion leader: Mark Gluzman
Date: 01/31/2018
References:
Main:
- Section 11.5 of Meyn, S. P. (2008). Control techniques for complex networks. Cambridge University Press. Link
Additional:
- Bertsekas, D. P. (2012). Dynamic Programming and Optimal Control, Vol. II. Athena Scientific, Belmont, 4th edition.
- Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press, 2nd (in progress) edition. Link
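Unlike the incremental TD update, LSTD estimates the value-function weights in one batch by solving the linear system A·theta = b accumulated from a trajectory (cf. Section 11.5 of Meyn). A minimal generic sketch, assuming an arbitrary user-supplied feature map rather than the queueing-specific basis used in the book:

```python
def lstd(trajectory, features, gamma=0.9):
    """Batch LSTD(0): accumulate A and b from samples, then solve A theta = b.

    trajectory: list of (state, reward, next_state) tuples
    features(s) -> feature vector phi(s) as a list of floats
    Returns theta with V(s) ~ theta . phi(s).
    """
    d = len(features(trajectory[0][0]))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, r, s2 in trajectory:
        phi, phi2 = features(s), features(s2)
        for i in range(d):
            b[i] += phi[i] * r
            for j in range(d):
                A[i][j] += phi[i] * (phi[j] - gamma * phi2[j])
    return solve(A, b)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
        # M is now upper triangular in columns <= col
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x
```

The "exact" LSTD of Meyn's Section 11.5 exploits the structure of the queueing model to compute the expectations in A and b in closed form instead of from samples; the sketch above is the generic sampled version.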