"Reinforcement Learning for Processing Networks" seminar

We meet in 261 Rhodes on Wednesdays from 4:45pm to 6:45pm.

This week:

Schedule:


Previous presentations:

Title: Information relaxation methods for MDPs: Theory and application

Discussion leader: Yilun Chen

Date: 04/11/2018

References:

  • Brown, D. B., & Haugh, M. B. (2017). Information relaxation bounds for infinite horizon Markov decision processes. Operations Research, 65(5), 1355-1379. Link
  • Adelman, D., & Mersereau, A. J. (2008). Relaxations of weakly coupled stochastic dynamic programs. Operations Research, 56(3), 712-727. Link
  • Belomestny, D. (2013). Solving optimal stopping problems via empirical dual optimization. The Annals of Applied Probability, 23(5), 1988-2019. Link
  • Desai, V. V., Farias, V. F., & Moallemi, C. C. (2012). Pathwise optimization for optimal stopping problems. Management Science, 58(12), 2292-2308. Link
  • Ansell, P. S., Glazebrook, K. D., Niño-Mora, J., & O'Keeffe, M. (2003). Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57(1), 21-39. Link
RL_seminar_presentation_v3_0.pdf
RL_seminar_presentation_v5_0.pdf
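
To make the duality concrete, here is a minimal Python sketch of a perfect-information relaxation for a toy optimal stopping problem (the random-walk model and all parameters are our own illustration, not from the papers above): with the penalty set to zero, the average pathwise-optimal reward upper-bounds the optimal value, while any non-anticipating stopping rule gives a lower bound.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stopping problem (hypothetical): stop a Gaussian random walk X_t at
    # some time t <= T and collect the discounted reward gamma**t * max(X_t, 0).
    T, n_paths, gamma = 20, 100_000, 0.95
    steps = rng.normal(size=(n_paths, T + 1))
    steps[:, 0] = 0.0
    X = np.cumsum(steps, axis=1)                       # paths X_0, ..., X_T
    rewards = gamma ** np.arange(T + 1) * np.maximum(X, 0.0)

    # Perfect-information relaxation with zero penalty: a clairvoyant who sees
    # the whole path stops at its pathwise-best time, so the mean of these
    # pathwise maxima is a valid upper bound on the optimal stopping value.
    upper = rewards.max(axis=1).mean()

    # Any feasible (non-anticipating) rule gives a lower bound, e.g. the naive
    # threshold policy "stop the first time X_t >= 2" (stop at T otherwise).
    hit = X >= 2.0
    stop = np.where(hit.any(axis=1), hit.argmax(axis=1), T)
    lower = rewards[np.arange(n_paths), stop].mean()

    print(f"{lower:.3f} <= optimal value <= {upper:.3f}")

Tightening the gap between these two bounds is exactly where the penalty terms of Brown and Haugh, and the pathwise optimization of Desai, Farias and Moallemi, come in.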

Title: Stochastic Approximation Techniques in Reinforcement Learning

Discussion leader: Mark Gluzman

Date: 03/28/2018

References:

  • Abounadi, J., Bertsekas, D. P., & Borkar, V. S. (2002). Stochastic approximation for nonexpansive maps: application to Q-learning algorithms. SIAM Journal on Control and Optimization, 41(1), 1-22. Link
  • Even-Dar, E., & Mansour, Y. (2003). Learning rates for Q-learning. Journal of Machine Learning Research, 5, 1-25. Link
  • Chapter 2 of Borkar, V. S. (2008). Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press. Link
RLPN presentation2.pdf
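
As a warm-up for the discussion, a minimal tabular Q-learning sketch (the two-state MDP is made up) showing the stochastic-approximation structure of the update: a noisy fixed-point iteration for the Bellman operator driven by Robbins-Monro step sizes. Even-Dar and Mansour's paper analyzes how this step-size schedule governs the convergence rate.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical 2-state, 2-action MDP: P[a, s, s'] transition probabilities,
    # R[s, a] one-step rewards.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.4, 0.6]]])
    R = np.array([[1.0, 0.0], [0.5, 2.0]])
    gamma = 0.9

    Q = np.zeros((2, 2))
    visits = np.zeros((2, 2))
    s = 0
    for _ in range(200_000):
        a = rng.integers(2)                            # uniform exploratory behavior
        s_next = rng.choice(2, p=P[a, s])
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                     # Robbins-Monro step sizes:
                                                       # sum = inf, sum of squares < inf
        target = R[s, a] + gamma * Q[s_next].max()     # sampled Bellman backup
        Q[s, a] += alpha * (target - Q[s, a])          # stochastic approximation update
        s = s_next

    print(np.round(Q, 3))                              # approximates Q*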

Title: Near-Optimal Control of Queueing Systems via Approximate One-Step Policy Improvement

Discussion leader: Jefferson Huang

Date: 03/21/2018

References:

  • Bhulai, S. (2017). Value Function Approximation in Complex Queueing Systems. In Markov Decision Processes in Practice (pp. 33-62). Springer. Link
  • James, T., Glazebrook, K., & Lin, K. (2016). Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28(2), 251-264. Link
  • Brown, D. B., & Haugh, M. B. (2017). Information relaxation bounds for infinite horizon Markov decision processes. Operations Research, 65(5), 1355-1379. Link
2018-rlpn.pdf
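
The flavor of the approach, in a minimal sketch (the rates and the Bernoulli base policy are our own choices, loosely following the parallel-queue examples in Bhulai's chapter): start from a policy whose relative value function is known in closed form, then perform one policy-improvement step with it.

    import numpy as np

    # Routing Poisson arrivals to two parallel M/M/1 servers.  Base policy:
    # Bernoulli(1/2) routing, under which queue i is M/M/1 with rate p * lam.
    lam, mu, p = 1.0, (0.8, 0.7), 0.5                  # made-up rates

    # For an M/M/1 queue with unit holding cost, the average-cost relative
    # value function is V(n) = n * (n + 1) / (2 * (mu_i - lam_i)).
    def V(n, lam_i, mu_i):
        return n * (n + 1) / (2.0 * (mu_i - lam_i))

    def improved_route(n1, n2):
        """One-step improvement: send the arrival where V increases least."""
        d1 = V(n1 + 1, p * lam, mu[0]) - V(n1, p * lam, mu[0])
        d2 = V(n2 + 1, (1 - p) * lam, mu[1]) - V(n2, (1 - p) * lam, mu[1])
        return 0 if d1 <= d2 else 1

    # The improved policy is a state-dependent switching curve over (n1, n2):
    for n1 in range(4):
        print([improved_route(n1, n2) for n2 in range(4)])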

Title: Diffusion approximations for performance analysis and optimal control: the Stein method/generator expansion framework

Discussion leader: Anton Braverman

Date: 03/14/2018

References:

  • Braverman, A. (2017). Stein's method for steady-state diffusion approximations (PhD thesis, Cornell University). https://arxiv.org/abs/1704.08398. Link
Kellogg+IEMS.pdf
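
A small numerical illustration of the kind of statement this framework makes precise (the numbers are ours, not from the thesis): for the M/M/1 queue the stationary queue length L is geometric, the diffusion approximation replaces the scaled queue length (1 - rho) * L by an Exp(1) random variable, and Stein's method bounds the distance between the two; here we simply compute that distance.

    import numpy as np

    for rho in (0.8, 0.9, 0.99):
        k = np.arange(2000)
        pmf = (1 - rho) * rho**k                       # P(L = k) for M/M/1
        x = (1 - rho) * k                              # scaled state space
        cdf_exact = np.cumsum(pmf)                     # CDF of (1 - rho) * L
        cdf_diff = 1 - np.exp(-x)                      # CDF of the Exp(1) limit
        # Wasserstein-1 distance = integral of |F_exact - F_approx| (Riemann sum)
        w1 = np.abs(cdf_exact - cdf_diff).sum() * (1 - rho)
        print(f"rho={rho}: W1 distance between (1-rho)L and Exp(1) ~ {w1:.4f}")

Numerically the gap shrinks as rho increases, which is what a generator-comparison error bound is designed to quantify.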

Title: Finite-Sample Analyses for Temporal Difference Learning, and Recent Developments in the Theory of Reinforcement Learning

Discussion leader: Massey Cashore

Date: 03/07/2018

References:

RLPN_Reading_Group__2018.pdf
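
For orientation, a minimal tabular TD(0) sketch of the update such finite-sample analyses study (the three-state Markov reward process and the step-size schedule are made up):

    import numpy as np

    rng = np.random.default_rng(2)

    # Made-up 3-state Markov reward process: transition matrix P, rewards r.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    r = np.array([1.0, 0.0, -1.0])
    gamma = 0.9

    V = np.zeros(3)
    s = 0
    for t in range(1, 500_001):
        s_next = rng.choice(3, p=P[s])
        alpha = t ** -0.6                              # polynomial step size; the decay
                                                       # exponent drives the error rate
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])   # TD(0) update
        s = s_next

    V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
    print(np.round(V, 3), "vs exact", np.round(V_exact, 3))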

Title: Mastering the game of Go without human knowledge

Discussion leader: Aurora Feng

Date: 02/28/2018

References:

  • Silver, D., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354-359. Link; unformatted (full) version from deepmind.com: Link
  • Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489. Link
  • An introduction to Monte Carlo tree search: Link
Feng-02282018-AlphaGo.pptx
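
For readers who want the search component in isolation, a minimal UCT (Monte Carlo tree search) sketch on a toy game, stripped of AlphaGo Zero's neural networks and self-play: players alternately add 1, 2, or 3 to a running total, and whoever pushes the total to 21 or beyond loses. The game and all constants are our own illustration, not from the papers.

    import math, random

    random.seed(3)

    MOVES = (1, 2, 3)
    N, W = {}, {}                                  # visits and total value per state

    def rollout(total):
        """Random playout; +1 if the player to move at `total` wins."""
        mover = 0
        while True:
            total += random.choice(MOVES)
            if total >= 21:
                return -1 if mover == 0 else 1     # `mover` busted and loses
            mover ^= 1

    def simulate(total):
        """One MCTS iteration; value is for the player to move at `total`."""
        if total >= 21:
            return 1                               # opponent just busted: we win
        if total not in N:                         # new leaf: initialize, roll out
            N[total], W[total] = 0, 0.0
            v = rollout(total)
        else:                                      # UCB1 selection (negamax signs:
            def ucb(c):                            # a child's value is the opponent's)
                if N.get(c, 0) == 0:
                    return float("inf")
                return -W[c] / N[c] + 1.4 * math.sqrt(math.log(N[total] + 1) / N[c])
            child = max((total + m for m in MOVES), key=ucb)
            v = -simulate(child)
        N[total] += 1
        W[total] += v
        return v

    ROOT = 1                                       # from 1, moving to total 4 wins
    for _ in range(20_000):
        simulate(ROOT)
    for m in MOVES:                                # report root children; AlphaGo
        c = ROOT + m                               # likewise plays the most-visited move
        print(f"add {m}: visits={N[c]}, value={-W[c] / N[c]:+.2f}")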

Title: Positive Harris Recurrence of Semimartingale Reflecting Brownian Motion

Discussion leader: Chang Cao

Date: 02/21/2018

References:

  • Dai, J. G. (1995). On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models. The Annals of Applied Probability, 5(1), 49-77. Link
  • Dupuis, P., & Williams, R. J. (1994). Lyapunov functions for semimartingale reflecting Brownian motions. The Annals of Probability, 22(2), 680-702. Link

Title: Diffusion Approximation for Queue Length

Discussion leader: Xiangyu Zhang

Date: 02/07/2018

References:

  • Harrison, J. M., & Reiman, M. I. (1981). Reflected Brownian motion on an orthant. The Annals of Probability, 9(2), 302-308. Link
  • Harrison, J. M., & Williams, R. J. (1987). Brownian models of open queueing networks with homogeneous customer populations. Stochastics, 22, 77-115. Link
Diffusion Approximation (1).pptx
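
A minimal one-dimensional sketch of the object in these papers (Harrison and Reiman treat the multidimensional orthant; the drift and variance here are made up): Brownian motion with negative drift, reflected at zero via the Skorokhod map, whose stationary distribution is exponential.

    import numpy as np

    rng = np.random.default_rng(4)

    # Brownian motion with negative drift on a time grid.
    T, n, drift, sigma = 200.0, 200_000, -0.5, 1.0
    dt = T / n
    increments = drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
    X = np.concatenate([[0.0], np.cumsum(increments)])    # free Brownian path

    # One-dimensional Skorokhod reflection: Z(t) = X(t) - min(0, inf_{s<=t} X(s)).
    Z = X - np.minimum.accumulate(np.minimum(X, 0.0))

    # Stationary RBM(drift, sigma^2) is Exp(2|drift| / sigma^2); compare means,
    # discarding the first half of the path as burn-in.
    print(f"time-average of Z: {Z[n // 2:].mean():.3f}",
          f"(theory: {sigma**2 / (2 * abs(drift)):.3f})")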

Title: “Exact” LSTD learning for queueing systems

Discussion leader: Mark Gluzman

Date: 01/31/2018

References:

Main:

  • Section 11.5 of Meyn, S. P. (2008). Control Techniques for Complex Networks. Cambridge University Press. Link

Additional:

  • Bertsekas, D. P. (2012). Dynamic Programming and Optimal Control, Vol. II (4th ed.). Athena Scientific, Belmont, MA.
  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed., in progress). The MIT Press. Link
RLPN presentation1.pdf
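
A minimal LSTD sketch in the same spirit (Meyn's Section 11.5 develops an average-cost, "exact" variant for networks; the discounted formulation, uniformized single-server queue, and quadratic basis below are our own simplifications): estimate the value function of the serve-whenever-nonempty policy from one simulated trajectory by solving the projected Bellman equation in closed form.

    import numpy as np

    rng = np.random.default_rng(5)

    # Uniformized single-server queue (made-up rates), one-step cost c(x) = x.
    lam, mu, gamma = 0.4, 0.5, 0.95

    def step(x):
        u = rng.random()
        if u < lam:        return min(x + 1, 100)     # arrival (truncated buffer)
        if u < lam + mu:   return max(x - 1, 0)       # service completion
        return x                                      # uniformization self-loop

    phi = lambda x: np.array([1.0, x, x * x])         # quadratic basis, natural here:
                                                      # the queue's relative value
                                                      # function is quadratic
    A, b = np.zeros((3, 3)), np.zeros(3)
    x = 0
    for _ in range(200_000):
        x_next = step(x)
        f = phi(x)
        A += np.outer(f, f - gamma * phi(x_next))     # LSTD normal equations
        b += f * x                                    # accumulate phi(x) * c(x)
        x = x_next

    w = np.linalg.solve(A, b)
    print("V(x) ~ w · [1, x, x^2], w =", np.round(w, 3))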