"Reinforcement Learning for Processing Networks" seminar

We meet in 261 Rhodes on Wednesdays from 4:45pm to 6:45pm.

This week:

Schedule:


Previous presentations:

Title: Information relaxation methods for MDPs: Theory and application

Discussion leader: Yilun Chen

Date: 04/11/2018

References:

  • Brown, D. B., & Haugh, M. B. (2017). Information relaxation bounds for infinite horizon Markov decision processes. Operations Research, 65(5), 1355-1379. Link
  • Adelman, D., & Mersereau, A. J. (2008). Relaxations of weakly coupled stochastic dynamic programs. Operations Research, 56(3), 712-727. Link
  • Belomestny, D. (2013). Solving optimal stopping problems via empirical dual optimization. The Annals of Applied Probability, 23(5), 1988-2019. Link
  • Desai, V. V., Farias, V. F., & Moallemi, C. C. (2012). Pathwise optimization for optimal stopping problems. Management Science, 58(12), 2292-2308. Link
  • Ansell, P. S., Glazebrook, K. D., Niño-Mora, J., & O'Keeffe, M. (2003). Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57(1), 21-39. Link
RL_seminar_presentation_v3_0.pdf
RL_seminar_presentation_v5_0.pdf
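
To make the duality concrete, here is a minimal Python sketch of a perfect-information relaxation for a toy optimal stopping problem (the random-walk model and all parameters are our own illustration, not from the papers above): with the penalty set to zero, the average pathwise-optimal reward upper-bounds the optimal value, while any non-anticipating stopping rule gives a lower bound.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stopping problem (hypothetical): stop a Gaussian random walk X_t at
    # some time t <= T and collect the discounted reward gamma**t * max(X_t, 0).
    T, n_paths, gamma = 20, 100_000, 0.95
    steps = rng.normal(size=(n_paths, T + 1))
    steps[:, 0] = 0.0
    X = np.cumsum(steps, axis=1)                       # paths X_0, ..., X_T
    rewards = gamma ** np.arange(T + 1) * np.maximum(X, 0.0)

    # Perfect-information relaxation with zero penalty: a clairvoyant who sees
    # the whole path stops at its pathwise-best time, so the mean of these
    # pathwise maxima is a valid upper bound on the optimal stopping value.
    upper = rewards.max(axis=1).mean()

    # Any feasible (non-anticipating) rule gives a lower bound, e.g. the naive
    # threshold policy "stop the first time X_t >= 2" (stop at T otherwise).
    hit = X >= 2.0
    stop = np.where(hit.any(axis=1), hit.argmax(axis=1), T)
    lower = rewards[np.arange(n_paths), stop].mean()

    print(f"{lower:.3f} <= optimal value <= {upper:.3f}")

Tightening the gap between these two bounds is exactly where the penalty terms of Brown and Haugh, and the pathwise optimization of Desai, Farias and Moallemi, come in.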

Title: Stochastic Approximation Techniques in Reinforcement Learning

Discussion leader: Mark Gluzman

Date: 03/28/2018

References:

  • Abounadi, J., Bertsekas, D. P., & Borkar, V. S. (2002). Stochastic approximation for nonexpansive maps: application to Q-learning algorithms. SIAM Journal on Control and Optimization, 41(1), 1-22. Link
  • Even-Dar, E., & Mansour, Y. (2003). Learning rates for Q-learning. Journal of Machine Learning Research, 5, 1-25. Link
  • Chapter 2 of Borkar, V. S. (2008). Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press. Link
RLPN presentation2.pdf
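
As a warm-up for the discussion, a minimal tabular Q-learning sketch (the two-state MDP is made up) showing the stochastic-approximation structure of the update: a noisy fixed-point iteration for the Bellman operator driven by Robbins-Monro step sizes. Even-Dar and Mansour's paper analyzes how this step-size schedule governs the convergence rate.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical 2-state, 2-action MDP: P[a, s, s'] transition probabilities,
    # R[s, a] one-step rewards.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.4, 0.6]]])
    R = np.array([[1.0, 0.0], [0.5, 2.0]])
    gamma = 0.9

    Q = np.zeros((2, 2))
    visits = np.zeros((2, 2))
    s = 0
    for _ in range(200_000):
        a = rng.integers(2)                            # uniform exploratory behavior
        s_next = rng.choice(2, p=P[a, s])
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                     # Robbins-Monro step sizes:
                                                       # sum = inf, sum of squares < inf
        target = R[s, a] + gamma * Q[s_next].max()     # sampled Bellman backup
        Q[s, a] += alpha * (target - Q[s, a])          # stochastic approximation update
        s = s_next

    print(np.round(Q, 3))                              # approximates Q*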

Title: Near-Optimal Control of Queueing Systems via Approximate One-Step Policy Improvement

Discussion leader: Jefferson Huang

Date: 03/21/2018

References:

  • Bhulai, S. (2017). Value Function Approximation in Complex Queueing Systems. In Markov Decision Processes in Practice (pp. 33-62). Springer. Link
  • James, T., Glazebrook, K., & Lin, K. (2016). Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28(2), 251-264. Link
  • Brown, D. B., & Haugh, M. B. (2017). Information relaxation bounds for infinite horizon Markov decision processes. Operations Research, 65(5), 1355-1379. Link
2018-rlpn.pdf
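
The flavor of the approach, in a minimal sketch (the rates and the Bernoulli base policy are our own choices, loosely following the parallel-queue examples in Bhulai's chapter): start from a policy whose relative value function is known in closed form, then perform one policy-improvement step with it.

    import numpy as np

    # Routing Poisson arrivals to two parallel M/M/1 servers.  Base policy:
    # Bernoulli(1/2) routing, under which queue i is M/M/1 with rate p * lam.
    lam, mu, p = 1.0, (0.8, 0.7), 0.5                  # made-up rates

    # For an M/M/1 queue with unit holding cost, the average-cost relative
    # value function is V(n) = n * (n + 1) / (2 * (mu_i - lam_i)).
    def V(n, lam_i, mu_i):
        return n * (n + 1) / (2.0 * (mu_i - lam_i))

    def improved_route(n1, n2):
        """One-step improvement: send the arrival where V increases least."""
        d1 = V(n1 + 1, p * lam, mu[0]) - V(n1, p * lam, mu[0])
        d2 = V(n2 + 1, (1 - p) * lam, mu[1]) - V(n2, (1 - p) * lam, mu[1])
        return 0 if d1 <= d2 else 1

    # The improved policy is a state-dependent switching curve over (n1, n2):
    for n1 in range(4):
        print([improved_route(n1, n2) for n2 in range(4)])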

Title: Diffusion approximations for performance analysis and optimal control: the Stein method/generator expansion framework

Discussion leader: Anton Braverman

Date: 03/14/2018

References:

  • Braverman, A. (2017). Stein's method for steady-state diffusion approximations (PhD thesis, Cornell University). https://arxiv.org/abs/1704.08398. Link
Kellogg+IEMS.pdf
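
A small numerical illustration of the kind of statement this framework makes precise (the numbers are ours, not from the thesis): for the M/M/1 queue the stationary queue length L is geometric, the diffusion approximation replaces the scaled queue length (1 - rho) * L by an Exp(1) random variable, and Stein's method bounds the distance between the two; here we simply compute that distance.

    import numpy as np

    for rho in (0.8, 0.9, 0.99):
        k = np.arange(2000)
        pmf = (1 - rho) * rho**k                       # P(L = k) for M/M/1
        x = (1 - rho) * k                              # scaled state space
        cdf_exact = np.cumsum(pmf)                     # CDF of (1 - rho) * L
        cdf_diff = 1 - np.exp(-x)                      # CDF of the Exp(1) limit
        # Wasserstein-1 distance = integral of |F_exact - F_approx| (Riemann sum)
        w1 = np.abs(cdf_exact - cdf_diff).sum() * (1 - rho)
        print(f"rho={rho}: W1 distance between (1-rho)L and Exp(1) ~ {w1:.4f}")

Numerically the gap shrinks as rho increases, which is what a generator-comparison error bound is designed to quantify.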

Title: Finite-Sample Analyses for Temporal Difference Learning, and Recent Developments in the Theory of Reinforcement Learning

Discussion leader: Massey Cashore

Date: 03/07/2018

References:

RLPN_Reading_Group__2018.pdf
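
For orientation, a minimal tabular TD(0) sketch of the update such finite-sample analyses study (the three-state Markov reward process and the step-size schedule are made up):

    import numpy as np

    rng = np.random.default_rng(2)

    # Made-up 3-state Markov reward process: transition matrix P, rewards r.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    r = np.array([1.0, 0.0, -1.0])
    gamma = 0.9

    V = np.zeros(3)
    s = 0
    for t in range(1, 500_001):
        s_next = rng.choice(3, p=P[s])
        alpha = t ** -0.6                              # polynomial step size; the decay
                                                       # exponent drives the error rate
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])   # TD(0) update
        s = s_next

    V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
    print(np.round(V, 3), "vs exact", np.round(V_exact, 3))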

Title: Mastering the game of Go without human knowledge

Discussion leader: Aurora Feng

Date: 02/28/2018

References:

  • Silver, D., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354-359. Link; unformatted (full) version from deepmind.com: Link
  • Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489. Link
  • An introduction to Monte Carlo tree search: Link
Feng-02282018-AlphaGo.pptx
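
For readers who want the search component in isolation, a minimal UCT (Monte Carlo tree search) sketch on a toy game, stripped of AlphaGo Zero's neural networks and self-play: players alternately add 1, 2, or 3 to a running total, and whoever pushes the total to 21 or beyond loses. The game and all constants are our own illustration, not from the papers.

    import math, random

    random.seed(3)

    MOVES = (1, 2, 3)
    N, W = {}, {}                                  # visits and total value per state

    def rollout(total):
        """Random playout; +1 if the player to move at `total` wins."""
        mover = 0
        while True:
            total += random.choice(MOVES)
            if total >= 21:
                return -1 if mover == 0 else 1     # `mover` busted and loses
            mover ^= 1

    def simulate(total):
        """One MCTS iteration; value is for the player to move at `total`."""
        if total >= 21:
            return 1                               # opponent just busted: we win
        if total not in N:                         # new leaf: initialize, roll out
            N[total], W[total] = 0, 0.0
            v = rollout(total)
        else:                                      # UCB1 selection (negamax signs:
            def ucb(c):                            # a child's value is the opponent's)
                if N.get(c, 0) == 0:
                    return float("inf")
                return -W[c] / N[c] + 1.4 * math.sqrt(math.log(N[total] + 1) / N[c])
            child = max((total + m for m in MOVES), key=ucb)
            v = -simulate(child)
        N[total] += 1
        W[total] += v
        return v

    ROOT = 1                                       # from 1, moving to total 4 wins
    for _ in range(20_000):
        simulate(ROOT)
    for m in MOVES:                                # report root children; AlphaGo
        c = ROOT + m                               # likewise plays the most-visited move
        print(f"add {m}: visits={N[c]}, value={-W[c] / N[c]:+.2f}")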

Title: Positive Harris Recurrence of Semimartingale Reflecting Brownian Motion

Discussion leader: Chang Cao

Date: 02/21/2018

References:

  • Dai, J. G. (1995). On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models. The Annals of Applied Probability, 5(1), 49-77. Link
  • Dupuis, P., & Williams, R. J. (1994). Lyapunov functions for semimartingale reflecting Brownian motions. The Annals of Probability, 22(2), 680-702. Link

Title: Diffusion Approximation for Queue Length

Discussion leader: Xiangyu Zhang

Date: 02/07/2018

References:

  • Harrison, J. M., & Reiman, M. I. (1981). Reflected Brownian motion on an orthant. The Annals of Probability, 9(2), 302-308. Link
  • Harrison, J. M., & Williams, R. J. (1987). Brownian models of open queueing networks with homogeneous customer populations. Stochastics, 22, 77-115. Link
Diffusion Approximation (1).pptx
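
A minimal one-dimensional sketch of the object in these papers (Harrison and Reiman treat the multidimensional orthant; the drift and variance here are made up): Brownian motion with negative drift, reflected at zero via the Skorokhod map, whose stationary distribution is exponential.

    import numpy as np

    rng = np.random.default_rng(4)

    # Brownian motion with negative drift on a time grid.
    T, n, drift, sigma = 200.0, 200_000, -0.5, 1.0
    dt = T / n
    increments = drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
    X = np.concatenate([[0.0], np.cumsum(increments)])    # free Brownian path

    # One-dimensional Skorokhod reflection: Z(t) = X(t) - min(0, inf_{s<=t} X(s)).
    Z = X - np.minimum.accumulate(np.minimum(X, 0.0))

    # Stationary RBM(drift, sigma^2) is Exp(2|drift| / sigma^2); compare means,
    # discarding the first half of the path as burn-in.
    print(f"time-average of Z: {Z[n // 2:].mean():.3f}",
          f"(theory: {sigma**2 / (2 * abs(drift)):.3f})")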

Title: “Exact” LSTD learning for queueing systems

Discussion leader: Mark Gluzman

Date: 01/31/2018

References:

Main:

  • Section 11.5 of Meyn, S. P. (2008). Control Techniques for Complex Networks. Cambridge University Press. Link

Additional:

  • Bertsekas, D. P. (2012). Dynamic Programming and Optimal Control, Vol. II (4th ed.). Athena Scientific, Belmont, MA.
  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed., in progress). The MIT Press. Link
RLPN presentation1.pdf
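
A minimal LSTD sketch in the same spirit (Meyn's Section 11.5 develops an average-cost, "exact" variant for networks; the discounted formulation, uniformized single-server queue, and quadratic basis below are our own simplifications): estimate the value function of the serve-whenever-nonempty policy from one simulated trajectory by solving the projected Bellman equation in closed form.

    import numpy as np

    rng = np.random.default_rng(5)

    # Uniformized single-server queue (made-up rates), one-step cost c(x) = x.
    lam, mu, gamma = 0.4, 0.5, 0.95

    def step(x):
        u = rng.random()
        if u < lam:        return min(x + 1, 100)     # arrival (truncated buffer)
        if u < lam + mu:   return max(x - 1, 0)       # service completion
        return x                                      # uniformization self-loop

    phi = lambda x: np.array([1.0, x, x * x])         # quadratic basis, natural here:
                                                      # the queue's relative value
                                                      # function is quadratic
    A, b = np.zeros((3, 3)), np.zeros(3)
    x = 0
    for _ in range(200_000):
        x_next = step(x)
        f = phi(x)
        A += np.outer(f, f - gamma * phi(x_next))     # LSTD normal equations
        b += f * x                                    # accumulate phi(x) * c(x)
        x = x_next

    w = np.linalg.solve(A, b)
    print("V(x) ~ w · [1, x, x^2], w =", np.round(w, 3))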