NIPS 2014 Workshop on Large-scale Reinforcement Learning and Markov Decision Problems
Montreal, Quebec, Canada
December 13, 2014

Abstract: Solving a large-state MDP can be computationally intractable in the worst case [PT, CT]. Despite these negative results, several algorithms have been shown to perform remarkably well on certain large-state problems. Examples include the UCT algorithm of Kocsis and Szepesvári [KS], applied in heuristic search and games; Rapidly-exploring Random Trees (RRT) of LaValle and Kuffner [LK], applied in motion planning; policy gradient methods applied in robotics [KP, GE]; approximate linear programming (ALP) applied in queueing networks [FV, DFM]; and approximate dynamic programming (ADP) applied in very large-scale industrial applications [P]. These algorithms were developed largely independently in different communities, and despite some similarities, the relationships between them, and what makes them effective, are not well understood.

The computational problem described above is the focus of the optimization and ADP communities: the challenge is to find a computationally efficient way to achieve what would be easy with unlimited computational resources. In reinforcement learning, we face additional statistical challenges: even with unlimited computational power, it is not clear how best to draw inferences from observations and select actions that balance exploration and exploitation. This workshop will bring together researchers from these different communities to discuss and exchange ideas about effective approaches and open problems in large-scale MDPs.

References:
[B] R. Bellman, "Dynamic Programming", Princeton University Press, 1957.
[H] R. A. Howard, "Dynamic Programming and Markov Processes", MIT Press, 1960.
[SB] R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction", MIT Press, 1998.
[PT] C. H. Papadimitriou and J. N. Tsitsiklis, "The Complexity of Markov Decision Processes", Mathematics of Operations Research, 12(3):441-450, 1987.
[CT] C.-S. Chow and J. N. Tsitsiklis, "The Complexity of Dynamic Programming", Journal of Complexity, 5(4):466-488, 1989.
[GE] M. Ghavamzadeh and Y. Engel, "Bayesian Policy Gradient Algorithms", Neural Information Processing Systems (NIPS), 2006.
[LK] S. M. LaValle and J. J. Kuffner, "Randomized Kinodynamic Planning", The International Journal of Robotics Research, 20(5):378-400, 2001.
[KS] L. Kocsis and C. Szepesvári, "Bandit Based Monte-Carlo Planning", ECML, 2006.
[KP] J. Kober and J. Peters, "Policy Search for Motor Primitives in Robotics", Machine Learning, 84(1-2):171-203, 2011.
[FV] D. P. de Farias and B. Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming", Operations Research, 51, 2003.
[DFM] V. V. Desai, V. F. Farias, and C. C. Moallemi, "Approximate Dynamic Programming via a Smoothed Linear Program", Operations Research, 60(3):655-674, 2012.
[P] W. B. Powell, "Approximate Dynamic Programming: Solving the Curses of Dimensionality", John Wiley & Sons, 2007.