Tutorial on Model-Based Methods in Reinforcement Learning
By Igor Mordatch and Jessica Hamrick
Presented at the International Conference on Machine Learning (ICML) 2020
Abstract
This tutorial presents a broad overview of the field of model-based reinforcement learning (MBRL), with a particular emphasis on deep methods. MBRL methods utilize a model of the environment to make decisions—as opposed to treating the environment as a black box—and present unique opportunities and challenges beyond model-free RL. We discuss methods for learning transition and reward models, ways in which those models can effectively be used to make better decisions, and the relationship between planning and learning. We also highlight ways that models of the world can be leveraged beyond the typical RL setting, and what insights might be drawn from human cognition when designing future MBRL systems.
Goals
The field of reinforcement learning has produced impressive results in recent years, but has largely focused on model-free methods. The community increasingly recognizes the limitations of purely model-free methods, ranging from high sample complexity and the need to sample unsafe outcomes, to stability and reproducibility issues. By contrast, model-based methods remain under-explored (though growing fast) in the machine learning community, despite being very influential in robotics, engineering, and the cognitive and neural sciences. They provide a distinct set of advantages and challenges, as well as complementary mathematical tools. The aim of this tutorial is to make model-based methods more recognized and accessible to the machine learning community. Given recent successful applications of model-based planning, such as AlphaGo, we believe there is a timely demand for a comprehensive understanding of this topic. By the end of the tutorial, the audience should gain:
The mathematical background needed to read and follow the literature on the topic.
An intuitive understanding of the algorithms involved, along with access to lightweight example code to use and experiment with (see the sketch after this list).
An awareness of the tradeoffs and challenges involved in applying model-based methods.
An appreciation for the diversity of problems in which model-based reasoning can be applied.
An understanding of how these methods fit into the broader context of reinforcement learning and theories of decision-making, as well as their relationship to model-free methods.
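As a taste of the kind of lightweight example code mentioned above, here is a minimal, self-contained sketch in Python/NumPy of the core model-based loop: fit a transition model to random-interaction data, then plan with it by random shooting. The toy point-mass environment, the linear least-squares model, and names such as model() and plan() are our own illustrative choices for this page, not code from the tutorial materials.

    import numpy as np

    rng = np.random.default_rng(0)

    def step(state, action):
        # True environment dynamics (unknown to the agent): a 1D point mass.
        pos, vel = state
        vel = vel + 0.1 * action
        pos = pos + 0.1 * vel
        return np.array([pos, vel])

    def reward(state):
        # Reward favors staying near the origin.
        return -np.sum(state ** 2)

    # 1) Collect random-interaction data and fit a linear transition model
    #    s' ~= [s, a] @ A by least squares.
    states, actions, next_states = [], [], []
    s = np.zeros(2)
    for _ in range(500):
        a = rng.uniform(-1.0, 1.0)
        s_next = step(s, a)
        states.append(s)
        actions.append([a])
        next_states.append(s_next)
        s = s_next if np.all(np.abs(s_next) < 5.0) else np.zeros(2)
    X = np.hstack([np.array(states), np.array(actions)])  # (N, 3) inputs
    Y = np.array(next_states)                             # (N, 2) targets
    A, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

    def model(state, action):
        # Learned one-step transition model.
        return np.concatenate([state, [action]]) @ A

    def plan(state, horizon=10, n_candidates=200):
        # Random-shooting planner: sample action sequences, roll each out
        # in the learned model, and execute the first action of the best.
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
        returns = np.zeros(n_candidates)
        for i, seq in enumerate(candidates):
            sim = state.copy()
            for a in seq:
                sim = model(sim, a)
                returns[i] += reward(sim)
        return candidates[np.argmax(returns), 0]

    # 2) Control the real environment, replanning at every step.
    s = np.array([1.0, 0.0])
    for _ in range(30):
        s = step(s, plan(s))
    print("final state:", s)  # should approach the origin

The same loop underlies much of the tutorial: only the model class (here, linear least squares) and the planner (here, random shooting; alternatives include the cross-entropy method or gradient-based trajectory optimization) change.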
Target audience and required background
This tutorial will be accessible to a general machine learning audience, but is specifically targeted at the following groups, each with their own learning goals:
Reinforcement learning researchers and practitioners who have primarily worked with model-free methods and are looking to acquire a new set of techniques and background to complement or address the challenges they currently face.
Supervised or unsupervised learning researchers who are looking to learn how their work can be applied in the reinforcement learning setting.
Cognitive science researchers who may be familiar with the core ideas of the topic, but are looking to learn about algorithms and implementation guidelines that are practical in complex, high-dimensional settings.
Robotics researchers and practitioners who are familiar with model-based control, but are looking for background and advice on how to combine it with learning methods.
Familiarity with basic supervised learning methods will be expected, and some familiarity with the reinforcement learning formulation and model-free methods will be beneficial, but is not required.
Bibliography
Amos et al. (2018). Differentiable MPC for End-to-end Planning and Control. NeurIPS.
Amos et al. (2019). The Differentiable Cross-Entropy Method. arXiv.
Anthony et al. (2017). Thinking Fast and Slow with Deep Learning and Tree Search. NeurIPS.
Bellemare et al. (2016). Unifying count-based exploration and intrinsic motivation. NeurIPS.
Blundell et al. (2015). Weight Uncertainty in Neural Networks. ICML.
Burgess et al. (2019). MONet: Unsupervised Scene Decomposition and Representation. arXiv.
Chiappa, Racaniere, Wierstra, & Mohamed (2017). Recurrent environment simulators. ICLR.
Craik (1943). The Nature of Explanation. Cambridge University Press.
Du et al. (2019). Model-Based Planning with Energy Based Models. CoRL.
Ecoffet et al. (2019). Go-explore: a new approach for hard-exploration problems. arXiv.
Edwards, Downs, & Davidson (2018). Forward-Backward Reinforcement Learning. arXiv.
Ellis et al. (2019). Write, Execute, Assess: Program Synthesis with a REPL. NeurIPS.
Finn & Levine (2017). Deep visual foresight for planning robot motion. ICRA.
Griffiths & Tenenbaum (2009). Theory-based causal induction. Psychological Review, 116(4).
Guez et al. (2019). An Investigation of Model-Free Planning. ICML.
Hafner et al. (2020). Dream to Control: Learning Behaviors by Latent Imagination. ICLR.
Hamrick et al. (2020). Combining Q-Learning and Search with Amortized Value Estimates. ICLR.
Hamrick et al. (2017). Metacontrol for adaptive imagination-based optimization. ICLR.
Houthooft et al. (2016). VIME: Variational Information Maximizing Exploration. NeurIPS.
Jacobson & Mayne (1970). Differential Dynamic Programming.
Jaderberg et al. (2017). Reinforcement learning with unsupervised auxiliary tasks. ICLR.
Jang, Gu, & Poole (2017). Categorical Reparameterization with Gumbel-Softmax. ICLR.
Janner et al. (2019). When to Trust Your Model: Model-Based Policy Optimization. NeurIPS.
Kaelbling & Lozano-Pérez (2011). Hierarchical Task and Motion Planning in the Now. ICRA.
Kalakrishnan et al. (2011). STOMP: Stochastic trajectory optimization for motion planning. ICRA.
Kidambi et al. (2020). MOReL: Model-Based Offline Reinforcement Learning. arXiv.
Kovar, Gleicher, & Pighin (2002). Motion graphs. ACM Transactions on Graphics, 21(3).
Kurutach et al. (2018). Learning Plannable Representations with Causal InfoGAN. NeurIPS.
Lin et al. (2020). Model-based Adversarial Meta-Reinforcement Learning. arXiv.
Maddison, Mnih, & Teh (2017). The Concrete Distribution. ICLR.
Mannor et al. (2003). The Cross-Entropy Method for fast policy search. ICML.
Markman, Klein, & Suhr (2008). Handbook of Imagination and Mental Simulation.
Nagabandi et al. (2019). Deep Dynamics Models for Learning Dexterous Manipulation. CoRL.
Nair, Babaeizadeh, Finn, Levine, & Kumar (2020). Time Reversal as Self-Supervision. ICRA.
Nair, Pong, et al. (2018). Visual Reinforcement Learning with Imagined Goals. NeurIPS.
Nasiriany et al. (2019). Planning with Goal-Conditioned Policies. NeurIPS.
OpenAI et al. (2019). Solving Rubik's Cube with a Robot Hand. arXiv.
Osband et al. (2018). Randomized Prior Functions for Deep Reinforcement Learning. NeurIPS.
Pascanu, Li, et al. (2017). Learning model-based planning from scratch. arXiv.
Pathak et al. (2017). Curiosity-driven exploration by self-supervised prediction. ICML.
Peters, Mulling, & Altun (2010). Relative Entropy Policy Search. AAAI.
Rajeswaran et al. (2020). A Game Theoretic Framework for Model Based Reinforcement Learning. arXiv.
Sadigh et al. (2016). Planning for autonomous cars that leverage effects on human actions. RSS.
Savinov, Dosovitskiy, & Koltun (2018). Semi-parametric topological memory for navigation. ICLR.
Schmidhuber (1991). Curious model-building control systems. IJCNN.
Sharma et al. (2020). Dynamics-Aware Unsupervised Discovery of Skills. ICLR.
Shen et al. (2019). M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search. NeurIPS.
Silver et al. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354-359.
Spelke & Kinzler (2007). Core knowledge. Developmental Science, 10(1).
Sutton & Barto (2018). Reinforcement Learning: An Introduction. MIT Press.
Tamar et al. (2016). Value iteration networks. NeurIPS.
Tamar et al. (2017). Learning from the Hindsight Plan – Episodic MPC Improvement. ICRA.
Theodorou et al. (2010). Learning Policy Improvements with Path Integrals. AISTATS.
Todorov, Erez, & Tassa (2012). MuJoCo: A physics engine for model-based control. IROS.
Venkatraman et al. (2014). Data as demonstrator with applications to system identification.
Williams et al. (2017). Information Theoretic MPC for Model-Based Reinforcement Learning. ICRA.
Yu et al. (2020). MOPO: Model-based Offline Policy Optimization. arXiv.
Zhang, Lerer, et al. (2018). Composable Planning with Attributes. ICML.
Questions or Feedback?
Contact us at jhamrick@ or imordatch@ (followed by google.com)