Learning Dynamics Models for Model Predictive Agents

Abstract:

Model-Based Reinforcement Learning involves learning a dynamics model from data and then using this model to optimise behaviour, most often with an online planner. Much of the recent research along these lines commits to a particular set of design choices spanning problem definition, model learning and planning. Because several contributions are introduced at once, it is difficult to evaluate the effect of each. This paper sets out to disambiguate the role of different design choices for learning dynamics models by comparing their performance to planning with a ground-truth model: the simulator. First, we collect a rich dataset from the training sequence of a model-free agent on five domains of the DeepMind Control Suite. Second, we train feed-forward dynamics models in a supervised fashion and evaluate planner performance while varying and analysing different model design choices, including ensembling, stochasticity, multi-step training and time step size. Besides the quantitative analysis, we describe a set of qualitative findings, rules of thumb, and future research directions for planning with learned dynamics models.
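
For concreteness, below is a minimal sketch of the supervised model-learning step described above, written in PyTorch. The architecture, hyper-parameters and the delta-state parameterisation are illustrative assumptions, not the exact implementation evaluated in the paper.

    import torch
    import torch.nn as nn

    class DynamicsModel(nn.Module):
        """Feed-forward model predicting the state change: s' ~ s + f(s, a)."""
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )

        def forward(self, state, action):
            return state + self.net(torch.cat([state, action], dim=-1))

    # An ensemble is simply several independently initialised models.
    ensemble = [DynamicsModel(state_dim=5, action_dim=1) for _ in range(5)]
    optimisers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]

    def train_step(model, optimiser, states, actions, next_states):
        """One supervised step on a batch of (s, a, s') transitions."""
        loss = nn.functional.mse_loss(model(states, actions), next_states)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()

Each ensemble member sees the same transition dataset but starts from its own random initialisation, which is what produces the spread of predictions visualised on the sub-pages below.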


Overview:

This site provides an overview of the qualitative behaviour of the learned models. Each sub-page visualises a different aspect of the models, and for each aspect we provide multiple videos to highlight the variety in starting states and obtained rewards.

Videos of Model Predictive Control: Shows the control performance of the learned models
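
As a rough illustration of how a learned model is used for control, here is a simple random-shooting MPC loop. The paper's actual planner may differ, and model_step (returning a predicted next state and reward) is an assumed interface.

    import numpy as np

    def mpc_action(model_step, state, horizon=20, n_candidates=500, action_dim=1):
        """Random-shooting MPC: sample candidate action sequences, roll each
        out through the learned model, and execute only the first action of
        the highest-return sequence (then replan at the next time step)."""
        plans = np.random.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
        returns = np.zeros(n_candidates)
        for i, plan in enumerate(plans):
            s = state
            for a in plan:
                s, reward = model_step(s, a)  # assumed: learned dynamics + reward
                returns[i] += reward
        return plans[np.argmax(returns)][0]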

Videos of the Open-Loop Prediction: Compares the open-loop trajectories of the learned ensembles with those of the true model
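
Conceptually, these comparisons feed one fixed action sequence through each ensemble member and through the simulator, then overlay the resulting trajectories. A bare-bones sketch, where step_fn is a placeholder for either a learned model or the simulator:

    def open_loop_rollout(step_fn, state, actions):
        """Roll out a fixed action sequence open-loop and record the states."""
        states = [state]
        for action in actions:
            state = step_fn(state, action)
            states.append(state)
        return states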

Videos of the Generated Plans: Visualises the plans generated with the approximate model from a single starting state
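
These visualisations can be reproduced in the same spirit by rolling out several candidate plans from one starting state with the approximate model. Here, plans, model_fn and start_state are assumed to come from the sketches above:

    # One trajectory per candidate plan, all from the same starting state.
    candidate_trajectories = [
        open_loop_rollout(model_fn, start_state, plan) for plan in plans
    ]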

Videos of the Model Entropy: Visualises the model entropy at each time step along a trajectory
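
One common way to quantify per-step uncertainty for an ensemble of deterministic models is the entropy of a Gaussian fitted to the ensemble's predictions at that step. This definition is an assumption for illustration and may not match the exact measure used in the videos.

    import numpy as np

    def ensemble_entropy(predictions):
        """Entropy of a diagonal Gaussian fitted to the ensemble predictions
        at one time step. predictions: array of shape (n_models, state_dim)."""
        var = predictions.var(axis=0) + 1e-8  # per-dimension variance
        return 0.5 * np.sum(np.log(2.0 * np.pi * np.e * var))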