Planning from Pixels using Inverse Dynamics Models
Anonymous Authors
Abstract:
Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches.
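To make the idea concrete, here is a minimal toy sketch of planning with a task-conditioned action-sequence model. This is an illustrative assumption, not the paper's implementation: the learned model p(a_1..a_T | s_0, g) is replaced by a hand-made scorer on a 1-D chain world, and planning is exhaustive search over short action sequences.

```python
import itertools

ACTIONS = (-1, +1)   # step left / step right on a 1-D chain
HORIZON = 4

def rollout(start, actions):
    """Deterministic toy dynamics: the position changes by each action."""
    pos = start
    for a in actions:
        pos += a
    return pos

def log_likelihood(actions, start, goal):
    """Stand-in for a learned p(a_1..a_T | s_0, g): action sequences
    whose final state lands closer to the goal receive higher scores."""
    return -abs(rollout(start, actions) - goal)

def plan(start, goal, horizon=HORIZON):
    """Planning as search: return the action sequence the (toy) model
    scores as most likely to achieve the goal."""
    return max(itertools.product(ACTIONS, repeat=horizon),
               key=lambda seq: log_likelihood(seq, start, goal))

best = plan(start=0, goal=2)
print(best, rollout(0, best))
```

In the actual method the scorer is a learned, goal-conditioned model over pixels, and search would use sampling rather than enumeration; the toy version only shows the control flow of model-scored planning.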
Goal Achievement Visualizations
The following animations show a GLAMOR agent achieving goals in all tested environments. Goals are blended with the animation of the agent attempting to reach them.
See https://sites.google.com/view/discern-paper for visualizations of DISCERN, which was used as a baseline.
Atari
DeepMind Control Suite
Termination Strategy Visualizations
By explicitly planning to reach the goal at the end of the episode, GLAMOR can perform well on Cartpole even without the ability to evaluate goal achievement early.
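The difference between the termination strategies can be sketched with a toy scorer (an assumption for illustration, not the paper's code): on a moving system like Cartpole, a plan scored by whether it *ever* touches the goal can drift away before the episode ends, while "planned termination" scores only the end-of-episode state.

```python
def positions(start, actions):
    """All positions visited on a 1-D chain, including the final one."""
    out, pos = [], start
    for a in actions:
        pos += a
        out.append(pos)
    return out

def score_early(actions, start, goal):
    """Naive scoring: credit for touching the goal at any step."""
    return 1.0 if goal in positions(start, actions) else 0.0

def score_planned(actions, start, goal):
    """Planned termination: only the state at episode end counts."""
    return 1.0 if positions(start, actions)[-1] == goal else 0.0

drifting = (+1, +1, -1, -1)   # touches goal=2 mid-episode, then drifts off
arriving = (-1, +1, +1, +1)   # ends the episode exactly at goal=2

print(score_early(drifting, 0, 2), score_planned(drifting, 0, 2))
print(score_early(arriving, 0, 2), score_planned(arriving, 0, 2))
```

Under planned termination, only the second plan scores well, which matches the behavior shown in the animations below.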
Early Termination
Naive Termination (end of episode)
Planned Termination (end of episode)