Partially Amortized Planning with Hierarchical Latent Plans in Model-based Reinforcement Learning

Anonymous ICLR 2021 submission

Video Skill Visualizations

We select arbitrary skills from the skill space and hold each one fixed for 100 environment steps. The videos below show that the policy learns meaningful and distinct behaviours conditioned on the skill. Note that the agents are fully deterministic, so any variation between videos is due solely to the skill conditioning.
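The visualization procedure described above can be sketched roughly as follows. This is a minimal illustration and not the submission's code: `toy_policy` and `toy_step` are hypothetical stand-ins for the learned skill-conditioned policy and the simulator, and the skill dimension (4) and the norm range used for skills outside the unit ball (1 to 2) are assumptions made only for this example. The skill norm is drawn uniformly from the given range, which is a simple choice rather than a uniform sample over the ball's volume.

```python
import math
import random


def sample_skill(dim, norm_range, rng):
    """Sample a skill vector with a random direction and a norm drawn from norm_range."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(x * x for x in direction))
    radius = rng.uniform(*norm_range)
    return [radius * x / length for x in direction]


def toy_policy(obs, skill):
    """Hypothetical deterministic skill-conditioned policy (stand-in for the learned one)."""
    return [math.tanh(s - 0.1 * o) for o, s in zip(obs, skill)]


def toy_step(obs, action):
    """Hypothetical deterministic dynamics (stand-in for the simulator)."""
    return [o + 0.05 * a for o, a in zip(obs, action)]


def rollout(skill, steps=100):
    """Run the policy for `steps` environment steps with the skill held fixed."""
    obs = [0.0] * len(skill)
    trajectory = [obs]
    for _ in range(steps):
        action = toy_policy(obs, skill)  # the same skill is used at every step
        obs = toy_step(obs, action)
        trajectory.append(obs)
    return trajectory


rng = random.Random(0)
skill_inside = sample_skill(4, (0.0, 1.0), rng)   # norm at most 1
skill_outside = sample_skill(4, (1.0, 2.0), rng)  # norm at least 1 (assumed range)

traj_inside = rollout(skill_inside)
traj_outside = rollout(skill_outside)
```

Because both the stand-in policy and the stand-in dynamics are deterministic, repeating a rollout with the same skill reproduces the same trajectory, so any difference between two rollouts comes from the skill alone.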

Quadruped Walk: Skills Within the Unit Ball

Quadruped Walk: Skills Outside the Unit Ball

Transfer to Quadruped Reach

We transfer the learned low-level skills to a target-reaching task. The red spot marks the target, whose position is sampled randomly within the arena.