Partially Amortized Planning with Hierarchical Latent Plans in Model-based Reinforcement Learning
Anonymous ICLR 2021 submission
Video Skill Visualizations
We select arbitrary skills from the skill space and hold each one fixed for 100 environment steps. The videos below show that the policy learns meaningful and distinct behaviours conditioned on the skill. Note that the agents are fully deterministic, so any variation between videos is due only to the skill conditioning.
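The procedure described above (pick a skill, hold it fixed, roll out a deterministic policy) can be sketched as follows. This is only a minimal illustration under stated assumptions: `toy_policy` and `toy_step` are hypothetical stand-ins for the learned skill-conditioned policy and the Quadruped environment, the 2-dimensional skill is an arbitrary choice, and `sample_skill` draws a random direction with a norm chosen uniformly in a given range, which is not necessarily how the skills in the videos were selected.

```python
import math
import random


def sample_skill(dim, min_norm, max_norm, rng):
    """Draw a skill vector whose Euclidean norm lies in [min_norm, max_norm].

    Assumption: random direction, norm uniform in the given range.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(x * x for x in direction))
    norm = rng.uniform(min_norm, max_norm)
    return [norm * x / length for x in direction]


def toy_policy(state, skill):
    # Hypothetical deterministic policy: the action depends only on (state, skill).
    return [math.tanh(z - 0.1 * s) for s, z in zip(state, skill)]


def toy_step(state, action):
    # Hypothetical deterministic dynamics standing in for the real environment.
    return [s + 0.05 * a for s, a in zip(state, action)]


def rollout_fixed_skill(policy, step, state, skill, num_steps=100):
    """Roll out `policy` for `num_steps` steps with the skill held constant."""
    trajectory = [state]
    for _ in range(num_steps):
        action = policy(state, skill)  # the skill is never resampled
        state = step(state, action)
        trajectory.append(state)
    return trajectory


rng = random.Random(0)
skill_inside = sample_skill(2, 0.0, 1.0, rng)   # within the unit ball
skill_outside = sample_skill(2, 1.0, 2.0, rng)  # outside the unit ball

start = [0.0, 0.0]
traj_inside = rollout_fixed_skill(toy_policy, toy_step, start, skill_inside)
traj_outside = rollout_fixed_skill(toy_policy, toy_step, start, skill_outside)
```

Because both the toy policy and the toy dynamics are deterministic, repeating a rollout with the same skill reproduces the same trajectory, so any difference between `traj_inside` and `traj_outside` comes from the skill alone.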
Quadruped Walk: Skills Within the Unit Ball
Quadruped Walk: Skills Outside the Unit Ball
Transfer to Quadruped Reach
We transfer the learned low-level skills to a target-reaching task. The red spot marks the target, whose position is sampled at random within the arena.