Compositional Transfer in Hierarchical Reinforcement Learning

Markus Wulfmeier*, Abbas Abdolmaleki*, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

DeepMind, London

5948603 - 6 - 19500.mp4

Physical Robot Final Stacking

The video includes human disturbances that randomize the distribution over initial states.

Single task: Visualization for Active Sub-Policy Components

The graph on the top right shows the index of the currently active sub-policy. The bottom left shows various parameters describing the system, including the currently active task (a randomly chosen sequence is displayed in each video). The videos illustrate how tasks decompose into individual components and how components are reused across tasks.

stand_up_mog.mp4

Humanoid Standup (single task)

mujoco_17-28-50.mp4

Humanoid Run (single task)

Multitask: Visualization for Active Sub-Policy Components

mujoco_10-56-46.mp4

Pile1

mujoco_15-05-56.mp4

Pile2

trimmed2.mp4

Cleanup2

Multitask: Execution of All Physical Robot Tasks in the Pile1 Domain

Stack and Leave

Stack

Place Narrow

Place Wide

Lift

Grasp

Reach

All visualizations show only the performance of the hierarchical models presented in the corresponding submission.

The complete paper, including the appendix (missing from the RSS 2020 proceedings), is available at https://arxiv.org/abs/1906.11228