Compositional Transfer in Hierarchical Reinforcement Learning
Markus Wulfmeier*, Abbas Abdolmaleki*, Roland Hafner, Jost Tobias Springenberg, Michael Neunert,
Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller
DeepMind, London
Physical Robot Final Stacking
Includes human disturbances that randomize the distribution over initial states.
Single task: Visualization for Active Sub-Policy Components
The graph on the top right displays the number of the currently active sub-policy. On the bottom left, we display various parameters describing the system, including the currently active task (with a randomly chosen sequence displayed in each video). The videos elaborate on task decomposition into individual components as well as reuse of components across tasks.
Humanoid Standup (single task)
Humanoid Run (single task)
Multitask: Visualization for Active Sub-Policy Components
Pile1
Pile2
Cleanup2
Multitask: Execution of All Physical Robot Tasks in the Pile1 Domain
Stack and Leave
Stack
Place Narrow
Place Wide
Lift
Grasp
Reach
All visualizations focus purely on the performance of the hierarchical models presented in the corresponding submission.
The complete paper, including the appendix (missing from the RSS 2020 proceedings), can be found at https://arxiv.org/abs/1906.11228