Demonstration-Guided Reinforcement Learning with Learned Skills

Supplementary Material

Kitchen Environment Results

Kitchen Environment - Ours

Our approach efficiently learns to solve all four subtasks by combining task-agnostic experience data with task-specific demonstrations.

Kitchen Environment - SPiRL

SPiRL can leverage task-agnostic prior experience to solve some of the subtasks, but the mismatch between the task-agnostic skill prior and the target task can lead the agent to attempt the wrong subtask (e.g., turning on the stove) or to fail at the precise execution of other subtasks (e.g., opening the hinge cabinet).
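For context, SPiRL regularizes the high-level policy toward the task-agnostic skill prior; a sketch of the objective (notation adapted from Pertsch et al., 2020):

$$\max_{\pi}\;\mathbb{E}_{\pi}\Big[\sum_{t}\tilde{r}(s_t, z_t) - \alpha\, D_{\mathrm{KL}}\big(\pi(z_t \mid s_t)\,\big\|\,p(z_t \mid s_t)\big)\Big]$$

When the task-agnostic prior $p(z \mid s)$ assigns high probability to skills that are irrelevant to the target task, the KL term pulls the policy toward exactly those skills, which explains the wrong-subtask behavior observed above.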

Kitchen Environment - Ours (Failure Cases)

The observed failure cases are caused by imprecise control, particularly when flipping the light switch, which requires precise manipulation. Since our approach trains closed-loop policies, the agent attempts to recover from some of these failures autonomously by retrying.

Office Environment Results

Office Environment - Ours

Our approach successfully leverages the learned skills to follow the provided demonstrations and solve the task. Since there is no penalty for slow task execution, some rollouts feature periods of slow movement.

Office Environment - SPiRL

SPiRL struggles to learn the task since the skill prior encourages the policy to explore many task-irrelevant skills. Sometimes the agent even attempts the wrong subtask, e.g., picking up the wrong object or picking up an object before opening the drawer it is supposed to be stored in.

Office Environment - Ours (Failure Cases)

One main challenge of the office environment is the manipulation of free-floating objects, which requires precise end-effector control. Most failure cases result from failed grasp attempts due to imprecise control. The agent also occasionally hits the joint limits of its 5-DoF arm.

Skill Prior Rollouts

Open-Loop Skill Decoder Representation (Pertsch et al., 2020)

Skills sampled from the prior are executed inaccurately by the open-loop skill decoder, so subtasks are solved only occasionally.
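As an illustration, a minimal sketch of open-loop skill execution; decode_actions and the environment interface are hypothetical stand-ins, not the released implementation:

```python
# Minimal sketch of open-loop skill execution (hypothetical interface).
# decode_actions(z) maps a latent skill z to a fixed H-step action sequence,
# computed once up front; the actions are then replayed without state feedback,
# so tracking errors accumulate over the skill horizon.
def execute_skill_open_loop(env, decode_actions, z, horizon=10):
    actions = decode_actions(z)  # shape (horizon, action_dim), fixed in advance
    state = None
    for t in range(horizon):
        state, reward, done, info = env.step(actions[t])  # no feedback into the skill
        if done:
            break
    return state
```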

Closed-Loop Skill Policy Representation (Ours)

Due to the more expressive closed-loop skill policy representation, the sampled skills frequently succeed at solving subtasks, enabling more effective transfer of the learned skills to unseen downstream tasks. Note that the order of solved subtasks varies between episodes since the skill prior is not trained to solve any particular task.
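For contrast, a minimal sketch of closed-loop skill execution under the same hypothetical interface; skill_policy conditions on the current state at every step:

```python
# Minimal sketch of closed-loop skill execution (hypothetical interface).
# skill_policy(state, z) recomputes the action from the *current* state at
# every step, so the agent can correct execution errors and retry within a skill.
def execute_skill_closed_loop(env, skill_policy, z, state, horizon=10):
    for t in range(horizon):
        action = skill_policy(state, z)  # state feedback at every step
        state, reward, done, info = env.step(action)
        if done:
            break
    return state
```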