Our approach is able to efficiently learn to solve all four subtasks by combining task-agnostic experience data and task-specific demonstrations.
SPiRL can leverage task-agnostic prior experience to solve some of the subtasks, but the mismatch between the task-agnostic skill prior and target task can lead to the agent attempting to solve the wrong subtask (like turning on the stove) and failing at the precise execution of other subtasks (like opening the hinge cabinet).
The observed failure cases stem from imprecise control, particularly when flipping the light switch, which requires precise manipulation. Since our approach trains closed-loop policies, the agent can autonomously recover from some of these failures by retrying.
Our approach successfully leverages the learned skills to follow the provided demonstrations and solve the task. Since there is no penalty for slow task execution, some rollouts feature periods of slow movement.
SPiRL struggles to learn the task since the skill prior encourages the policy to explore many task-irrelevant skills. Sometimes the agent even attempts incorrect subtasks, such as picking up the wrong object or picking up an object before opening the drawer it is supposed to be stored in.
One main challenge of the office environment is the manipulation of free-floating objects, which requires precise control of the end-effector. Most failure cases result from failed grasp attempts due to imprecise control. The agent also occasionally reaches its 5-DoF joint limits.
The rolled-out skills are executed inaccurately by the open-loop skill decoder, so subtasks are only occasionally solved.
Because the closed-loop skill policy provides a more expressive skill representation, the sampled skills frequently succeed at solving subtasks, enabling better transfer of learned skills to unseen downstream tasks. Note that the order of solved subtasks varies between episodes since the skill prior is not trained to solve any particular task.
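The open-loop vs. closed-loop distinction can be illustrated with a minimal toy sketch (all names, the 1-D dynamics, and the drift term are hypothetical illustrations, not our implementation): an open-loop decoder commits to an action sequence from the skill latent alone, while a closed-loop skill policy re-reads the current state at every step and can correct for unmodeled errors.

```python
import numpy as np

class ToyEnv:
    """Hypothetical 1-D point mass with constant unmodeled drift; actions are deltas."""
    def __init__(self, start=0.0, drift=0.05):
        self.s, self.drift = start, drift
    def step(self, a):
        self.s += a + self.drift
        return self.s

def open_loop_decoder(z, horizon):
    """Open-loop: decode the skill latent z into a fixed action sequence up
    front, assuming drift-free dynamics; cannot react to reached states."""
    return np.full(horizon, z / horizon)

def closed_loop_policy(s, z):
    """Closed-loop: condition on the *current* state s at every step, so
    accumulated drift gets corrected while executing the same skill z."""
    return np.clip(z - s, -0.5, 0.5)

horizon, z = 10, 1.0  # skill latent z interpreted here as a target position

env = ToyEnv()
for a in open_loop_decoder(z, horizon):
    env.step(a)
open_loop_err = abs(env.s - z)   # drift accumulates over the whole horizon

env = ToyEnv()
s = env.s
for _ in range(horizon):
    s = env.step(closed_loop_policy(s, z))
closed_loop_err = abs(env.s - z)  # drift is corrected at each step
```

In this toy setting the open-loop rollout accumulates the full drift (error 0.5), while the closed-loop rollout cancels all but the final step's drift (error 0.05), mirroring why the closed-loop skill policy executes subtasks more reliably.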