We select arbitrary skills from skill space and fix them throughout 100 environment steps. The videos below show that the policy learns meaningful and distinct behaviours conditioned on the skill. Note that the agents are fully deterministic and variations are only due to the skill conditioning.
We transfer the learned low level skills to a target reaching task. The red spot represents the target and is randomly sampled in the arena.