We present results related to our paper "Learning Robot Skills with Temporal Variational Inference" below. In particular, this site visualizes dynamic results, including GIFs and videos, that cannot be conveyed effectively in the paper. We recommend viewing this page in Mozilla Firefox - some images may not load in Google Chrome.
The first result we present is a dynamic visualization of the embedded latent space of our policies, as depicted in Figures 2, 3, and 4 of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent option choice, and is positioned at the embedded location of that option's latent variable.
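As a rough illustration of how such a visualization can be constructed, the sketch below projects a set of sampled latent z's into two dimensions and scatters them; each rollout video is then placed at its latent's embedded coordinates. The use of t-SNE and the latent dimensionality here are illustrative assumptions, not necessarily the exact pipeline used for the figures.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def embed_latents(latents):
    """Project the sampled latent z's into 2-D for plotting (t-SNE assumed)."""
    return TSNE(n_components=2, perplexity=30).fit_transform(np.asarray(latents))

latents = np.random.randn(200, 64)   # stand-in for sampled z's; dimensionality assumed
xy = embed_latents(latents)
plt.scatter(xy[:, 0], xy[:, 1], s=12)  # rollout GIFs are placed at these coordinates
plt.xlabel("embedding dimension 1")
plt.ylabel("embedding dimension 2")
plt.show()
```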
MIME Dataset:
Press play in the bottom-left corner of the video. We recommend zooming in on this webpage to view the individual learnt primitives in the video; scrolling across the webpage while zoomed in is also useful.
Roboturk Dataset:
Press play in the bottom-left corner of the video. We recommend zooming in on this webpage to view the individual learnt primitives in the video; scrolling across the webpage while zoomed in is also useful.
Mocap Dataset:
Press play in the bottom-left corner of the video. We recommend zooming in on this webpage to view the individual learnt primitives in the video; scrolling across the webpage while zoomed in is also useful.
We now present dynamic visualizations of the individual skills that our model learns. These skills are visualized by rolling out the policy for a particular latent z, and are displayed below as GIFs across all three datasets and agent morphologies.
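A minimal sketch of how a single skill GIF of this kind can be generated is given below: fix a latent z, roll out the low-level policy, and save the rendered frames. The names policy and env, and their reset/step/render interface, are hypothetical stand-ins for the trained policy and the simulator.

```python
import imageio

def rollout_skill(policy, env, z, horizon=50):
    """Roll out the low-level policy with a fixed latent z and collect rendered frames."""
    state = env.reset()
    frames = [env.render()]
    for _ in range(horizon):
        action = policy(state, z)      # action conditioned on the state and the latent option
        state = env.step(action)
        frames.append(env.render())
    return frames

def save_skill_gif(frames, path):
    """Write the rendered frames out as a GIF for visualization."""
    imageio.mimsave(path, frames, fps=20)
```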
MIME Dataset:
Left-Handed Reaching
Right-Handed Reaching
Right-Handed Pushing
Left-Handed Sliding
Right-Handed Placing
Left-Handed Returning
Roboturk Dataset:
Close Gripper
Place to the Left
Push / Slide
Reaching from Right (with open gripper)
Place to the Right
Place to the Middle
Mocap Dataset:
Running and Walking
Running and Ducking
Dance Move
Front Flip
Receiving Basketball and Dribbling
Running and Skateboard Jump
We now showcase the ability of our model to reconstruct the demonstrated trajectories in terms of the learnt skills. The reconstructed trajectories (on the right) are obtained by passing the original demonstration (on the left) through the variational network q, and rolling out the low-level policy with the resultant latent z's as input.
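A minimal sketch of this reconstruction procedure is given below, with q_network, policy, and env as hypothetical stand-ins for the variational network, the low-level policy, and the simulator; it assumes q returns one latent z per timestep of the demonstration.

```python
def reconstruct(demonstration, q_network, policy, env):
    """Reconstruct a demonstration: infer latent z's with q, then roll out the policy."""
    # The variational network q maps the full demonstration to a sequence of
    # latent z's (assumed here to be one latent per timestep).
    latent_sequence = q_network(demonstration)

    state = env.reset()
    states = [state]
    for z in latent_sequence:
        action = policy(state, z)   # low-level policy conditioned on the inferred latent
        state = env.step(action)
        states.append(state)
    return states
```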
MIME Dataset:
Trajectory 1: The original demonstration consists of a reaching motion with the right hand, followed by a reaching motion with the left hand, and a bimanual pushing motion. The reconstructed rollout captures the overall structure of the demonstration well, and there is little to no jerkiness in the transitions between the various primitives.
Original Demonstration
Reconstructed Rollout
Trajectory 2: This trajectory consists of a left-handed reaching motion, a grasping motion, a placing motion, and finally a returning motion. The reconstructed rollout correspondingly executes left-handed reaching, placing, and returning skills, although the fine gripper motion is not perfectly captured. The overall structure of the demonstration is again captured well.
Original Demonstration
Reconstructed Rollout
Trajectory 3: This trajectory consists of a right-handed reaching motion, a sliding motion, and finally a returning motion. The reconstructed rollout correspondingly executes right-handed reaching, sliding, and returning skills. As observed above, the overall structure of the demonstration is well captured.
Original Demonstration
Reconstructed Rollout
Roboturk Dataset:
Trajectory 1: The original demonstration consists of a reaching motion from the right of the workspace towards the middle (from the camera perspective), followed by a grasping primitive and a placing motion. The reconstructed rollout captures these well: it executes corresponding skills that first reach from the right of the workspace to the middle, close the gripper, and finally place slightly to the right. The learnt policies capture both the overall structure of the demonstration and the fine motions (closing the gripper).
Original Demonstration
Reconstructed Rollout
Trajectory 2: This trajectory consists of a reaching motion from the far right to the left of the workspace, followed by a grasping motion, and finally a placing motion back to the far right. The reconstructed rollout correspondingly executes skills where the robot first reaches from the far right, closes its grippers, and finally executes a placing motion to the far right.
Original Demonstration
Reconstructed Rollout
Trajectory 3: This trajectory consists of a reach downwards from the middle of the workspace, a grasp, and a placing motion to the right of the workspace. The learnt policy correspondingly executes skills that reach downwards, close grippers, and finally place to the right. As above, the learnt policy captures both the overall structure of demonstrations as well as fine motions such as grasping.
Original Demonstration
Reconstructed Rollout
Mocap Dataset:
Note that in the case of the Mocap Dataset, we imitate the local positions of the agent available in the dataset. This means that our policies are trained to imitate the relative positions of the agent's joints with respect to one another, but do not track the global position of the character. Hence the original and reconstructed trajectories depicted below differ in global position, but are very similar in relative joint positions.
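For concreteness, a minimal sketch of one such local representation is given below: joint positions are expressed relative to a root joint rather than in global coordinates. The specific choice of root joint (index 0) is an assumption for illustration.

```python
import numpy as np

def to_local(joint_positions, root_index=0):
    """Express global joint positions relative to a chosen root joint.

    joint_positions: array of shape (num_joints, 3) in global coordinates.
    Returns each joint's position relative to the root joint.
    """
    joints = np.asarray(joint_positions)
    return joints - joints[root_index]
```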
Trajectory 1: The original trajectory consists of a series of 3 jumps, followed by a short stretch of walking. Our model learns to execute 3 jumping skills, and subsequently a walking motion to reconstruct the demonstration.
Original Demonstration
Reconstructed Rollout
Trajectory 2: This trajectory consists of two twirling motions followed by a bending-down motion. The learnt policies reconstruct this demonstration as a sequence of two twirling skills followed by a single bending skill. Again, note the ability of the policies to reconstruct the overall structure of the demonstration well, while still accurately capturing the various joint motions of the character.
Original Demonstration
Reconstructed Rollout
Trajectory 3: This trajectory consists of a sitting down motion, a standing up motion, and finally a motion that reaches towards a door handle. The reconstructed rollout correspondingly consists of a sitting skill, a standing skill, and finally a skill that reaches upwards with the left hand while standing. As mentioned above, the disparity in absolute positions between the original and reconstructed trajectories is due to our models predicting relative joint coordinates / velocities; this could be resolved by predicting in global coordinates instead.
Original Demonstration
Reconstructed Rollout
Our initial experiments on temporal variational inference were performed on 2-D toy trajectories. These trajectories were generated by sequencing 4 options: moving 3-5 steps in one of the 4 cardinal directions, then selecting another option, and so on. Below, we present the learnt embedded latent space from this data (computed as for the embeddings on the other datasets), along with pairs of original demonstrations and their corresponding reconstructed rollouts.
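A minimal sketch of this toy data generation is given below; the number of options per trajectory and the unit step size are illustrative assumptions.

```python
import numpy as np

DIRECTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def generate_trajectory(num_options=5, seed=None):
    """Generate one toy 2-D trajectory as a sequence of cardinal-direction options."""
    rng = np.random.default_rng(seed)
    position = np.zeros(2)
    trajectory = [position.copy()]
    for _ in range(num_options):
        # Pick an option: a cardinal direction followed for 3-5 steps.
        step = np.array(DIRECTIONS[rng.choice(list(DIRECTIONS))])
        for _ in range(rng.integers(3, 6)):
            position += step
            trajectory.append(position.copy())
    return np.array(trajectory)
```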
Trajectories are visualized as a sequence of dots, going from blue (start) to red (end). In the embedding of the learnt latent space (left), we observe clusters of options corresponding to "left", "right", "up", and "down" movements that are well separated in the latent space. Further, we also observe additional options, such as left and right turns, that emerge. While these options were not used to generate the dataset, we do expect them to emerge from our model, and they are useful for solving downstream navigation tasks and reconstructing demonstrations.
We also depict 3 instances of the reconstruction of demonstrations by rolling out policies (right). Here, we observe that the reconstructed trajectories (below) closely follow the option pattern of the original demonstrations (above) across all three instances, when rolled out from the blue start state. This is a clear indication that our model is able to capture the overall structure of demonstrations in terms of their constituent options.