We present results related to our paper "Translating Agent-Environment Interactions across Humans and Robots" below. In particular, this site serves to visualize dynamic results, including GIFs and videos, that cannot be effectively conveyed in the paper. We recommend viewing this page in Mozilla Firefox, as some images may not load in Google Chrome.
Authors:
Almutwakel Hassan (Carnegie Mellon University), with co-authors at the Technical University of Munich and Carnegie Mellon University.
We first present results from our human-to-robot transfer experiments for each of the 3 compositional tasks in our compositional human dataset. For each task, we present an instance of a human demonstration of the compositional task and its translations to the robot for two different initial object configurations. These are dynamic visualizations of human and robot interactions; equivalent visualizations are presented in Figure 5 of our paper.
The first task we show human-to-robot transfer results on is BoxOpening+Pouring. To perform this task, the person first reaches to the box, grasps the box lid, opens the box by pushing the lid back, releases the box, reaches towards the cup, grasps the cup, lifts it, transports it over the box, and tilts it to pour beads into the box, before finally returning the cup. This is a complex sequence of interactions that needs to be translated to the robot. TransAct is able to successfully translate this sequence of interactions to the robot and perform the task given different relative configurations of the objects.
Query Human Demonstration for BoxOpening+Pouring Task
Translated Robot Trajectory #1
Translated Robot Trajectory #2
The next task we show human-to-robot transfer results on is DrawerOpening+PickPlace. To solve this task, the person first reaches towards the drawer, grasps the edge of the drawer, slides the drawer open, releases the drawer, reaches towards the cup, grasps the cup, lifts it, transports it close to the drawer, places it in the drawer, and finally returns. TransAct is also able to successfully translate these interactions to the robot, despite different initial configurations of the drawer and the cup.
Query Human Demonstration for DrawerOpening+PickPlace Task
Translated Robot Trajectory #1
Translated Robot Trajectory #2
The final task we show human-to-robot transfer on is Pouring+Stirring, where a person first reaches to a cup, grasps it, lifts it, transports it over the container, tilts it to pour beads into the container, places the cup, reaches to the stirrer, grasps the stirrer, and stirs the beads in the container, before finally placing the stirrer and returning. TransAct is able to translate this complex sequence of interactions to the robot in a zero-shot fashion; the robot successfully pours beads into a larger cup and then stirs them.
Query Human Demonstration for Pouring+Stirring Task
Translated Robot Trajectory #1
Translated Robot Trajectory #2
We present a dynamic visualization of individual instances of the learnt interaction abstractions. Each frame in the following GIFs corresponds to the policy being rolled out with a different latent abstraction.
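As a rough illustration of how such GIFs can be assembled, the sketch below rolls out a latent-conditioned policy once per sampled abstraction and collects one rendered frame per rollout. The interfaces used here (`env`, `policy`, `sample_latents`, and a Gym-style `step`/`render` API) are placeholder assumptions for illustration, not the actual TransAct code.

```python
import imageio

def rollout_frames(env, policy, latents, horizon=50):
    """Render one frame per latent abstraction by rolling out the policy conditioned on it."""
    frames = []
    for z in latents:
        obs = env.reset()
        for _ in range(horizon):
            action = policy(obs, z)            # low-level policy conditioned on latent z
            obs, _, done, _ = env.step(action)
            if done:
                break
        frames.append(env.render(mode="rgb_array"))  # keep the final rendered frame
    return frames

# Hypothetical usage, assuming `env`, `policy`, and `sample_latents` are defined:
# latents = sample_latents(n=64)
# imageio.mimsave("interaction_abstractions.gif", rollout_frames(env, policy, latents), fps=10)
```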
Real World Individual Instances of Interaction Abstractions (Speed 2x):
Box-Opening Instance
Drawer-Opening Instance
Picking Instance
Placing Instance
Stirring Instance
Pouring Instance
Real World Interaction Abstraction Space (High Resolution):
In the following window, we present a dynamic visualization of the embedded latent space of our interaction abstractions, as depicted in Figure 2 (left column) of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent abstraction, and is positioned at the location of its latent variable in the embedded space.
Press play in the bottom-left corner to start the video. We recommend zooming in on this webpage to view the individual learnt abstractions; scrolling across the page while zoomed in is also useful.
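For readers curious how such a layout can be produced, below is a minimal sketch that projects the latent abstractions to 2D and anchors each rollout thumbnail at the embedded location of its latent. t-SNE and matplotlib are placeholder choices, and the arrays `latents` (N x d) and `thumbnails` (N rendered frames) are assumed inputs; this is not the paper's exact pipeline.

```python
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from sklearn.manifold import TSNE

def plot_embedded_abstractions(latents, thumbnails, zoom=0.15):
    """Place each rollout thumbnail at the 2D embedding of its latent abstraction."""
    coords = TSNE(n_components=2).fit_transform(latents)     # 2D projection of latents
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.scatter(coords[:, 0], coords[:, 1], s=1, alpha=0.0)   # invisible scatter to set axis limits
    for (x, y), img in zip(coords, thumbnails):
        # Anchor the rollout frame at the embedded location of its latent variable.
        ax.add_artist(AnnotationBbox(OffsetImage(img, zoom=zoom), (x, y), frameon=False))
    ax.set_axis_off()
    return fig

# fig = plot_embedded_abstractions(latents, thumbnails)
# fig.savefig("abstraction_space.png", dpi=300)
```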
Real World Interaction Abstraction Space (Low Resolution):
Real World Full Interaction Abstractions (Speed 4x):
Pick-Place Task:
Ground Truth Trajectory
Reconstructed Trajectory
Stirring Task:
Ground Truth Trajectory
Reconstructed Trajectory
Pouring Task:
Ground Truth Trajectory
Reconstructed Trajectory
Box-Opening Task:
Ground Truth Trajectory
Reconstructed Trajectory
Drawer-Opening Task:
Ground Truth Trajectory
Reconstructed Trajectory
Real World Combinatorial Generalization (Speed 4x):
The next result we present is the combinatorial generalization of learnt interaction abstractions in the real world. We tested combinations of tasks on the robot that the system had not seen during training. On the left is the combination of Pouring and Stirring, and on the right, the combination of Box Opening and Pouring.
Pouring + Stirring
Box Opening + Pouring
The next result we present is a dynamic visualization of the embedded latent space of our environmental abstractions, as depicted in Figure 2 (left column) of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent abstraction, and is positioned at the location of its latent variable in the embedded space.
As before, press play in the bottom-left corner to start the video; zooming in and scrolling across the page helps view the individual learnt abstractions.
Roboturk Dataset Environmental Abstraction Space:
RoboMimic Dataset Environmental Abstraction Space:
The next result is a dynamic visualization of the embedded latent space of our interaction abstractions, as depicted in Figure 2 (right column) of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent abstraction, and is positioned at the location of its latent variable in the embedded space.
As before, press play in the bottom-left corner to start the video; zooming in and scrolling across the page helps view the individual learnt abstractions.
Roboturk Dataset Interaction Abstraction Space:
RoboMimic Dataset Interaction Abstraction Space:
The following result we present is a dynamic visualization of individual instances of the learnt environmental abstractions; equivalent static visualizations are presented in Figure 3 in our paper. Each frame in the following GIFs corresponds to the policy being rolled out with a different latent abstraction.
Roboturk Dataset Environmental Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory
RoboMimic Dataset Environmental Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory
The final result is a dynamic visualization of individual instances of the learnt interaction abstractions; equivalent static visualizations are presented in Figure 4 in our paper. Each frame in the following GIFs corresponds to the policy being rolled out with a different latent abstraction. In each row, we present a ground truth trajectory on the left and its corresponding reconstruction via a learnt abstraction on the right.
Note that the grippers of the robots in these visualizations do not render correctly due to an issue we were unable to resolve by the workshop deadline; in these images, the grippers should be interpreted as closed.
Roboturk Dataset Interaction Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory
RoboMimic Dataset Interaction Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory