We present results related to our paper "Translating Agent-Environment Interactions across Humans and Robots" below. In particular, this site serves to visualize dynamic results, including GIFs and videos, that cannot be effectively conveyed in the paper. We recommend viewing this page in Mozilla Firefox, as some images may not load in Google Chrome.
Authors:
Almutwakel Hassan (Carnegie Mellon University), with co-authors at the Technical University of Munich and Carnegie Mellon University.
We first present results from our human-to-robot transfer experiments for each of the 3 compositional tasks in our compositional human dataset. For each task, we present an instance of a human demonstration of the compositional task and its translations to the robot for two different initial object configurations. These are dynamic visualizations of human and robot interactions; equivalent visualizations are presented in Figure 5 of our paper.
The first task we show human-to-robot transfer results on is BoxOpening+Pouring. To perform this task, the person first reaches to the box, grasps the box lid, opens the box by pushing the lid back, releases the box, reaches towards the cup, grasps the cup, lifts it, transports it over the box, and tilts it to pour beads into the box, before finally returning the cup. This is a complex sequence of interactions that needs to be translated to the robot. TransAct is able to successfully translate this sequence of interactions to the robot and perform the task given different relative configurations of the objects.
Query Human Demonstration for BoxOpening+Pouring Task
Translated Robot Trajectory #1
Translated Robot Trajectory #2
The next task we show human-to-robot transfer results on is DrawerOpening+PickPlace. To solve this task, the person first reaches towards the drawer, grasps the edge of the drawer, slides the drawer open, releases the drawer, reaches towards the cup, grasps the cup, lifts it, transports it close to the drawer, places it in the drawer, and finally returns. TransAct is also able to successfully translate these interactions to the robot, despite different initial configurations of the drawer and the cup.
Query Human Demonstration for DrawerOpening+PickPlace Task
Translated Robot Trajectory #1
Translated Robot Trajectory #2
The final task we show human-to-robot transfer on is Pouring+Stirring, where a person first reaches to a cup, grasps it, lifts it, transports it over the container, tilts it to pour beads into the container, places the cup, reaches to the stirrer, grasps the stirrer, and stirs the beads in the container, before finally placing the stirrer and returning. TransAct is able to translate this complex sequence of interactions to the robot in a zero-shot fashion; the robot successfully pours beads into a larger cup and then stirs them.
Query Human Demonstration for Pouring+Stirring Task
Translated Robot Trajectory #1
Translated Robot Trajectory #2
We present a dynamic visualization of individual instances of the learnt interaction abstractions. Each frame in the following GIFs corresponds to the policy being rolled out with a different latent abstraction.
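As a rough illustration of how such GIFs can be assembled, the sketch below rolls out a latent-conditioned policy once per sampled abstraction and collects one rendered frame per rollout. The interfaces used here (`env`, `policy`, `sample_latents`, and a Gym-style `step`/`render` API) are placeholder assumptions for illustration, not the actual TransAct code.

```python
import imageio

def rollout_frames(env, policy, latents, horizon=50):
    """Render one frame per latent abstraction by rolling out the policy conditioned on it."""
    frames = []
    for z in latents:
        obs = env.reset()
        for _ in range(horizon):
            action = policy(obs, z)            # low-level policy conditioned on latent z
            obs, _, done, _ = env.step(action)
            if done:
                break
        frames.append(env.render(mode="rgb_array"))  # keep the final rendered frame
    return frames

# Hypothetical usage, assuming `env`, `policy`, and `sample_latents` are defined:
# latents = sample_latents(n=64)
# imageio.mimsave("interaction_abstractions.gif", rollout_frames(env, policy, latents), fps=10)
```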
Real World Individual Instances of Interaction Abstractions (Speed 2x):
Box-Opening Instance
Drawer-Opening Instance
Picking Instance
Placing Instance
Stirring Instance
Pouring Instance
Real World Interaction Abstraction Space (High Resolution):
In the following window, we present a dynamic visualization of the embedded latent space of our interaction abstractions, as depicted in Figure 2 (left column) of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent abstraction, and is positioned at the location of its latent variable in the embedded space.
Press play in the bottom-left corner to start the video. We recommend zooming in on this webpage to view the individual learnt abstractions; scrolling across the page while zoomed in is also useful.
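For readers curious how such a layout can be produced, below is a minimal sketch that projects the latent abstractions to 2D and anchors each rollout thumbnail at the embedded location of its latent. t-SNE and matplotlib are placeholder choices, and the arrays `latents` (N x d) and `thumbnails` (N rendered frames) are assumed inputs; this is not the paper's exact pipeline.

```python
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from sklearn.manifold import TSNE

def plot_embedded_abstractions(latents, thumbnails, zoom=0.15):
    """Place each rollout thumbnail at the 2D embedding of its latent abstraction."""
    coords = TSNE(n_components=2).fit_transform(latents)     # 2D projection of latents
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.scatter(coords[:, 0], coords[:, 1], s=1, alpha=0.0)   # invisible scatter to set axis limits
    for (x, y), img in zip(coords, thumbnails):
        # Anchor the rollout frame at the embedded location of its latent variable.
        ax.add_artist(AnnotationBbox(OffsetImage(img, zoom=zoom), (x, y), frameon=False))
    ax.set_axis_off()
    return fig

# fig = plot_embedded_abstractions(latents, thumbnails)
# fig.savefig("abstraction_space.png", dpi=300)
```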
Real World Interaction Abstraction Space (Low Resolution):
Real World Full Interaction Abstractions (Speed 4x):
Pick-Place Task:
Ground Truth Trajectory
Reconstructed Trajectory
Stirring Task:
Ground Truth Trajectory
Reconstructed Trajectory
Pouring Task:
Ground Truth Trajectory
Reconstructed Trajectory
Box-Opening Task:
Ground Truth Trajectory
Reconstructed Trajectory
Drawer-Opening Task:
Ground Truth Trajectory
Reconstructed Trajectory
Real World Combinatorial Generalization (Speed 4x):
The next result we present is the combinatorial generalization of learnt interaction abstractions in the real world. We tested combinations of tasks on the robot that the system had not seen during training. On the left is the combination of Pouring and Stirring, and on the right, the combination of Box Opening and Pouring.
Pouring + Stirring
Box Opening + Pouring
The next result we present is a dynamic visualization of the embedded latent space of our environmental abstractions, as depicted in Figure 2 (left column) of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent abstraction, and is positioned at the location of its latent variable in the embedded space.
As before, press play in the bottom-left corner to start the video; zooming in and scrolling across the page helps view the individual learnt abstractions.
Roboturk Dataset Environmental Abstraction Space:
RoboMimic Dataset Environmental Abstraction Space:
The next result is a dynamic visualization of the embedded latent space of our interaction abstractions, as depicted in Figure 2 (right column) of our paper. For each dataset, each frame in the following video corresponds to the policy being rolled out with a different latent abstraction, and is positioned at the location of its latent variable in the embedded space.
As before, press play in the bottom-left corner to start the video; zooming in and scrolling across the page helps view the individual learnt abstractions.
Roboturk Dataset Interaction Abstraction Space:
RoboMimic Dataset Interaction Abstraction Space:
The following result we present is a dynamic visualization of individual instances of the learnt environmental abstractions; equivalent static visualizations are presented in Figure 3 in our paper. Each frame in the following GIFs corresponds to the policy being rolled out with a different latent abstraction.
Roboturk Dataset Environmental Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory
RoboMimic Dataset Environmental Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory
The final result is a dynamic visualization of individual instances of the learnt interaction abstractions; equivalent static visualizations are presented in Figure 4 in our paper. Each frame in the following GIFs corresponds to the policy being rolled out with a different latent abstraction. In each row, we present a ground truth trajectory on the left and its corresponding reconstruction via a learnt abstraction on the right.
Note that the grippers of the robots in these visualizations do not render correctly due to an issue we were unable to resolve by the workshop deadline; in these images, the grippers should be interpreted as closed.
Roboturk Dataset Interaction Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory
RoboMimic Dataset Interaction Abstractions:
Nut and Peg Assembly Task:
Ground Truth Trajectory
Reconstructed Trajectory
Bin Picking Task:
Ground Truth Trajectory
Reconstructed Trajectory