Sharing control of autonomous robots with a human operator could facilitate the collection of robotic task demonstrations, yet humans and robots communicate intent and reason about the future in very different ways. Recent advances in natural language processing with Transformers offer both insight and specific tools to tackle this problem. The self-attention mechanism in Transformers aims to understand a sequence of words holistically, rather than emphasizing connections between adjacent words. The same holds when Transformers are applied to robotic task trajectories: given an environment state and a task goal, the model can quickly update its plan with new information at every step while maintaining holistic knowledge of the past. A key insight is that human intent can be injected at any point in the time sequence if the user decides that the model-predicted actions are inappropriate. At every time step, the user can either (1) do nothing, allowing autonomous operation to continue while observing the robot's future plan, or (2) take over and momentarily prescribe a different set of actions to nudge the model back on track, after which the robot continues autonomously. Virtual reality (VR) offers an ideal medium for communicating these intents to a robot and for accumulating knowledge from human demonstrations. We develop Assistive Tele-op, a VR system that allows users to collect robot task demonstrations with a high success rate and with greater ease than manual teleoperation systems.
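The two options available at each time step can be illustrated as a simple control loop. This is a minimal sketch under stated assumptions, not the Assistive Tele-op implementation: `run_shared_autonomy`, `toy_plan`, and `toy_user` are hypothetical names, the planner is a placeholder for the Transformer (any function mapping the action history to a future plan), and actions are scalars for brevity. The property it highlights is that a human-prescribed action is appended to the same history the model conditions on, so the model resumes autonomously from the corrected trajectory.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scalar action for illustration; a real robot action would be a vector
# (e.g. end-effector pose plus gripper state).
Action = float

# Hypothetical planner interface: plan(history, horizon) -> next `horizon` actions,
# conditioned on the whole action history (stand-in for the sequence model).
Planner = Callable[[Sequence[Action], int], List[Action]]
# Hypothetical user interface: user(t, predicted_plan) -> None to let the robot
# continue, or an action to execute instead of the predicted one.
UserInput = Callable[[int, List[Action]], Optional[Action]]


def run_shared_autonomy(
    plan: Planner, user: UserInput, horizon: int, num_steps: int
) -> Tuple[List[Action], List[bool]]:
    """Roll out a sequence-model policy while letting a human intervene at any step."""
    history: List[Action] = []
    by_human: List[bool] = []
    for t in range(num_steps):
        predicted = plan(history, horizon)  # future plan shown to the user
        override = user(t, predicted)
        if override is None:
            action = predicted[0]  # (1) do nothing: autonomous operation continues
        else:
            action = override  # (2) take over: the human's action replaces the prediction
        # Either way the executed action joins the history, so the next prediction
        # conditions on any human input that was injected.
        history.append(action)
        by_human.append(override is not None)
    return history, by_human


# Toy usage: a stand-in "model" that extrapolates from the last executed action.
def toy_plan(history: Sequence[Action], horizon: int) -> List[Action]:
    last = history[-1] if history else 0.0
    return [last + k for k in range(1, horizon + 1)]


def toy_user(t: int, predicted: List[Action]) -> Optional[Action]:
    return 10.0 if t == 2 else None  # nudge the robot once, at step 2


actions, by_human = run_shared_autonomy(toy_plan, toy_user, horizon=3, num_steps=5)
print(actions)   # [1.0, 2.0, 10.0, 11.0, 12.0]
print(by_human)  # [False, False, True, False, False]
```

After the single intervention at step 2, the toy planner continues from the human's action (11.0, 12.0) rather than from its own earlier plan. A real planner would also condition on the environment state and task goal, which this sketch omits.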

In all videos, successes are shown in the left two columns and failures in the right column.

Fully autonomous completion - examples

Industrial tasks

Block stacking - place blue on orange.

Round nut assembly - RoboTurk task, recreated in Omniverse.


Assembly kit with pink hexagon - from TransporterNets.

Household tasks

Cabinet drawer opening

Put green bowl into the cabinet

Caregiving tasks

Humanoid itch scratching

Humanoid drinking

Policy execution with human interventions

When a human user detects an issue that may lead to inappropriate actions or failure, they momentarily take over in VR to nudge the robot back on track.

Round nut assembly

Assembly kit with pink hexagon

Put green bowl into the cabinet

Humanoid itch scratching

Humanoid drinking