A manipulation example: tea serving. In this task, the robot needs to grasp the teapot and pour tea into the cup. The figure shows what the robot needs to do at the beginning of the task: move to the teapot handle. Left: how a human reasons about actions when teleoperating the robot to serve tea (the human's gaze focuses on the target object). Right: what the robot should learn to do: (1) place a visual keypoint on the correct object; (2) reason about its 3D location; and (3) compute the action accordingly.
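The three steps in the right panel can be illustrated with a minimal sketch. It is not the method's actual implementation. It assumes the keypoint pixel and its depth are already given (step 1 is not modeled), uses a standard pinhole camera model for step 2, and treats the camera frame as the robot frame for step 3. The function names `backproject` and `step_toward` and all numeric values are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Step 2: lift a 2D keypoint (u, v) with known depth to a 3D point
    in the camera frame using the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def step_toward(current: Vec3, target: Vec3, max_step: float) -> Vec3:
    """Step 3: translation command that moves the gripper toward the
    target, with its length clipped to max_step."""
    delta = tuple(t - c for t, c in zip(target, current))
    dist = sum(d * d for d in delta) ** 0.5
    if dist <= max_step:
        return delta
    scale = max_step / dist
    return tuple(d * scale for d in delta)


# Hypothetical values: keypoint on the teapot handle at the image centre,
# 0.5 m from the camera; intrinsics chosen only for illustration.
handle_xyz = backproject(u=320, v=240, depth=0.5,
                         fx=600.0, fy=600.0, cx=320.0, cy=240.0)
action = step_toward(current=(0.0, 0.0, 0.0), target=handle_xyz, max_step=0.05)
```

In a real system the 3D point would also be transformed from the camera frame into the robot base frame before the action is computed; that transform is omitted here to keep the sketch short.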
The robot needs to lift the cube to a given height.
The robot needs to stack the red cube on top of the green plate.
The robot cannot reach the blue cube at the beginning, so it needs to use the red tool to hook the blue cube closer and then place it inside the brown hole.