Scaffolder

Sensory Scaffolding Suite

Blind Pick

A two-fingered robot arm must pick up a randomly initialized block using only proprioception (gripper position) and touch sensing. During training, it can use multiple privileged RGB camera inputs (two static cameras and one wrist camera).
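Across the suite, the target policy's sensors are a strict subset of what is available during training. A minimal sketch of this split for Blind Pick, assuming a Gymnasium-style dict observation (the key names, shapes, and helper below are illustrative assumptions, not the benchmark's actual interface):

```python
import numpy as np
from gymnasium import spaces

# Hypothetical observation split for Blind Pick. The deployed ("target")
# policy only ever sees proprioception and touch; the privileged cameras
# exist purely to aid training.
TARGET_OBS = spaces.Dict({
    "gripper_pos": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    "touch":       spaces.Box(0.0, np.inf, shape=(2,), dtype=np.float32),
})
PRIVILEGED_OBS = spaces.Dict({
    "static_cam_0": spaces.Box(0, 255, shape=(64, 64, 3), dtype=np.uint8),
    "static_cam_1": spaces.Box(0, 255, shape=(64, 64, 3), dtype=np.uint8),
    "wrist_cam":    spaces.Box(0, 255, shape=(64, 64, 3), dtype=np.uint8),
})

def target_view(obs: dict) -> dict:
    """Drop privileged keys so the evaluated policy sees only its own sensors."""
    return {k: obs[k] for k in TARGET_OBS.spaces}
```

At evaluation time only something like target_view(obs) would reach the policy; during training, both halves are available to the learner.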

Scaffolder

Scaffolder learns a spiral search strategy and robustly picks up the block even under perturbations.

DreamerV3+BC & Guided Observability

The baselines move erratically and frequently push the block far from the workspace or search in infeasible areas.

Blind Locomotion

A proprioceptive policy must run and jump over randomly sized and positioned hurdles. During training, privileged RGB images of the agent and its nearby hurdles are given, allowing the agent to see obstacles before bumping into them.

Scaffolder

Scaffolder bounds to minimize the risk of hitting obstacles and quickly recovers when it does.

Guided Observability

Guided Observability moves quickly in unobstructed areas but is frequently stopped or slowed by hurdles.

DreamerV3

DreamerV3 moves slowly and is easily caught by obstacles.

Blind and Deaf Piano

A pair of 30-DoF Shadowhands must learn to play “Twinkle Twinkle Little Star” using only proprioception in the RoboPianist simulator. At training time, the policy has access to future notes, piano key presses, and suggested fingerings, emulating vision to read the sheet music and hearing to determine which keys were pressed.
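A rough sketch of how such privileged “sheet music” and “hearing” features could be assembled from simulator state, assuming a binary piano-roll score and a fixed lookahead (all names and shapes here are illustrative; RoboPianist's actual interface differs, and suggested fingerings are omitted for brevity):

```python
import numpy as np

N_KEYS = 88        # piano keys
LOOKAHEAD = 10     # upcoming timesteps of the score exposed during training

def privileged_piano_obs(score: np.ndarray, keys_pressed: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical privileged feature vector for the piano task.

    score:        (T, N_KEYS) binary piano roll of the full piece ("sheet music")
    keys_pressed: (N_KEYS,) which keys are currently down ("hearing")
    t:            current timestep
    """
    # Future notes emulate reading ahead in the score; pad past the end of the piece.
    future = score[t : t + LOOKAHEAD]
    if future.shape[0] < LOOKAHEAD:
        pad = np.zeros((LOOKAHEAD - future.shape[0], N_KEYS), dtype=score.dtype)
        future = np.concatenate([future, pad], axis=0)
    return np.concatenate([future.ravel(), keys_pressed]).astype(np.float32)
```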

Scaffolder

Scaffolder plays an imperfect but recognizable rendition of “Twinkle Twinkle Little Star.”

DreamerV3

DreamerV3's rendition is unrecognizable.

Blind and Numb Pen Rotation

A proprioceptive policy must control a 30-DoF Shadowhand to rotate a pen from a randomized initial orientation to a randomized desired orientation using only proprioception and the initial and target pose of the pen. During training, the policy has access to the object pose and touch sensors.

Scaffolder

Scaffolder quickly achieves the desired pen orientation and maintains stability.

DreamerV3

DreamerV3 briefly achieves the desired pen orientation but struggles to maintain stability.

Blind and Numb Cube Rotation

Similar to Blind and Numb Pen Rotation, a proprioceptive policy must control a 30-DoF Shadowhand to rotate a cube from a randomized initial orientation to a randomized desired orientation.

Scaffolder

Scaffolder quickly rotates the block to the desired orientation and maintains stability once it is reached.

Informed Dreamer

Informed Dreamer rotates the block fairly quickly but struggles to maintain stability.

Noisy Monkey Bars

A 13-link gibbon must swing between a fixed set of handholds in a 2D environment using the brachiation simulator. To simulate imperfect sensors on a robotic platform, noise is added to the target observations, while privileged observations represent true simulator states.
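One way to realize this split is a wrapper that corrupts only the target observation stream and leaves the privileged state untouched. A sketch assuming a Gymnasium-style environment with hypothetical "target" and "privileged" observation keys, and Gaussian noise purely for illustration:

```python
import numpy as np

class NoisyTargetObservations:
    """Sketch of a wrapper that corrupts only the target policy's observations.

    The privileged stream stays noise-free, mirroring the Noisy Monkey Bars
    setup where the true simulator state remains available during training.
    The env interface and key names are assumptions, not the suite's actual API.
    """

    def __init__(self, env, noise_std: float = 0.05, seed: int = 0):
        self.env = env
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def _corrupt(self, obs: dict) -> dict:
        noisy = dict(obs)
        noisy["target"] = obs["target"] + self.rng.normal(
            0.0, self.noise_std, size=obs["target"].shape
        )
        return noisy  # obs["privileged"] is passed through untouched

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._corrupt(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._corrupt(obs), reward, terminated, truncated, info
```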

Scaffolder

Guided Observability

DreamerV3

Informed Dreamer

Both Scaffolder and Guided Observability learn efficient, swinging brachiation motions that closely mimic those of a real gibbon.

DreamerV3 and Informed Dreamer can swing between the handholds but use inefficient motions that incur large energy costs.

Wrist Camera Pick-and-Place

A proprioceptive policy with an active-perception wrist camera must pick up a randomly positioned block and place it into a randomly positioned bin. During training, the policy has access to multiple informative, statically placed cameras.

Scaffolder (Wrist Camera View)

Scaffolder (Privileged Camera View)

DreamerV3 (Wrist Camera View)

DreamerV3 (Behind Camera View)

Scaffolder quickly locates both the block and the goal with its active-perception wrist camera. DreamerV3 locates the block relatively quickly but struggles to find the goal location.

Occluded Pick-and-Place

Rather than employing active visual perception at test time, this task examines the impact of active perception as privileged sensing at training time. The target policy must use an RGB camera from an occluded viewpoint, alongside proprioception and touch sensing, to pick up a block behind a shelf and place it into a bin. Both the block and bin locations are randomly initialized.

Scaffolder

Scaffolder quickly locates the block with touch sensing and places it on the goal.

DreamerV3 + BC

DreamerV3+BC locates the block but fails to fully pick it up and even attempts to go to the goal location long after dropping it.

Guided Observability

Guided Observability locates the block but fails to fully place it on the goal on its first attempt, as its picking is less robust.

Visual Pen Rotation

Similar to Blind and Numb Pen Rotation, a 30-DoF Shadowhand must rotate a pen from a randomized initial orientation to a randomized desired orientation. The privileged observations are identical to those in the blind and numb rotation tasks, but the target policy additionally has access to a top-down RGB camera.

Scaffolder

As with Blind and Numb Pen Rotation, Scaffolder quickly achieves the desired orientation and maintains stability.

Informed Dreamer

Informed Dreamer initially achieves the desired orientation but struggles to maintain stability.

Visual Cube Rotation

Similar to Visual Pen Rotation, a visual target policy conditioned on a top-down RGB image and proprioception must rotate a block to a desired orientation.

Scaffolder

DreamerV3+BC

Guided Observability

Both Scaffolder and DreamerV3+BC quickly rotate the block, while Guided Observability fails to rotate it at all.