Zero-Shot Visual Generalization in Robot Manipulation
Real-World Rollouts (ALDA-DP)
distracting_everything_cropped.mp4
Our model is robust to dynamic distractors and various lighting conditions.
pickcube_basic_MAIN_clipped_nosound.mp4
Standard
ambient_lighting_clipped_nosound.mp4
Ambient Lighting
small_greencube_clipped.mp4
Green Cube
distracting_objects.mp4
Distracting Objects
What is the Agent Actually Seeing?
ALDA-DP disentangles the latent representation and leverages principles of associative memory to map the continuous outputs of the encoder to a discrete latent representation. If the agent receives an out-of-distribution (OOD) observation at test time, one or more dimensions of the continuous latent may fall outside the training distribution; the associative step maps those values back to the closest in-distribution discrete values. This is equivalent to the agent asking itself, "What is the most similar image I have seen to this new observation?" and conditioning its action on that instead.
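The associative step described above can be sketched as a per-dimension nearest-neighbor lookup. This is a minimal illustration, not the actual ALDA-DP implementation: the codebooks, their sizes, and the `associate` function below are all hypothetical stand-ins for whatever discrete values the model learns during training.

```python
import numpy as np

# Hypothetical per-dimension codebooks: for each disentangled latent
# dimension, a small set of discrete values assumed to have been seen
# in-distribution during training.
rng = np.random.default_rng(0)
n_dims, n_codes = 4, 8
codebooks = [np.sort(rng.normal(size=n_codes)) for _ in range(n_dims)]

def associate(z):
    """Snap each continuous latent coordinate to its nearest stored code.

    An OOD coordinate (e.g. far outside the training range) is thereby
    mapped back to the closest in-distribution discrete value.
    """
    out = np.empty_like(z)
    for d, codes in enumerate(codebooks):
        out[d] = codes[np.argmin(np.abs(codes - z[d]))]
    return out

# A latent with some clearly out-of-range coordinates:
z_ood = np.array([3.5, -0.2, 0.0, 10.0])
z_assoc = associate(z_ood)
# Every coordinate of z_assoc is now one of the stored codes, so the
# downstream policy only ever conditions on in-distribution values.
```

Under this reading, the policy never acts on the raw OOD latent; it acts on the nearest in-distribution "memory," which is what the decoder reconstructions below visualize.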
realsense_video_dbg.mp4
True Observations
decoder_video_dbg.mp4
Decoder Reconstructions of the Disentangled Latent After Association