Zero-Shot Visual Generalization in Robot Manipulation
Real-World Rollouts (ALDA-DP)
distracting_everything_cropped.mp4
Our model is robust to dynamic distractors and various lighting conditions.
pickcube_basic_MAIN_clipped_nosound.mp4
Standard
ambient_lighting_clipped_nosound.mp4
Ambient Lighting
small_greencube_clipped.mp4
Green Cube
distracting_objects.mp4
Distracting Objects
What is the Agent Actually Seeing?
ALDA-DP disentangles the latent representation and leverages principles of associative memory to map the continuous outputs of the encoder to a discrete latent representation. If the agent receives an out-of-distribution (OOD) observation at test time, one or more dimensions of the continuous latent may fall outside the training distribution; the associative step maps those values back to the closest in-distribution discrete values. This is equivalent to the agent asking itself, "What is the most similar image I have seen to this new observation?" and conditioning its action on that instead.
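The associative step described above can be sketched as a per-dimension nearest-neighbor lookup. This is a minimal illustration, not the actual ALDA-DP implementation: the codebooks, their sizes, and the `associate` function below are all hypothetical stand-ins for whatever discrete values the model learns during training.

```python
import numpy as np

# Hypothetical per-dimension codebooks: for each disentangled latent
# dimension, a small set of discrete values assumed to have been seen
# in-distribution during training.
rng = np.random.default_rng(0)
n_dims, n_codes = 4, 8
codebooks = [np.sort(rng.normal(size=n_codes)) for _ in range(n_dims)]

def associate(z):
    """Snap each continuous latent coordinate to its nearest stored code.

    An OOD coordinate (e.g. far outside the training range) is thereby
    mapped back to the closest in-distribution discrete value.
    """
    out = np.empty_like(z)
    for d, codes in enumerate(codebooks):
        out[d] = codes[np.argmin(np.abs(codes - z[d]))]
    return out

# A latent with some clearly out-of-range coordinates:
z_ood = np.array([3.5, -0.2, 0.0, 10.0])
z_assoc = associate(z_ood)
# Every coordinate of z_assoc is now one of the stored codes, so the
# downstream policy only ever conditions on in-distribution values.
```

Under this reading, the policy never acts on the raw OOD latent; it acts on the nearest in-distribution "memory," which is what the decoder reconstructions below visualize.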
realsense_video_dbg.mp4
True Observations
decoder_video_dbg.mp4
Decoder Reconstructions of the Disentangled Latent After Association