Visual Affordance Prediction for Guiding Robot Exploration


Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani


The Robotics Institute, Carnegie Mellon University

In ICRA 2023

Paper | Video | Code


Motivated by the intuitive understanding humans have about the space of possible interactions, and the ease with which they can generalize this understanding to previously unseen scenes, we develop an approach for learning "visual affordances". Given an input image of a scene, we infer a distribution over plausible future states that can be achieved via interactions with it. To allow predicting diverse plausible futures, we discretize the space of continuous images with a VQ-VAE and use a Transformer-based model to learn a conditional distribution in the latent embedding space. We show that these models can be trained using large-scale and diverse passive data, and that the learned models exhibit compositional generalization to diverse objects beyond the training distribution. We evaluate the quality and diversity of the generations, and demonstrate how the trained affordance model can be used for guiding exploration during visual goal-conditioned policy learning in robotic manipulation.
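The sampling procedure described above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the codebook size, token-grid size, and the `encode` / `sample_future_codes` functions are all assumptions, and random draws replace the real VQ-VAE encoder and conditional Transformer logits.

```python
import numpy as np

rng = np.random.default_rng(0)

CODEBOOK_SIZE = 512   # assumed VQ-VAE codebook size (illustrative)
GRID = 8 * 8          # assumed number of discrete tokens per image (illustrative)

def encode(image):
    """Stand-in for the VQ-VAE encoder: map a scene image to discrete codes.
    A real encoder would quantize conv features against the learned codebook."""
    return rng.integers(0, CODEBOOK_SIZE, size=GRID)

def sample_future_codes(context_codes, temperature=1.0):
    """Stand-in for the conditional Transformer: autoregressively sample the
    codes of a plausible future state, conditioned on the current scene."""
    future = []
    for _ in range(GRID):
        # A real model would produce logits from the context and the codes
        # sampled so far; here random logits stand in for that computation.
        logits = rng.normal(size=CODEBOOK_SIZE)
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        future.append(rng.choice(CODEBOOK_SIZE, p=probs))
    return np.array(future)

# Sampling repeatedly yields a set of diverse candidate future states,
# which a VQ-VAE decoder would then map back to goal images.
scene = encode(None)
futures = [sample_future_codes(scene) for _ in range(4)]
```

Because the latent space is discrete, drawing multiple samples from the conditional model directly yields a diverse set of candidate futures for a single input scene.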


 Affordance-driven exploration and policy learning through hindsight goal re-labeling. Given an initial configuration, we sample a goal with the affordance model, execute rollouts with the current policy, and store the transitions in the replay buffer. For updating the policy, we sample transitions from the replay buffer and re-label goals to be the final frames in the corresponding trajectory. 
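The exploration loop described in the caption can be sketched as below. This is a schematic, assuming scalar states and a toy goal-conditioned policy purely for illustration; `affordance_goal` and `policy` are hypothetical stand-ins for the learned affordance model and policy network.

```python
import random

random.seed(0)

replay_buffer = []  # stores (state, action, next_state, goal) tuples

def affordance_goal(state):
    """Stand-in for the affordance model: sample a plausible goal
    reachable from the current configuration."""
    return state + random.randint(1, 5)

def policy(state, goal):
    """Stand-in goal-conditioned policy (states are scalars here)."""
    return 1 if goal > state else -1

# Collect rollouts: sample a goal with the affordance model, execute the
# current policy, and store the transitions in the replay buffer.
for episode in range(10):
    state = 0
    goal = affordance_goal(state)
    trajectory = []
    for t in range(5):
        action = policy(state, goal)
        next_state = state + action
        trajectory.append((state, action, next_state))
        state = next_state
    # Hindsight relabeling: the achieved final state of the trajectory
    # becomes the goal label for every transition in it.
    achieved = trajectory[-1][2]
    for (s, a, s2) in trajectory:
        replay_buffer.append((s, a, s2, achieved))
```

Relabeling with the achieved final frame means every stored trajectory is a successful example for *some* goal, so the policy receives useful supervision even when the affordance-sampled goal was not reached.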

Exploration visualization

[Videos: episode rollouts with generated goals, shown after 50, 150, 400, 600, 900, and 1000 training episodes (ep).]

Episode rollouts with generated goals are shown under each video. From left to right, we show the progression of exploration after different numbers of training episodes (ep). Towards the end, we see the emergence of interesting behaviors like stacking and grasping.

BibTeX Entry

@inproceedings{BharadhwajVisual,
  author    = {Homanga Bharadhwaj and Abhinav Gupta and Shubham Tulsiani},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2023},
  title     = {Visual Affordance Prediction for Guiding Robot Exploration}
}