Improvisation through Physical Understanding: Using Novel Objects as Tools with Visual Foresight

Abstract: Machine learning techniques have enabled robots to learn complex tasks and perform many simple skills with a wide variety of objects. However, learning a model that can both perform complex tasks and generalize to previously unseen objects and goals remains a significant challenge. To study this challenge, we consider the capability of improvisational tool use: a robot faced with novel objects must figure out how to accomplish a new goal that demands using those objects as tools. We approach this case study by training a model with both a visual and a physical understanding of multi-object interactions, and by developing a sampling-based planner that can leverage these interactions to accomplish tasks. We train the model by combining diverse demonstration data with self-supervised interaction data, using the interaction data to build generalizable models and the demonstration data to solve more complex tasks. Our experiments show that our approach can solve a variety of complex tool-use tasks from raw pixel inputs, outperforming both imitation learning and self-supervised learning individually. Further, we find that the robot can perceive and use novel objects as tools, including objects that are not conventional tools, while also accomplishing tasks more efficiently without tools when they are not required.
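To make the planning component concrete, below is a minimal sketch (not the paper's implementation) of a sampling-based planner in the visual-foresight style: candidate action sequences are scored by rolling out a learned video-prediction model and comparing the predicted outcome to a pixel-space goal, with the best candidates refit iteratively via the cross-entropy method. The prediction model, cost function, and all names here are illustrative placeholders.

```python
import numpy as np

def predict_frames(frame, actions):
    """Hypothetical learned visual dynamics model: given the current image and a
    sequence of actions, return predicted future images. Faked here with a
    placeholder so the sketch runs end to end."""
    return np.stack([frame for _ in actions])  # placeholder: no real dynamics

def goal_cost(predicted_frames, goal_image):
    """Pixel-distance cost between the final predicted image and the goal image."""
    return float(np.mean((predicted_frames[-1] - goal_image) ** 2))

def cem_plan(frame, goal_image, horizon=10, action_dim=4,
             n_samples=200, n_elites=20, n_iters=3, rng=None):
    """Cross-entropy-method planning over action sequences, scored by the
    predicted visual outcome. Returns the mean of the final elite set."""
    rng = rng or np.random.default_rng(0)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences around the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, horizon, action_dim))
        costs = np.array([goal_cost(predict_frames(frame, s), goal_image)
                          for s in samples])
        # Refit the sampling distribution to the lowest-cost candidates.
        elites = samples[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # executed open-loop or replanned after each step (MPC-style)

# Usage: plan toward a pixel goal from the current camera image.
current = np.zeros((64, 64, 3))
goal = np.ones((64, 64, 3))
plan = cem_plan(current, goal)
print(plan.shape)  # (horizon, action_dim)
```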

Video: tool_use_hd.mp4