5th Workshop on

Semantic Policy and Action Representations for Autonomous Robots (SPAR)

September 27, 2021 - Prague, Czech Republic

at IROS 2021

Karthik Desingh

Learning Object-centric Representations for Robot Manipulation Tasks

A crucial question for complex multi-step robotic tasks is how to represent relationships between entities in the world, particularly as they pertain to preconditions for the various skills the robot might employ. In goal-directed sequential manipulation tasks with long-horizon planning, it is common to use a state estimator followed by a task and motion planner or another model-based system. A variety of powerful approaches exist for explicitly estimating the state of objects in the world. However, it is challenging to generalize these approaches to an arbitrary collection of objects. In addition, objects are often in contact in manipulation scenarios, a setting in which explicit state estimation struggles, particularly for unseen objects.

Fortunately, knowing the exact poses of objects may not be necessary for manipulation. End-to-end methods leverage this fact and build networks that generate actions directly, without explicitly representing objects. Nevertheless, these networks are highly specific to the tasks they are trained on. For example, it is non-trivial to use a network trained on stacking blocks to unstack them.

In this talk, I will present our recent work, which takes an important step towards a manipulation framework that generalizes in a few-shot manner to unseen tasks with unseen objects. Specifically, we propose a neural network that extracts implicit object embeddings directly from raw RGB images. Trained on large amounts of simulated robotic manipulation data, the object-centric embeddings produced by our network can be used to predict spatial relationships between the entities in the scene, informing a task and motion planner with relevant implicit state information for goal-directed sequential manipulation tasks.
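To make the pipeline concrete, below is a minimal sketch (not the speaker's actual architecture; all module names, layer sizes, and the set of predicates are illustrative assumptions) of how object-centric embeddings extracted from an RGB image could feed a relation head whose predicate predictions serve as implicit state for a task and motion planner.

# Minimal sketch, assuming a PyTorch setup; architecture details are hypothetical.
import torch
import torch.nn as nn

class ObjectCentricEncoder(nn.Module):
    """Maps a scene RGB image plus per-object query patches to one embedding per object."""
    def __init__(self, embed_dim=128):
        super().__init__()
        # Shared CNN backbone over the full scene image (illustrative layer sizes).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Small encoder for per-object query patches (e.g. canonical views of each object).
        self.patch_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse = nn.Linear(64 + 32, embed_dim)

    def forward(self, scene_rgb, object_patches):
        # scene_rgb: (B, 3, H, W); object_patches: (B, N, 3, h, w)
        B, N = object_patches.shape[:2]
        scene_feat = self.backbone(scene_rgb)                          # (B, 64)
        patch_feat = self.patch_encoder(object_patches.flatten(0, 1))  # (B*N, 32)
        patch_feat = patch_feat.view(B, N, -1)
        scene_feat = scene_feat.unsqueeze(1).expand(-1, N, -1)
        return self.fuse(torch.cat([scene_feat, patch_feat], dim=-1))  # (B, N, D)

class RelationHead(nn.Module):
    """Scores binary spatial predicates (e.g. on-top-of, left-of) for every ordered object pair."""
    def __init__(self, embed_dim=128, num_predicates=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, num_predicates),
        )

    def forward(self, obj_emb):
        # obj_emb: (B, N, D) -> predicate logits for each ordered pair: (B, N, N, P)
        B, N, D = obj_emb.shape
        a = obj_emb.unsqueeze(2).expand(B, N, N, D)
        b = obj_emb.unsqueeze(1).expand(B, N, N, D)
        return self.mlp(torch.cat([a, b], dim=-1))

# Usage: predicted predicate probabilities can serve as the implicit state that a
# task and motion planner checks as skill preconditions.
encoder, relations = ObjectCentricEncoder(), RelationHead()
scene = torch.randn(1, 3, 224, 224)
patches = torch.randn(1, 5, 3, 32, 32)   # five query objects
probs = torch.sigmoid(relations(encoder(scene, patches)))   # (1, 5, 5, 4)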

Karthik Desingh is a Postdoctoral Scholar at the University of Washington (UW), working with Professor Dieter Fox. Before joining UW, he received his Ph.D. in Computer Science and Engineering from the University of Michigan, working with Professor Chad Jenkins. During his Ph.D., he was closely associated with the Robotics Institute and Michigan AI. He earned his B.E. in Electronics and Communication Engineering at Osmania University, India, and M.S. degrees in Computer Science at IIIT-Hyderabad and Brown University. His research lies at the intersection of robotics, computer vision, and machine learning, focusing primarily on providing robots with the perceptual capabilities, built on deep learning and probabilistic techniques, to perform goal-directed tasks in unstructured environments.