As humans, when we see a new object we effortlessly pick it up based on a prior understanding of what we want to do with it. This is especially true in object rearrangement tasks, where we wish to place an object in a specific target configuration. For instance, we pick up an object differently if we want to stack it on top of another object than if we want to arrange it on a shelf or in a narrow space. Most robotics applications, however, treat pose estimation and grasping in isolation, without considering the kinematic constraints of the robot or what must be done with the object after it has been grasped. Our work is motivated by the larger goal of developing generalized grasping frameworks that achieve human-level dexterity even with novel objects. As a stepping stone towards this goal, we aim to leverage SE(3)-equivariant feature descriptors and state-of-the-art grasping frameworks that learn object description and grasping jointly in an end-to-end manner. Concretely, we formulate a grasping metric that is influenced not only by the success of the grasp itself, but also by the ability of the chosen grasp to facilitate the target pose and other downstream functional actions.
Our framework targets the following capabilities:
- The ability to pick up objects that are not constrained to a specific object category
- Objects can be placed at random initial poses
- Objects can be placed at random target poses
- With a strong prior on the target grasp pose to guide the optimization, the need for demonstrations can be eliminated
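To make the proposed grasping metric concrete, the sketch below scores each candidate grasp by its predicted quality, gated by whether the same grasp remains feasible once the object is carried to its target pose. This is a minimal illustration under stated assumptions, not our actual implementation: all names (`transform_grasp`, `score_grasp`, `select_grasp`, the feasibility predicate) are hypothetical, poses are represented as 4x4 SE(3) matrices, and feasibility is treated as a binary check standing in for the robot's kinematic constraints.

```python
import numpy as np

def transform_grasp(target_from_initial, grasp_pose):
    """Map a grasp pose (4x4 SE(3) matrix, expressed in the world frame)
    through the rigid transform carrying the object from its initial pose
    to its target pose. Illustrative helper, not an existing API."""
    return target_from_initial @ grasp_pose

def score_grasp(grasp_pose, quality, target_from_initial, feasible_fn):
    """Combine grasp quality with feasibility at the target pose:
    a grasp that cannot be executed at the target configuration
    (e.g. due to kinematic constraints) receives zero score."""
    grasp_at_target = transform_grasp(target_from_initial, grasp_pose)
    return quality * (1.0 if feasible_fn(grasp_at_target) else 0.0)

def select_grasp(grasps, qualities, target_from_initial, feasible_fn):
    """Return the index and score of the best placement-aware grasp."""
    scores = [score_grasp(g, q, target_from_initial, feasible_fn)
              for g, q in zip(grasps, qualities)]
    best = int(np.argmax(scores))
    return best, scores[best]
```

As a usage example, a high-quality grasp that becomes unreachable at the target pose (here modeled by a simple height check, a stand-in for a real reachability test) loses to a lower-quality but feasible one:

```python
g1, g2 = np.eye(4), np.eye(4)
g2[2, 3] = 1.0                        # second grasp sits too high at the target
target_from_initial = np.eye(4)
feasible = lambda G: G[2, 3] < 0.5    # toy feasibility predicate
idx, score = select_grasp([g1, g2], [0.6, 0.9], target_from_initial, feasible)
# idx == 0: the 0.9-quality grasp is infeasible at the target, so the
# 0.6-quality grasp wins
```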