We propose an end-to-end pose estimation and grasping network that takes RGB-D images of the current and desired configurations of an object to be grasped. Given the final pose and the kinematic constraints of the robot, the network learns functional grasps that are both reachable and facilitate rearranging the object into its final pose. This involves exploring a grasp evaluation metric that accounts for object immobilization after grasping, reachability, and functional use for rearrangement. See the sub-page to learn more.
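One way the proposed evaluation metric could be structured is as a weighted combination of the three criteria named above. The sketch below is purely illustrative: the `GraspCandidate` fields, the weights, and the linear scoring form are assumptions for exposition, not part of any existing implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of a grasp evaluation metric combining the three
# criteria from the proposal. All names and weights are assumptions.

@dataclass
class GraspCandidate:
    immobilization: float  # in [0, 1]: how well the grasp constrains the object
    reachability: float    # in [0, 1]: feasibility under the robot's kinematics
    functional_use: float  # in [0, 1]: suitability for the rearrangement task

def grasp_score(g: GraspCandidate,
                w_imm: float = 0.4,
                w_reach: float = 0.3,
                w_func: float = 0.3) -> float:
    """Weighted combination of the three criteria; higher is better."""
    return (w_imm * g.immobilization
            + w_reach * g.reachability
            + w_func * g.functional_use)

def best_grasp(candidates):
    """Pick the highest-scoring grasp from a list of candidates."""
    return max(candidates, key=grasp_score)
```

A learned metric would replace the fixed weights with a network output, but the interface (candidate grasp in, scalar score out) would stay the same.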
The intuition behind estimating an object's pose and choosing a grasp that supports the intended use of the object is to let manipulators perform tasks that humans carry out daily. As humans, we hold a ladle for stirring or grasp a box to place on a shelf subconsciously and effortlessly: our brains relate the object's current position and orientation to the perceived position and orientation of the goal state, and we grasp the object accordingly. Teaching a robot to perform such general tasks would be a step toward using robots in our homes for varied yet repeatable chores.
As proposed, our network estimates the pose of an object in the frame of an RGB-D camera and computes the best grasp such that the functional intent of the grasp is preserved; we will demonstrate this with a robotic manipulator. This builds on the work presented in [4], which demonstrates an element of end-to-end learning, extended so that the target pose is also considered in grasp evaluation.
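The quantity that ties the current and goal observations together is the rigid transform carrying the object from its current pose to its desired pose. A minimal sketch, assuming poses are expressed as 4x4 homogeneous transforms in the camera frame (the function name and convention are assumptions for illustration):

```python
import numpy as np

# Illustrative helper: the rearrangement transform a grasp must make
# possible, assuming 4x4 homogeneous poses in the camera frame.

def relative_transform(T_current: np.ndarray, T_goal: np.ndarray) -> np.ndarray:
    """Return T_rel such that T_goal = T_rel @ T_current."""
    return T_goal @ np.linalg.inv(T_current)
```

For example, an object at 1 m depth that must be shifted 1 m along the camera's x-axis yields a pure-translation `T_rel`; a grasp would then be scored on whether the arm can execute that motion while holding the object.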