H2O: A Benchmark for Visual Human-human Object Handover Analysis

Object handover is a common human collaboration behavior that attracts attention from researchers in Robotics and Cognitive Science. Though visual perception plays an important role in the object handover task, the whole handover process has rarely been explored. In this work, we propose a novel richly annotated dataset, H2O, for visual analysis of human-human object handovers. H2O, which contains 18K video clips involving 15 people who hand over 30 objects to each other, is a multi-purpose benchmark. It can support several vision-based tasks, among which we specifically provide a baseline method, RGPNet, for a less-explored task named Receiver Grasp Prediction. Extensive experiments show that RGPNet can produce plausible grasps based on the giver's hand-object states in the pre-handover phase. In addition, we report hand and object pose errors with existing baselines and show that the dataset can serve as video demonstrations for robot imitation learning on the handover task. The dataset, model, and code will be made public.

A handover typically proceeds in three phases: the pre-handover phase, the physical exchange phase, and the post-handover phase.
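To make this phase structure concrete, below is a minimal sketch of how a clip could be segmented into the three phases given per-clip timestamps for the physical exchange. The `HandoverPhase` labels and the `exchange_start`/`exchange_end` annotations are hypothetical illustrations, not the dataset's actual annotation format.

```python
from enum import Enum

class HandoverPhase(Enum):
    """The three phases of a handover, as described above."""
    PRE_HANDOVER = 0       # giver grasps and transports the object
    PHYSICAL_EXCHANGE = 1  # both giver and receiver touch the object
    POST_HANDOVER = 2      # receiver holds the object alone

def phase_of_frame(frame_idx, exchange_start, exchange_end):
    """Map a frame index to a phase, assuming hypothetical per-clip
    annotations of when the physical exchange begins and ends."""
    if frame_idx < exchange_start:
        return HandoverPhase.PRE_HANDOVER
    if frame_idx <= exchange_end:
        return HandoverPhase.PHYSICAL_EXCHANGE
    return HandoverPhase.POST_HANDOVER

# Example: frames 0-59, with the exchange spanning frames 30-40.
print(phase_of_frame(10, 30, 40))  # HandoverPhase.PRE_HANDOVER
print(phase_of_frame(35, 30, 40))  # HandoverPhase.PHYSICAL_EXCHANGE
print(phase_of_frame(55, 30, 40))  # HandoverPhase.POST_HANDOVER
```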

H2O Handover Dataset (Coming soon)

We adopt all the object models used in the ContactPose paper and select some of the YCB objects. At the time of submission, the repository contains a total of 30 objects to be handed over, which are displayed in the image. Though our dataset is self-contained, these datasets can serve as a good source of additional data for augmentation.
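As an illustration of how the object models might be consumed once released, the sketch below loads a mesh with the trimesh library. The `object_models/` directory, the `.ply` file format, and the object name are assumptions for this example, not the released structure of the H2O repository.

```python
import trimesh

# Hypothetical layout: one mesh file per object under object_models/.
OBJECT_MODEL_DIR = "object_models"

def load_object_mesh(object_name):
    """Load one of the 30 handover object models as a trimesh.Trimesh."""
    mesh = trimesh.load(f"{OBJECT_MODEL_DIR}/{object_name}.ply", force="mesh")
    # Center the mesh at its centroid so object poses can be applied
    # consistently across ContactPose- and YCB-derived models.
    mesh.apply_translation(-mesh.centroid)
    return mesh

# Example: load a YCB-style object and inspect its size.
mug = load_object_mesh("mug")
print(mug.vertices.shape, mug.faces.shape)
```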