HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors
Anonymous Authors
Abstract
Past work has employed various heuristic objectives for modeling hand-object interaction. However, without a cohesive framework, these objectives often have a narrow scope of application and are limited in efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes that leverages recent advances in differentiable physics and rendering. Our approach uses rendering priors to align with input images and segmentation masks, while physics priors mitigate penetration and relative sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which uses the differentiable priors as dynamics and observation models, runs faster. We show that HandyPriors achieves comparable or better results in the pose estimation task, and that the differentiable physics module enables us to predict contact information for pose refinement. We also show that our approach generalizes to other perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.
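To make the interplay of the two priors concrete, the following is a minimal sketch, not the HandyPriors implementation. It assumes a toy scene in which the object is a sphere, the hand is a small point set, the rendering prior is replaced by a squared distance between the orthographically projected sphere center and an observed 2D mask centroid, and the physics prior is replaced by a squared penetration-depth penalty. All function names and parameters are hypothetical.

```python
import numpy as np

def alignment_loss_and_grad(center, target_xy):
    # Stand-in for a rendering prior (assumption): orthographic projection
    # drops z, and we penalise squared distance to a 2D mask centroid.
    diff = center[:2] - target_xy
    grad = np.zeros(3)
    grad[:2] = 2.0 * diff
    return float(diff @ diff), grad

def penetration_loss_and_grad(center, radius, hand_points):
    # Stand-in for a physics prior (assumption): penalise hand points
    # that lie inside the sphere, using squared penetration depth.
    offsets = hand_points - center             # (N, 3)
    dists = np.linalg.norm(offsets, axis=1)    # (N,)
    depth = np.maximum(0.0, radius - dists)    # zero for points outside
    loss = float(np.sum(depth ** 2))
    safe = np.maximum(dists, 1e-9)             # avoid division by zero
    # d(depth)/d(center) = offsets / dists wherever depth > 0.
    grad = np.sum((2.0 * depth / safe)[:, None] * offsets, axis=0)
    return loss, grad

def refine_center(center, radius, hand_points, target_xy,
                  w_physics=1.0, lr=0.05, steps=500):
    # Plain gradient descent on the weighted sum of both terms.
    center = np.asarray(center, dtype=float).copy()
    for _ in range(steps):
        _, g_align = alignment_loss_and_grad(center, target_xy)
        _, g_pen = penetration_loss_and_grad(center, radius, hand_points)
        center -= lr * (g_align + w_physics * g_pen)
    return center

if __name__ == "__main__":
    hand = np.array([[0.0, 0.0, 0.5]])
    start = np.array([0.3, -0.2, 0.0])
    refined = refine_center(start, 1.0, hand, np.array([0.0, 0.0]))
    print("refined center:", refined)
```

In this toy setting the alignment term only constrains the image-plane coordinates, so the penetration term is free to resolve the overlap along the unobserved depth axis. Setting `w_physics=0.0` leaves the hand point inside the sphere.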
Overview
HandyPriors is a modular building block applicable to multiple tasks. The estimation of hand-object interaction can be achieved by (a) optimization-based refinement or (e) online filtering with an Extended Kalman Filter (EKF). Moreover, the differentiable contact module can be used to perform (b) pose refinement given a desired contact status. More generally, our differentiable priors can also be used for pose estimation in scenes with (d) human bodies or (c) robot hands.
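As an illustration of the filtering alternative, here is a generic EKF predict/update step, given as a sketch under stated assumptions rather than the authors' tracker. The callables `dynamics` and `observe` are hypothetical placeholders for the physics-based dynamics model and the rendering-based observation model, and the Jacobians are approximated with central finite differences where a differentiable pipeline would presumably use automatic differentiation.

```python
import numpy as np

def numerical_jacobian(fn, x, eps=1e-6):
    # Central finite differences; a stand-in (assumption) for the
    # Jacobians a differentiable simulator or renderer would provide.
    x = np.asarray(x, dtype=float)
    out_dim = np.asarray(fn(x), dtype=float).size
    jac = np.zeros((out_dim, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        jac[:, i] = (np.asarray(fn(x + step)) - np.asarray(fn(x - step))) / (2.0 * eps)
    return jac

def ekf_step(mean, cov, observation, dynamics, observe, Q, R):
    # Predict: propagate the state estimate through the dynamics model.
    F = numerical_jacobian(dynamics, mean)
    mean_pred = np.asarray(dynamics(mean), dtype=float)
    cov_pred = F @ cov @ F.T + Q
    # Update: correct the prediction with the new observation.
    H = numerical_jacobian(observe, mean_pred)
    innovation = observation - np.asarray(observe(mean_pred), dtype=float)
    S = H @ cov_pred @ H.T + R
    K = cov_pred @ H.T @ np.linalg.inv(S)
    mean_new = mean_pred + K @ innovation
    cov_new = (np.eye(mean_pred.size) - K @ H) @ cov_pred
    return mean_new, cov_new

if __name__ == "__main__":
    # Toy example: static 3D state, only the first two entries observed.
    mean, cov = np.zeros(3), np.eye(3)
    new_mean, new_cov = ekf_step(
        mean, cov, np.array([1.0, 2.0]),
        dynamics=lambda x: x, observe=lambda x: x[:2],
        Q=np.zeros((3, 3)), R=1e-4 * np.eye(2))
    print(new_mean)
```

A single step like this needs only a handful of model evaluations per frame, which matches the stated speed advantage of filtering over iterative optimization. The unobserved state entry keeps its prior mean and variance.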
Optimization Process of the Pose Estimation
Tracking Results
Contact Optimization
Human Bodies
Panels: Phosa [1] | w/ Rendering term | w/ Rendering+Physics terms
Robotic Hands
Reference
[1] Zhang, Jason Y., Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, and Angjoo Kanazawa. "Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild." In ECCV, 2020.