SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds

Xinghao Chen1, Guijin Wang1, Cairong Zhang1, Tae-Kyun Kim2, Xiangyang Ji3

1Department of Electronic Engineering, Tsinghua University, Beijing, China

2Imperial College London, London, UK

3Department of Automation, Tsinghua University, Beijing, China

xinghao.chen@outlook.com; wangguijin@tsinghua.edu.cn

IEEE Access, 2018

Introduction

3D hand pose estimation is an essential problem for human-computer interaction. Most existing depth-based hand pose estimation methods consume a 2D depth map or 3D volume via 2D/3D convolutional neural networks (CNNs). In this paper, we propose a deep Semantic Hand Pose Regression network (SHPR-Net) for hand pose estimation from point sets, which consists of two subnetworks: a semantic segmentation subnetwork and a hand pose regression subnetwork. The semantic segmentation network assigns a semantic label to each point in the point set. The pose regression network integrates the semantic priors with both input and late fusion strategies and regresses the final hand pose. Two transformation matrices are learned from the point set and applied to transform the input point cloud and inversely transform the output pose, respectively, which makes SHPR-Net more robust to geometric transformations. Experiments on the NYU, ICVL and MSRA hand pose datasets demonstrate that our SHPR-Net achieves performance on par with state-of-the-art methods. We also show that our method can be naturally extended to hand pose estimation from multi-view depth data and achieves further improvement on the NYU dataset.
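
The pipeline described above can be sketched in code. The following is a minimal, hypothetical PyTorch-style sketch, not the exact architecture from the paper: the module names and layer sizes (TNet, the placeholder MLPs, 1024 points, 21 joints, 6 hand parts) are illustrative assumptions, and only the input-fusion path of the semantic priors is shown.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Predicts a 3x3 transformation matrix from the input point set (placeholder)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 9))

    def forward(self, points):                      # points: (B, N, 3)
        feat = self.mlp(points).max(dim=1).values   # symmetric max pooling -> (B, 9)
        eye = torch.eye(3, device=points.device).flatten()
        return (feat + eye).view(-1, 3, 3)          # biased toward the identity transform

class SHPRNetSketch(nn.Module):
    def __init__(self, num_parts=6, num_joints=21):
        super().__init__()
        self.tnet = TNet()
        # Per-point semantic segmentation subnetwork (placeholder MLP).
        self.seg = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, num_parts))
        # Pose regression subnetwork consuming points + semantic priors (input fusion).
        self.reg = nn.Sequential(
            nn.Linear(3 + num_parts, 128), nn.ReLU(), nn.Linear(128, num_joints * 3))
        self.num_joints = num_joints

    def forward(self, points):                      # points: (B, N, 3)
        T = self.tnet(points)                       # learned input transformation
        p = torch.bmm(points, T)                    # transform the input point cloud
        sem = self.seg(p).softmax(dim=-1)           # per-point semantic labels
        fused = torch.cat([p, sem], dim=-1)         # fuse semantic priors with the points
        feat = self.reg(fused).max(dim=1).values    # global pooling -> (B, num_joints * 3)
        pose = feat.view(-1, self.num_joints, 3)
        # Inversely transform the regressed pose back to the original frame.
        return torch.bmm(pose, torch.inverse(T))

# Example: 8 hand point clouds of 1024 points each -> (8, 21, 3) joint positions.
pose = SHPRNetSketch()(torch.randn(8, 1024, 3))
```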

Paper

SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds

Xinghao Chen, Guijin Wang, Cairong Zhang, Tae-Kyun Kim, Xiangyang Ji

IEEE Access, 2018, Volume: 6, Pages: 43425-43439

[PDF]

Results

Predicted labels: [NYU(frontal view)] [NYU(three views)] [ICVL] [MSRA]

All labels are in the format (u, v, d), where u and v are pixel coordinates and d is the depth value. See [awesome-hand-pose-estimation] for more detailed comparisons.
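
If metric 3D coordinates are needed instead of (u, v, d), the labels can be back-projected through the pinhole model of the depth camera. A minimal sketch, assuming generic intrinsics fx, fy, cx, cy (the values below are placeholders; use the actual calibration of each dataset and mind its axis conventions):

```python
import numpy as np

def uvd_to_xyz(uvd, fx, fy, cx, cy):
    """Back-project (u, v, d) joint labels to 3D camera coordinates (x, y, z).

    uvd: (J, 3) array of joint labels; d is the depth value at pixel (u, v).
    fx, fy, cx, cy: pinhole intrinsics of the depth camera (dataset-specific).
    """
    u, v, d = uvd[:, 0], uvd[:, 1], uvd[:, 2]
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=-1)

# Placeholder intrinsics and random labels -- replace with real calibration and labels.
joints_xyz = uvd_to_xyz(np.random.rand(21, 3), fx=588.0, fy=587.0, cx=320.0, cy=240.0)
```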

NYU

Comparison with state-of-the-art methods on the NYU dataset. Left: the proportion of good frames over different error thresholds. Right: per-joint errors.

ICVL

Comparison with state-of-the-art methods on the ICVL dataset. Left: per-joint errors. Right: the proportion of good frames over different error thresholds.

MSRA

Comparison with state-of-the-art methods on the MSRA dataset. Left: per-joint errors. Right: the proportion of good frames over different error thresholds.

Comparison of mean error distance over different yaw (left) and pitch (right) viewpoint angles on the MSRA dataset.
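
The comparisons above report two standard hand pose metrics: per-joint mean error and the proportion of "good" frames whose worst joint error stays below a threshold. A minimal sketch of how such numbers can be computed from predicted and ground-truth joints (array shapes and millimeter units are assumptions):

```python
import numpy as np

def pose_metrics(pred, gt, thresholds):
    """pred, gt: (F, J, 3) joint positions in mm for F frames and J joints.

    Returns per-joint mean errors (J,) and, for each threshold, the proportion
    of frames whose maximum joint error is below that threshold.
    """
    err = np.linalg.norm(pred - gt, axis=-1)             # (F, J) Euclidean errors
    per_joint_mean = err.mean(axis=0)                     # mean error per joint
    worst_per_frame = err.max(axis=1)                     # max joint error per frame
    good_frames = [(worst_per_frame < t).mean() for t in thresholds]
    return per_joint_mean, good_frames
```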

Qualitative Results

Qualitative results on the NYU, ICVL and MSRA datasets. For each dataset, we visualize and compare the predictions of Pose-REN [8], 3D CNN [9] and our proposed SHPR-Net.

Yet More Results

More results on the MSRA14 dataset. Following the protocol of previous work [3-5], we train SHPR-Net on the whole MSRA15 dataset and test on the MSRA14 dataset.

These results are not included in our IEEE Access paper, but feel free to cite the results of SHPR-Net.

MSRA14

References

[1] C. Qian, X. Sun, Y. Wei, X. Tang, and J. Sun, “Realtime and robust hand tracking from depth,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1106–1113.

[2] I. Oikonomidis, N. Kyriazis, and A. A. Argyros, “Efficient model-based 3D tracking of hand articulations using Kinect,” in British Machine Vision Conference (BMVC), 2011.

[3] L. Ge, H. Liang, J. Yuan, and D. Thalmann, “Robust 3D hand pose estimation in single depth images: From single-view CNN to multi-view CNNs,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3593–3601.

[4] L. Ge, H. Liang, J. Yuan, and D. Thalmann, “Robust 3D hand pose estimation from single depth images using multi-view CNNs,” IEEE Transactions on Image Processing (TIP), vol. 27, no. 9, pp. 4422–4436, 2018.

[5] C. Choi, S. Kim, and K. Ramani, “Learning hand articulations by hallucinating heat distribution,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3104–3113.

Citation

@article{chen2018shprnet,

title={SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds},

author={Chen, Xinghao and Wang, Guijin and Zhang, Cairong and Kim, Tae-Kyun and Ji, Xiangyang},

journal={IEEE Access},

volume={6},

pages={43425--43439},

doi={10.1109/ACCESS.2018.2863540},

ISSN={2169-3536},

year={2018}

}