Superpixel Earth Mover's Distance for Hand Gesture Recognition
Abstract
We present a new superpixel-based hand gesture recognition system built on a novel superpixel earth mover's distance metric and the Kinect depth camera. The depth and skeleton information from the Kinect are used to extract the hand without markers. The hand shape, together with the corresponding texture and depth, is represented as a set of superpixels, which compactly retain the overall shape and color of the gesture to be recognized. Based on this representation, a novel distance metric, the Superpixel Earth Mover's Distance (SP-EMD), is proposed to measure the dissimilarity between hand gestures. This measurement is not only robust to distortion and articulation, but also invariant to scaling, translation and rotation with proper preprocessing.
The effectiveness of the proposed distance metric and recognition algorithm is illustrated by extensive experiments on our own gesture dataset as well as two other public datasets. Experimental results show that the proposed system achieves high mean accuracy and fast recognition speed. Its superiority is further demonstrated by comparisons with conventional techniques and by two real-life applications.
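As a concrete illustration of the core computation, the sketch below (Python) estimates an earth mover's distance between two superpixel signatures, assuming each hand is summarized by per-superpixel feature vectors (e.g., centroid position, mean depth and mean color) weighted by superpixel area. The plain Euclidean ground distance and the function name emd_signature are our own placeholders; the exact ground distance and weighting used by SP-EMD are defined in the paper.

    # Minimal sketch: earth mover's distance between two superpixel signatures.
    # Assumptions (not from the paper): Euclidean ground distance between
    # per-superpixel feature vectors; superpixel areas as weights.
    import numpy as np
    from scipy.optimize import linprog

    def emd_signature(feat_a, w_a, feat_b, w_b):
        """EMD between signatures (features, weights) via a transportation LP."""
        n, m = len(w_a), len(w_b)
        cost = np.linalg.norm(feat_a[:, None, :] - feat_b[None, :, :], axis=2)
        c = cost.ravel()                                  # objective coefficients
        total_flow = min(w_a.sum(), w_b.sum())

        a_ub = np.zeros((n + m, n * m))
        for i in range(n):                                # flow out of superpixel i of A
            a_ub[i, i * m:(i + 1) * m] = 1.0
        for j in range(m):                                # flow into superpixel j of B
            a_ub[n + j, j::m] = 1.0
        b_ub = np.concatenate([w_a, w_b])

        a_eq = np.ones((1, n * m))                        # ship all of the smaller mass
        b_eq = np.array([total_flow])

        res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        return res.fun / total_flow                       # normalized transport cost

In such a setup, feat_a could be an n x 6 array of [x, y, depth, r, g, b] values per superpixel and w_a the corresponding pixel counts; a query gesture would then be assigned the label of its nearest template under this distance.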
Demo Video of Real-Life Applications
(Rock-Paper-Scissors-Lizard-Spock Game and 3D Content Browser)
(Robotic hand manipulation and 3D scene navigation)
Experimental Results
Our system is evaluated on three real-world datasets: our joint color-depth hand gesture dataset, the NTU hand digit dataset and the American Sign Language (ASL) finger spelling dataset.
Our Joint Color-Depth Hand Gesture Dataset
It contains 10 gestures, each performed in 20 different poses by 5 subjects, giving a total of 1,000 test cases. Each case consists of a color image and a depth map with the corresponding skeleton information. Gesture samples, labeled from 0 to 9, are shown below. Note that this is a challenging real-life dataset: it was collected in two different rooms, under different illumination conditions, using different Kinects. Moreover, the hand motion is not heavily constrained and includes large in-plane rotations and moderate out-of-plane rotations.
(including the samples for view angle sensitivity test)
The confusion matrix of hand gesture recognition using SP-EMD (unit: %). Left: leave-one-out (LOO) CV. Right: leave-four-out (L4O) CV.
The mean accuracy and mean running time of FEMD, Shape Context, Skeleton Matching and our proposed SP-EMD
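The LOO and L4O protocols above are subject-wise splits over the 5 subjects. The snippet below is a minimal sketch of such a split, under the assumption that LOO CV holds out one subject for testing (training on the other four) while L4O CV holds out four (training on a single subject); the helper itself is ours, not from the paper.

    # Minimal sketch of subject-wise cross-validation splits (assumed protocol).
    from itertools import combinations

    def leave_k_subjects_out(subjects, k):
        """Yield (train_subjects, test_subjects) pairs for leave-k-subjects-out CV."""
        for held_out in combinations(subjects, k):
            test = set(held_out)
            train = [s for s in subjects if s not in test]
            yield train, sorted(test)

    subjects = list(range(5))                              # 10 gestures x 20 poses each
    loo_splits = list(leave_k_subjects_out(subjects, 1))   # 5 folds, train on 4 subjects
    l4o_splits = list(leave_k_subjects_out(subjects, 4))   # 5 folds, train on 1 subject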
NTU Hand Digit Dataset
Their Homepage and Download the Dataset
The confusion matrix of hand gesture recognition using SP-EMD (unit: %) for LOO CV.
Comparison with other state-of-the-art recognition algorithms.
ASL Finger Spelling Dataset
Their Homepage and Download the Dataset
The confusion matrix of hand gesture recognition using SP-EMD (unit: %) for LOO CV.
Comparison with other state-of-the-art recognition algorithms.
Sensitivity Analysis
Our algorithm is robust to parameter selection, rotation, scaling and view angle changes.
Parameter Sensitivity Test
Superpixel size and weights
Orientation and Scale Sensitivity Test
Synthetic mismatches are added to corrupt the preprocessed data of our hand gesture dataset before the ICP alignment is applied. More specifically, after the hand shapes are preprocessed with scale normalization, skeleton-based in-plane rotation correction and depth-based out-of-plane rotation correction, they are randomly rotated by an angle theta or scaled by a factor of (1 + delta). In our experiments, theta and delta are drawn from a Gaussian distribution with zero mean and standard deviation sigma. Five values of sigma are tested and each test is repeated 50 times. The following tables summarize the average accuracies under orientation and scale noise, respectively; a sketch of the perturbation is given after the tables.
Mean accuracy of SP-EMD with orientation noise
Mean accuracy of SP-EMD with scale noise
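The following is a minimal sketch of the perturbation described above, assuming the preprocessed hand is an N x 2 array of image-plane points; the function name and argument defaults are ours. In the actual tests, orientation and scale noise are evaluated separately (set the other sigma to zero), and the ICP realignment that follows is not shown.

    # Minimal sketch: random rotation/scale corruption of a preprocessed hand shape.
    import numpy as np

    def perturb(points, sigma_theta_deg=0.0, sigma_delta=0.0, rng=None):
        """Rotate by theta ~ N(0, sigma_theta) degrees and scale by (1 + delta),
        with delta ~ N(0, sigma_delta), about the hand centroid."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.deg2rad(rng.normal(0.0, sigma_theta_deg))    # orientation noise
        delta = rng.normal(0.0, sigma_delta)                    # scale noise
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        centroid = points.mean(axis=0)
        return (points - centroid) @ rot.T * (1.0 + delta) + centroid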
View Angle Sensitivity Test
The test samples are captured from 5 different view angles (-20, -10, 0, +10 and +20 degrees) with 5 subjects.
The confusion matrix of hand gesture recognition (5 view angles) using SP-EMD (unit: %). Left: LOO CV. Right: L4O CV.
The confusion matrix of hand gesture recognition (5 view angles) using SP-EMD (unit: %) without the preprocessing step of out-of-plane (OOP) rotation correction. Without this correction, the recognition accuracy degrades noticeably (a 2.33% drop in L4O CV and a 0.67% drop in LOO CV). Left: LOO CV. Right: L4O CV.
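For reference, the sketch below shows one generic way to implement a depth-based OOP rotation correction, assuming the segmented hand is an N x 3 point cloud in camera coordinates: fit the palm plane by SVD and rotate its normal onto the camera's optical axis. This is an illustrative stand-in, not necessarily the exact correction used in our system.

    # Generic sketch (assumption): align the palm-plane normal with the camera axis.
    import numpy as np

    def correct_oop_rotation(points):
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normal = vt[-1]                                  # smallest-variance direction
        target = np.array([0.0, 0.0, 1.0])               # camera optical axis
        if normal @ target < 0.0:                        # keep the normal facing the camera
            normal = -normal
        v = np.cross(normal, target)
        s, c = np.linalg.norm(v), normal @ target
        if s < 1e-8:                                     # already aligned
            return centered
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        rot = np.eye(3) + vx + vx @ vx * ((1.0 - c) / s ** 2)   # Rodrigues' rotation
        return centered @ rot.T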
Publications
C. Wang, Z. Liu and S.-C. Chan, "Superpixel-based Hand Gesture Recognition with Kinect Depth Camera," IEEE Trans. Multimedia, vol. 17, no. 1, pp. 29-39, Jan. 2015. (pdf & link)
C. Wang, Z. Liu, M. Zhu, J. Zhao and S.-C. Chan, "A Hand Gesture Recognition System based on Canonical Superpixel-Graph," Signal Processing: Image Communication, vol. 58, pp. 87-98, Oct. 2017. (pdf & link)
C. Wang, Z. Liu and J. Zhao, "Hand Gesture Recognition based on Canonical Formed Superpixel Earth Mover's Distance," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), Seattle, Jul. 2016. (link)
C. Wang and S.-C. Chan, "A New Hand Gesture Recognition Algorithm based on Joint Color-Depth Superpixel Earth Mover's Distance," in Proc. Int. Workshop on Cognitive Information Processing (CIP), Copenhagen, 2014, pp. 1-6. (pdf)
Patent Pending ...