Facial Expression Recognition

During my Ph.D. studies, I focused primarily on feature extraction for facial expression recognition, and attempted to resolve the problems caused by illumination variation (e.g., a poorly lit room), partial occlusion (e.g., wearing sunglasses), and varying view (e.g., freely changing head pose). The main techniques are described in my doctoral thesis; please feel free to read it.

(1) Spatiotemporal Feature Descriptor

Feature representation is an important research topic in facial expression recognition from video sequences. We proposed spatiotemporal local monogenic binary patterns (STLMBP) [1] to describe the appearance and motion information of dynamic sequences. Firstly, we use monogenic signal analysis to extract the magnitude and the real and imaginary parts of the orientation of each frame, since the magnitude provides rich appearance information and the orientation provides complementary information. Secondly, the phase-quadrant encoding method and a local bitwise exclusive-or operator are used to encode the real and imaginary orientation parts over three orthogonal planes, while the local binary pattern operator captures texture and motion information from the magnitude over the same three planes. Finally, both feature concatenation and multiple kernel learning are exploited for feature fusion. Experimental results on the Extended Cohn-Kanade and Oulu-CASIA facial expression databases demonstrate that the proposed methods outperform state-of-the-art methods and are robust to illumination variations.
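
To make the first step concrete, below is a minimal sketch of monogenic signal analysis using an isotropic log-Gabor bandpass filter and the Riesz transform in the frequency domain. The function name and filter parameters (wavelength, sigma) are illustrative assumptions, not the exact settings of [1].

```python
import numpy as np

def monogenic_signal(image, wavelength=8.0, sigma=0.55):
    """Sketch of monogenic signal analysis: returns the local magnitude,
    orientation, and phase maps of one grayscale (float) frame."""
    rows, cols = image.shape
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(U ** 2 + V ** 2)
    radius[0, 0] = 1.0  # avoid division by zero at the DC component

    # Isotropic log-Gabor bandpass filter
    f0 = 1.0 / wavelength
    log_gabor = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma) ** 2))
    log_gabor[0, 0] = 0.0

    # First-order Riesz transform kernels (x and y components)
    riesz_x = -1j * U / radius
    riesz_y = -1j * V / radius

    F = np.fft.fft2(image)
    f_band = np.real(np.fft.ifft2(F * log_gabor))         # bandpassed frame
    h_x = np.real(np.fft.ifft2(F * log_gabor * riesz_x))  # Riesz x part
    h_y = np.real(np.fft.ifft2(F * log_gabor * riesz_y))  # Riesz y part

    magnitude = np.sqrt(f_band ** 2 + h_x ** 2 + h_y ** 2)
    orientation = np.arctan2(h_y, h_x)                        # local orientation
    phase = np.arctan2(np.sqrt(h_x ** 2 + h_y ** 2), f_band)  # local phase
    return magnitude, orientation, phase
```

Applying this to every frame yields the magnitude and orientation maps that the descriptor subsequently encodes over three orthogonal planes.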

Fig. 1.1.1. Procedure of STLMBP.

STLMBP was verified to work promisingly under various illumination conditions. We further proposed an improved spatiotemporal feature descriptor based on STLMBP [2]. The improved descriptor uses not only magnitude and orientation but also phase information, which provides complementary cues. STLMBP and the improved STLMBP were evaluated on the Acted Facial Expressions in the Wild (AFEW) database.
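
For the encoding stage, the following simplified sketch computes LBP codes over three orthogonal planes (XY, XT, YT) of a video volume and concatenates the resulting histograms. Unlike the actual descriptor, which also applies the phase-quadrant and exclusive-or encodings and accumulates codes over all slices, this sketch uses a plain 8-neighbor LBP on the three central slices only.

```python
import numpy as np

def lbp_8neighbors(plane):
    """Basic 8-neighbor LBP code for every interior pixel of a 2D slice."""
    c = plane[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=np.int32)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = plane[1 + dy:plane.shape[0] - 1 + dy,
                   1 + dx:plane.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit  # threshold against center
    return code

def lbp_top_histogram(volume, n_bins=256):
    """Concatenated LBP histograms from the XY, XT, and YT planes
    of a video volume with shape (T, H, W)."""
    t, h, w = volume.shape
    planes = [volume[t // 2],        # XY plane (middle frame)
              volume[:, h // 2, :],  # XT plane
              volume[:, :, w // 2]]  # YT plane
    hists = [np.bincount(lbp_8neighbors(p).ravel(), minlength=n_bins)
             for p in planes]
    return np.concatenate(hists)
```

Computing such histograms for the magnitude, orientation, and phase maps and fusing them gives a feature in the spirit of the improved STLMBP.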

Fig. 1.1.2. Procedure of the improved STLMBP.

References:

[1] Xiaohua Huang, Guoying Zhao, Matti Pietikäinen and Wenming Zheng. Spatiotemporal local monogenic binary patterns for facial expression recognition. IEEE Signal Processing Letters, Vol. 19, No. 3, pp. 243-246, 2012.

[2] Xiaohua Huang, Qiuhai He, Xiaopeng Hong, Guoying Zhao and Matti Pietikäinen. Improved spatiotemporal local monogenic binary pattern for emotion recognition in the wild. Proceedings of the 16th ACM International Conference on Multimodal Interaction, pp. 514-520, 2014.


(2) Facial Expression Recognition under Partial Occlusion

Facial occlusion is a challenging research topic in facial expression recognition (FER). It led us to develop facial representations and occlusion detection methods that extend FER to uncontrolled environments. It should be noted that most previous work treats these two issues separately, and on static images only. We were thus motivated to propose a complete system consisting of facial representations, occlusion detection, and multiple feature fusion for video sequences. To obtain a facial representation that is robust and reflects the contributions of facial components to expressions, we propose an approach that derives six feature vectors from the eye, nose, and mouth components. These features, which carry temporal cues, are generated by dynamic texture and structural shape feature descriptors. Occlusion detection, on the other hand, is typically realized with traditional classifiers or by model comparison. Recently, sparse representation has been proposed as an efficient tool against occlusion, but in FER it is correlated with facial identity unless an appropriate facial representation is used. We therefore present an evaluation demonstrating that the proposed facial representation is independent of facial identity, and then exploit sparse representation and residual statistics to detect occlusion in image sequences. As concatenating the six feature vectors into one causes the curse of dimensionality, we propose a multiple feature fusion scheme consisting of a fusion module and weight learning. Experimental results on the Extended Cohn-Kanade database and a simulated occlusion database demonstrate that our framework outperforms state-of-the-art methods for FER on normal videos and, especially, on partially occluded videos. The idea and results were published in Pattern Recognition Letters.
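
As a rough illustration of the sparse-representation idea, the sketch below reconstructs a component's feature vector from a dictionary of non-occluded training features and uses the normalized reconstruction residual as an occlusion score. The Lasso solver, the alpha value, and the thresholding step are assumptions for illustration, not the residual statistics actually used in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def occlusion_score(feature, dictionary, alpha=0.01):
    """Sparse reconstruction residual for one facial component.

    dictionary: (n_dims, n_atoms) matrix whose columns are feature
    vectors of non-occluded training samples. A feature that cannot
    be sparsely reconstructed from the dictionary yields a large
    residual, suggesting that the component is occluded."""
    lasso = Lasso(alpha=alpha, max_iter=10000)
    lasso.fit(dictionary, feature)             # sparse coding step
    reconstruction = dictionary @ lasso.coef_
    residual = np.linalg.norm(feature - reconstruction)
    return residual / (np.linalg.norm(feature) + 1e-12)

# Hypothetical usage: flag the eye component as occluded when the
# score exceeds a threshold chosen on validation data.
# is_occluded = occlusion_score(eye_feature, eye_dictionary) > 0.5
```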

Fig. 1.2. The proposed method for dynamic expression recognition under facial occlusion. (a) The procedure of the component-based facial expression representation. (b) An example of occlusion detection in the eye region.

Reference:

Xiaohua Huang, Guoying Zhao, Wenming Zheng and Matti Pietikäinen. Towards a dynamic expression recognition system under facial occlusion. Pattern Recognition Letters, Vol. 33, No. 16, pp. 2181-2191, 2012.


(3) Multi-view Facial Expression Recognition

Facial expression recognition (FER) is widely used to analyze the emotional state of human beings. In practice, however, nearly frontal-view facial images may not be available, so a desirable FER system should allow the user to adopt any head pose. Several methods have recently been proposed to recognize expressions from non-frontal-view facial images by building a discriminative subspace for each specific view. These approaches ignore (1) the discrimination of inter-class samples sharing the same view label and (2) the closeness of intra-class samples across all view labels. We proposed a new method to recognize arbitrary-view facial expressions using discriminative neighborhood preserving embedding and multi-view concepts. It first captures the discriminative property of inter-class samples; in addition, it explores the closeness of intra-class samples with arbitrary views in a low-dimensional subspace. Experimental results on the BU-3DFE and Multi-PIE databases show that our approach achieves promising results for recognizing facial expressions under arbitrary views.
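
For reference, the sketch below implements plain neighborhood preserving embedding (NPE), the base technique on which the proposed discriminative multi-view variant builds. The discriminative inter-class and multi-view intra-class terms from our method are not included, and the regularization constants are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def npe(X, n_neighbors=5, n_components=2):
    """Neighborhood preserving embedding. X: (n_samples, n_features)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        idx = np.argsort(d)[1:n_neighbors + 1]  # k nearest neighbors (skip self)
        Z = X[idx] - X[i]                       # neighbors in local coordinates
        G = Z @ Z.T                             # local Gram matrix
        G += np.eye(n_neighbors) * 1e-3 * np.trace(G)  # regularization (assumed)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, idx] = w / w.sum()                 # reconstruction weights
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    # Generalized eigenproblem: X^T M X a = lambda X^T X a;
    # the eigenvectors with the smallest eigenvalues give the projection.
    A = X.T @ M @ X
    B = X.T @ X + 1e-6 * np.eye(X.shape[1])
    _, vecs = eigh(A, B)
    return X @ vecs[:, :n_components]
```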

Fig. 1.3. Illustration of multi-view discriminative neighborhood preserving embedding for arbitrary-view FER.