Micro-expressions occur involuntarily and spontaneously. In other words, they are uncontrollable and can therefore reveal a person's concealed genuine feelings. Facial micro-expression analysis has attracted much attention from the computer vision and psychology communities due to its viability in a broad range of applications, including medical diagnosis, police interrogation, national security, business negotiation, and social interaction. However, the subtle and brief facial movements involved pose a major challenge to the development of an efficient automated micro-expression recognition system.
The Computational Analytics & Cognitive Vision (CACV) Lab aims to discover various types of feature extractors that can effectively describe the local textural and motion features of the face. Experiments are carried out on recent spontaneous micro-expression databases. Besides, a thorough analysis of each proposed method is included to highlight its contributions and limitations.
Sample clips from the benchmark micro-expression databases: CASME II [1] (sub01_EP19_05f), SMIC [2] (s09_s09_sur_03), and SAMM [3] (016_016_7_8, 020_020_4_1).

[2024] SFAMNet: A scene flow attention-based micro-expression network
In this paper, we propose the first Scene Flow Attention-based Micro-expression Network, namely SFAMNet. It takes the scene flow computed using the RGB-D flow algorithm as input and predicts the spotting confidence score and emotion labels. Specifically, SFAMNet is an attention-based end-to-end multi-stream multi-task network devised to spot and recognize MEs. Besides that, we present a data augmentation strategy to alleviate the small sample size problem during network learning. Extensive experiments are performed on three tasks: (i) ME spotting; (ii) ME recognition; and (iii) ME analysis on the multi-modal CAS(ME)³ dataset.
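As a rough illustration of the multi-stream multi-task idea (not SFAMNet's actual architecture; the layer sizes and the squeeze-and-excitation style channel attention below are assumptions), a minimal PyTorch sketch might look like this: each stream encodes one scene-flow component, attention reweights the fused features, and two heads emit a spotting confidence score and emotion logits.

```python
# Hypothetical multi-stream, multi-task sketch; all layer sizes are illustrative.
import torch
import torch.nn as nn

class MultiStreamMultiTask(nn.Module):
    def __init__(self, n_streams=3, n_emotions=3):
        super().__init__()
        self.streams = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(7))
            for _ in range(n_streams)])
        fused = n_streams * 8
        # Squeeze-and-excitation style channel attention (an assumption).
        self.attn = nn.Sequential(nn.Linear(fused, fused // 2), nn.ReLU(),
                                  nn.Linear(fused // 2, fused), nn.Sigmoid())
        self.spot_head = nn.Linear(fused * 7 * 7, 1)           # spotting score
        self.emo_head = nn.Linear(fused * 7 * 7, n_emotions)   # emotion logits

    def forward(self, xs):  # xs: list of n_streams tensors, each (B, 1, H, W)
        feats = torch.cat([s(x) for s, x in zip(self.streams, xs)], dim=1)
        w = self.attn(feats.mean(dim=(2, 3)))    # per-channel weights in (0, 1)
        flat = (feats * w[:, :, None, None]).flatten(1)
        return torch.sigmoid(self.spot_head(flat)), self.emo_head(flat)

model = MultiStreamMultiTask()
spot, emo = model([torch.randn(2, 1, 28, 28) for _ in range(3)])
```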
This paper aims to give a strong impetus to the advancement of ME recognition systems, particularly for in-the-wild applications. Succinctly, an efficient recognition system is introduced herein, incorporating core processes such as 3D facial reconstruction, apex spotting, and emotion recognition. Concretely, all faces are first rendered in 3D point cloud form to ease face alignment along the video. Then, optical flow-guided components are employed as the primary features to represent motion details. Subsequently, a 3-Stream channel based on 3-Dimensional faces Network (3T3D-Net) with skip connections via multiplication is tailored to cope with the small-size input images. As a result, the suite of techniques devised yields pleasing recognition performance, producing a UAR of 69.44% and a UF1 of 70.71% when evaluated on the in-the-wild ME dataset, viz., MEVIEW.
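One reading of "skip connection via multiplication" is a shortcut that gates the transformed features element-wise rather than being added to them, as in standard residual blocks. The toy PyTorch block below sketches that reading under stated assumptions; the channel count and depth are illustrative, not the 3T3D-Net design.

```python
# A minimal sketch of a multiplicative skip connection: the input gates the
# convolved features element-wise instead of being summed with them.
import torch
import torch.nn as nn

class MultiplicativeSkip(nn.Module):
    def __init__(self, channels=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return torch.relu(self.body(x) * x)   # multiply, not add

block = MultiplicativeSkip()
y = block(torch.randn(1, 8, 14, 14))          # small spatial input, matching
                                              # the small-size image setting
```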
In this study, a Feature Elimination through Data Complexity-based Error-Correcting Output Codes (FEDC-ECOC) algorithm is proposed. In the generation of the coding matrix, a set of data complexity measures is utilized as the division criterion. Meanwhile, a sliding window and a greedy search algorithm are applied to improve the discriminative ability of the coding matrix for various emotion types. In addition, this study proposes a feature selection algorithm that identifies essential features to enhance classifier performance. Comprehensive experiments are conducted, and the results confirm the robustness and effectiveness of FEDC-ECOC. A detailed analysis is given to provide further insights into the proposed method.
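For readers unfamiliar with ECOC, the sketch below shows the generic decoding step only (not the FEDC-ECOC coding-matrix design itself): each column of the coding matrix defines one binary classifier, and a sample's vector of binary outputs is matched to the closest class row. The matrix and outputs here are toy values.

```python
# Generic ECOC decoding sketch with a toy 3-class, 3-column coding matrix.
import numpy as np

# Rows = classes, columns = binary dichotomies; entries in {-1, +1}.
coding_matrix = np.array([[ 1,  1, -1],
                          [ 1, -1,  1],
                          [-1,  1,  1]])

def ecoc_decode(binary_outputs, M):
    """binary_outputs: (n_samples, n_columns) classifier outputs in [-1, 1]."""
    # L1 distance of each sample's codeword to each class row; smaller is closer.
    d = np.abs(binary_outputs[:, None, :] - M[None, :, :]).sum(axis=2)
    return d.argmin(axis=1)

outputs = np.array([[0.9, 0.8, -0.7]])        # resembles class 0's codeword
print(ecoc_decode(outputs, coding_matrix))    # -> [0]
```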
[2023] The design of error-correcting output codes based deep forest for the micro-expression recognition
This paper proposes a novel deep forest (DF) based on error-correcting output codes (ECOC), named EDF. The ECOC is innovatively deployed to encourage diversity and help summarize differences among classes from multiple perspectives. Compared to DNN-based models, EDF is a more general solution for MER, since it is not sensitive to dataset size and does not require auxiliary approaches such as transfer learning. Moreover, EDF copes with the interpretability-accuracy trade-off to a certain extent.
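A deep forest grows as a cascade of forest layers, each consuming the previous layer's class-probability outputs alongside the raw features. The scikit-learn sketch below shows one such cascade level only; EDF additionally wires ECOC into the ensemble, which is not reproduced here, and the forest sizes are illustrative.

```python
# One cascade-forest level in the spirit of deep forest: each forest's class
# probabilities are concatenated to the raw features for the next level.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

def cascade_layer(X, y, X_next):
    """Fit one cascade level on (X, y); return augmented features for X_next."""
    forests = [RandomForestClassifier(n_estimators=50, random_state=0),
               ExtraTreesClassifier(n_estimators=50, random_state=0)]
    probas = []
    for f in forests:
        f.fit(X, y)
        probas.append(f.predict_proba(X_next))
    return np.hstack([X_next] + probas)

X = np.random.rand(60, 10)
y = np.random.randint(0, 3, 60)
X_aug = cascade_layer(X, y, X)   # shape: (60, 10 + 2 forests * 3 classes)
```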
[2022] LAENet for micro-expression recognition
This paper introduces a lightweight apex-based enhanced network that extends one of the state-of-the-art models, the shallow triple stream three-dimensional CNN. Concretely, the network is first pre-trained on a macro-expression dataset to counter the small data problem. Optical flow-guided features are then extracted from the CASME II, SMIC, and SAMM datasets, and a thorough comparison of recognition results across these datasets is reported. Besides, an eye masking technique is introduced to reduce noise interference such as eye blinking and glasses reflection issues. The results obtained show an accuracy of 79.19% and an F1-score of 75.9%.
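A hypothetical sketch of the eye-masking idea follows, assuming 68-point facial landmarks are already available (e.g. from a dlib-style detector, where points 36-41 and 42-47 outline the two eyes): the eye regions are zeroed out so blinks and glasses reflections cannot contribute to the motion features. The padding value is illustrative.

```python
# Hypothetical eye-masking sketch over precomputed 68-point landmarks.
import numpy as np
import cv2

def mask_eyes(image, landmarks, pad=2):
    """landmarks: (68, 2) array of (x, y); 36-41 and 42-47 are the eyes."""
    out = image.copy()
    for lo, hi in ((36, 42), (42, 48)):
        x, y, w, h = cv2.boundingRect(landmarks[lo:hi].astype(np.int32))
        y0, x0 = max(y - pad, 0), max(x - pad, 0)
        out[y0:y + h + pad, x0:x + w + pad] = 0   # black out the eye region
    return out
```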
[2022] The heterogeneous ensemble of deep forest and deep neural networks for micro‑expressions recognition
In this study, a new ensemble algorithm, DFN for short, is proposed by fusing two different deep learning frameworks: Deep Forest (DF) and Convolutional Neural Networks (CNN). A modified DF structure is deployed to extract features through the multi-grained scanning technique, using three different sliding windows to gain diverse motion features. In addition, two shallow CNNs are deployed to extract features from the three-dimensional optical flow vector and the apex frame. In this way, the fusion of DF and CNNs forms DFN, which extracts static and dynamic ME features and generates diverse features with high-level abstraction. Consequently, this heterogeneous ensemble exploits the high diversity of the two models to promote the overall discriminative ability.
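Multi-grained scanning slides windows of several sizes over a feature vector, turning each window position into a separate instance so that patterns at different scales are captured. The sketch below shows this mechanism in plain NumPy; the window sizes and stride are illustrative, not the paper's settings.

```python
# Multi-grained scanning sketch: overlapping sub-vectors at several scales.
import numpy as np

def multi_grained_scan(x, window_sizes=(4, 8, 16), stride=1):
    """x: 1-D feature vector; returns one instance matrix per window size."""
    views = []
    for w in window_sizes:
        idx = np.arange(0, len(x) - w + 1, stride)
        views.append(np.stack([x[i:i + w] for i in idx]))
    return views

x = np.random.rand(32)
for v in multi_grained_scan(x):
    print(v.shape)   # (29, 4), (25, 8), (17, 16)
```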
This paper considers the challenge of spotting facial macro- and micro-expressions in long videos. We propose the multi-temporal stream network (MTSN) model, which takes two distinct inputs by considering the different temporal information in facial movement. We also introduce a hard and soft pseudo-labeling technique that enables the network to distinguish expression frames from non-expression frames by learning salient features in the expression peak frame. Consequently, we demonstrate how a single output from the MTSN model can be post-processed to predict both macro- and micro-expression intervals. Our results outperform the MEGC 2022 baseline method significantly, achieving an overall F1-score of 0.2586, and also perform remarkably well on the MEGC 2021 benchmark with overall F1-scores of 0.3620 and 0.2867 on CAS(ME)² and SAMM Long Videos, respectively.
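A common post-processing step for spotting is to threshold the model's per-frame confidence scores and merge contiguous runs into predicted intervals. The sketch below shows that generic step only; the threshold and minimum interval length are assumptions, not the paper's values.

```python
# Generic interval extraction from per-frame confidence scores.
import numpy as np

def scores_to_intervals(scores, thresh=0.5, min_len=3):
    # Pad with zeros so edge detection also catches runs at the boundaries.
    above = np.concatenate(([0], (scores >= thresh).astype(int), [0]))
    edges = np.flatnonzero(np.diff(above))
    starts, ends = edges[::2], edges[1::2]   # paired rising/falling edges
    return [(s, e - 1) for s, e in zip(starts, ends) if e - s >= min_len]

scores = np.array([.1, .2, .8, .9, .7, .1, .6, .9, .8, .2])
print(scores_to_intervals(scores))   # [(2, 4), (6, 8)]
```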
[2022] Needle in a Haystack: Spotting and recognising micro-expressions "in the wild"
This paper introduces, for the first time, a completely automatic micro-expression "spot-and-recognize" framework that operates on in-the-wild videos, such as poker games and political interviews. The proposed method first spots the apex frame from a video while handling head movements and unconscious actions, which are typically larger in motion intensity, with alignment employed to enforce a canonical face pose. Optical flow-guided features play a central role in our method: they robustly identify the location of the apex frame and are used to learn a shallow neural network model for emotion classification. Experimental results demonstrate the feasibility of the proposed methodology, establishing good baselines for the spotting and recognition tasks, with an ASR of 0.33 and an F1-score of 0.6758, respectively, on the MEVIEW micro-expression database.
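Stripped of the paper's alignment and head-motion handling, apex spotting from flow can be reduced to a simple rule: pick the frame whose motion relative to a neutral reference is largest. The OpenCV sketch below illustrates that reduced form under the assumption that frames are already aligned grayscale images; real in-the-wild use needs the preprocessing described above.

```python
# Simplified apex-spotting sketch: frame with the largest mean flow magnitude.
import numpy as np
import cv2

def spot_apex(gray_frames):
    """gray_frames: list of aligned grayscale frames; returns the apex index."""
    ref = gray_frames[0]                      # assumed neutral reference frame
    mags = [0.0]
    for frame in gray_frames[1:]:
        flow = cv2.calcOpticalFlowFarneback(ref, frame, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mags.append(np.linalg.norm(flow, axis=2).mean())
    return int(np.argmax(mags))
```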
[2021] Micro-expression recognition using advanced genetic algorithm
This paper attempts to improve the performance of current micro-expression recognition systems by introducing an efficient and effective algorithm. Particularly, we employ a genetic algorithm (GA) to discover an optimal solution that facilitates the computational process and produces better recognition results. Prior to the GA implementation, benchmark preprocessing methods and feature extractors are directly adopted. Succinctly, the complete proposed framework comprises three main steps: apex frame acquisition, optical flow approximation, and feature extraction with a CNN architecture. Experiments are conducted on the composite dataset made up of three publicly available databases, viz., CASME II, SMIC, and SAMM. The recognition performance surpasses state-of-the-art methods, attaining an accuracy of 85.9% and an F1-score of 83.7%.
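The sketch below shows a generic GA loop (tournament selection, single-point crossover, bit-flip mutation over binary chromosomes); the paper's actual encoding and fitness function are not reproduced, so the toy fitness and all hyperparameters here are assumptions.

```python
# Generic genetic-algorithm loop maximizing a user-supplied fitness function.
import numpy as np

rng = np.random.default_rng(0)

def ga(fitness, n_genes, pop_size=20, n_gens=50, p_mut=0.05):
    pop = rng.integers(0, 2, (pop_size, n_genes))
    for _ in range(n_gens):
        fit = np.array([fitness(c) for c in pop])
        # Tournament selection: each parent is the fitter of a random pair.
        pairs = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(fit[pairs[:, 0]] > fit[pairs[:, 1]],
                               pairs[:, 0], pairs[:, 1])]
        # Single-point crossover between consecutive parents.
        children = parents.copy()
        cut = rng.integers(1, n_genes, pop_size)
        for i in range(0, pop_size - 1, 2):
            c = cut[i]
            children[i, c:], children[i + 1, c:] = \
                parents[i + 1, c:], parents[i, c:]
        # Bit-flip mutation.
        flip = rng.random(children.shape) < p_mut
        pop = np.where(flip, 1 - children, children)
    fit = np.array([fitness(c) for c in pop])
    return pop[fit.argmax()]

best = ga(lambda c: c.sum(), n_genes=16)   # toy fitness: count of ones
```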
[2020] Evaluation of the Spatio-Temporal features and GAN for Micro-expression Recognition System
This paper analyzes several spatio-temporal features and investigates two generative adversarial networks (GANs) for a micro-expression recognition system. First, five optical flow variations are computed (i.e., RLOF, TV-L1, Farneback, Lucas-Kanade, and Horn & Schunck). A thorough comparison among these optical flow methods is reported, where we fix all the experiment configurations to observe the effectiveness of each individual method. Secondly, to deal with the small data size and imbalanced emotion class distribution issues, AC-GAN and SAGAN are employed to generate more artificial micro-expression images. Experiments demonstrate that the implementation of GANs enhances the recognition rate when evaluated on the SMIC dataset. Thirdly, a slight modification of a state-of-the-art CNN architecture is made.
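Two of the compared dense-flow variants can be computed directly with OpenCV, as sketched below. Farneback ships with core OpenCV; TV-L1 (like RLOF) lives in the opencv-contrib `optflow` module, so its availability depends on your build. The parameter values are OpenCV-style defaults, not the paper's tuned settings.

```python
# Sketch of two dense optical flow variants via OpenCV (grayscale inputs).
import cv2

def farneback_flow(prev_gray, next_gray):
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

def tvl1_flow(prev_gray, next_gray):
    tvl1 = cv2.optflow.createOptFlow_DualTVL1()   # requires opencv-contrib
    return tvl1.calc(prev_gray, next_gray, None)
```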
[2019] A Shallow Triple Stream Three-dimensional CNN (STSTNet) for Micro-expression Recognition System, FG
A two-layer neural network, namely the Shallow Triple Stream Three-dimensional CNN (STSTNet), is proposed. The network is capable of learning features from three optical flow-derived inputs (i.e., optical strain, horizontal, and vertical optical flow images) computed from the onset and apex frames of each video. Our experimental results demonstrate the viability of the proposed STSTNet, which exhibits UAR recognition results of 76.05%, 70.13%, 86.86%, and 68.10% on the composite, SMIC, CASME II, and SAMM databases, respectively.
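The third input channel, optical strain, is commonly defined in the micro-expression literature as the magnitude of the symmetric part of the flow field's spatial gradient, which highlights subtle local deformation. A minimal NumPy sketch of that standard formula:

```python
# Optical strain magnitude from a dense flow field.
import numpy as np

def optical_strain(flow):
    """flow: (H, W, 2) array with u = flow[..., 0], v = flow[..., 1]."""
    u, v = flow[..., 0], flow[..., 1]
    uy, ux = np.gradient(u)      # derivatives along rows (y) and columns (x)
    vy, vx = np.gradient(v)
    # |eps| = sqrt(e_xx^2 + e_yy^2 + 2 * e_xy^2), with e_xy = (u_y + v_x) / 2
    return np.sqrt(ux**2 + vy**2 + 0.5 * (uy + vx)**2)
```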
[2019] OFF-ApexNet on Micro-expression Recognition System, SPIC
The Optical Flow Features from Apex frame Network (OFF-ApexNet) is introduced to recognize micro-expressions. It combines optical flow-derived components and CNN features. First, the optical flow features are computed from the onset and apex frames. Then, the features are fed into a CNN to further highlight significant expression information. Promising performance results are obtained on three databases (i.e., SMIC, CASME II, and SAMM). Note that this is the first attempt at cross-dataset validation on three databases in this domain. A three-class classification accuracy of 74.60% was achieved, with an F-measure of 0.71.
[2018] Less is more: Micro-expression recognition from video using apex frame, SPIC
A novel proposition is presented in this paper, whereby we utilize only two images per video, namely the apex frame and the onset frame. The apex frame of a video contains the highest intensity of expression changes among all frames, while the onset frame is the natural choice of reference frame with a neutral expression. A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF), is proposed to encode the essential expressiveness of the apex frame. We evaluated the proposed method on the CAS(ME)², CASME II, SMIC-HS, SMIC-NIR, and SMIC-VIS databases. Our proposed technique achieves state-of-the-art F1-score recognition performance of 0.61 on CASME II and 0.62 on SMIC-HS.
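The core idea behind Bi-WOOF's local weighting is that each pixel votes for its flow orientation bin in proportion to its flow magnitude, so weak, noisy motion contributes little. The sketch below shows that generic magnitude-weighted orientation histogram only; Bi-WOOF additionally applies a global optical-strain weighting and a block-wise layout that are not reproduced here, and the bin count is illustrative.

```python
# Magnitude-weighted histogram of flow orientations (local weighting idea).
import numpy as np

def weighted_orientation_histogram(flow, n_bins=8):
    """flow: (H, W, 2); returns an n_bins histogram over [-pi, pi)."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.hypot(u, v)
    ang = np.arctan2(v, u)                    # orientation in [-pi, pi]
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi),
                           weights=mag)       # each pixel votes its magnitude
    return hist / (hist.sum() + 1e-8)
```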
[2018] Bi-directional Vectors from Apex in CNN for Microexpression Recognition, ICIVC
This paper proposes a novel framework, BiVACNN, to automatically recognize micro-expressions. The proposed method extracts optical flow details from the apex frame and reconstructs the features using a CNN architecture. It is demonstrated that BiVACNN is capable of attaining a promising recognition accuracy of around 80% in predicting three expression classes (i.e., positive, negative, and surprise).