Facial expression recognition (FER) is a branch of artificial intelligence (AI) that deals with deciphering human emotions by analyzing facial features. It aims to automate what humans do naturally: interpreting emotions from facial expressions.
The roots of FER can be traced back to psychology and the study of emotion. The naturalist Charles Darwin, in his 1872 book "The Expression of the Emotions in Man and Animals," explored the universality of human facial expressions in conveying emotions. This early work laid the groundwork for scientific research into facial expressions and emotions.
Early FER systems relied on hand-crafted rules tied to specific facial features, such as a furrowed brow for anger or a wide smile for happiness. However, these methods were brittle and struggled with variations in lighting, pose, and individual facial characteristics.
The development of FER technology took off with advances in computer vision and machine learning in the late 20th and early 21st centuries, and especially with the rise of deep learning, a subfield of machine learning inspired by the structure and function of the human brain, in the 2010s. Deep convolutional neural networks (CNNs) can learn complex patterns from vast amounts of data. By training these CNNs on massive datasets of labeled facial expressions, FER systems achieved significant accuracy improvements.
Figure: Feature vectors representing a face at different layers of a deep learning network. Source: Wang, Mei, and Weihong Deng, "Deep Face Recognition: A Survey," arXiv preprint arXiv:1804.06655 (2018).
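To make the training idea above concrete, here is a minimal, illustrative sketch of a small CNN emotion classifier in PyTorch. The 48x48 grayscale input size and seven emotion classes follow a common public-dataset convention (e.g., FER2013) and are assumptions for illustration, not details drawn from the works cited here.

```python
# Minimal sketch of a CNN emotion classifier (PyTorch).
# Assumes 48x48 grayscale face crops and 7 emotion classes (a FER2013-style
# setup); a production FER system would use a larger network and far more data.
import torch
import torch.nn as nn

class TinyFERNet(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 48x48 -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 24x24 -> 12x12
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 12x12 -> 6x6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One supervised training step on a batch of labeled face crops.
model = TinyFERNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

faces = torch.randn(16, 1, 48, 48)      # stand-in for a real labeled batch
labels = torch.randint(0, 7, (16,))     # stand-in emotion labels
loss = criterion(model(faces), labels)
loss.backward()
optimizer.step()
```

Real FER systems use deeper networks, heavy data augmentation, and much larger labeled datasets, but the supervised training loop is essentially the same.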
Today's FER systems are remarkably sophisticated. They can not only identify basic emotions like happiness, sadness, or anger, but also recognize more subtle expressions like doubt and concentration, and even micro-expressions: fleeting movements that can reveal underlying emotions.
Facial expression analysis with computer vision is now far more cost-effective, versatile, and practical than electroencephalography (EEG) or facial electromyography (fEMG), which require expensive equipment and cumbersome wiring to measure affect; these intrusive methods are being phased out, and recent advances continue to make computer vision more accessible (Bahreini et al., 2019). In 2020, a CNN-based method designed to be as computationally light as possible outperformed most competitors at detecting facial micro-expressions and could do so in 24 ms, making it well suited to real-time embedded applications with limited memory and computing resources (Belaiche et al., 2020).
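As a rough sanity check on figures like the 24 ms quoted above, one can simply time a model's forward pass on a single face crop. The sketch below uses a generic stand-in CNN, not the architecture proposed by Belaiche et al. (2020), and the measured latency will of course vary with the hardware.

```python
# Rough single-frame latency check for a small CNN (CPU, batch size 1).
# The model here is a generic stand-in, not the network from Belaiche et al. (2020).
import time
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 12 * 12, 7),
).eval()

face = torch.randn(1, 1, 48, 48)   # one preprocessed 48x48 face crop
with torch.no_grad():
    for _ in range(10):            # warm-up iterations
        model(face)
    start = time.perf_counter()
    runs = 100
    for _ in range(runs):
        model(face)
avg_ms = (time.perf_counter() - start) / runs * 1000
print(f"average forward-pass latency: {avg_ms:.1f} ms per frame")
```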
Research also suggests a disparity between self-reported feelings and measured emotion (Franek et al., 2022), underscoring that facial data offers a more objective, less bias-prone measure than self-report. Facial data is also easy to gather from any subject and transcends the cultural and language barriers that traditional methods fall prey to (Jack et al., 2012; Ekman, 1971).
There is a strong relationship between music and emotion, and its effect on the face is readily apparent. Research on audience facial expressions during music performances establishes the link between observed emotions and music experiences, emphasizing the role of facial expressions in gauging affect (Kayser et al., 2021). Another investigation into facial expressions during emotionally charged moments and musical experiences suggests a strong connection between music-induced emotions and specific facial expressions (Klepzig et al., 2022), and repeated exposure to emotionally evocative music has been found to induce “liking and smile responses” in listeners (Witvliet & Vrana, 2007). Additionally, various prototype systems that use facial expressions to generate playlists and recommendations have been proposed in recent years (Mishra et al., 2020; Florence & Uma, 2020; Athalve et al., 2021; Srinivas et al., 2022; Shivam et al., 2023).
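The cited prototypes differ in their details, but the common core is a mapping from a detected emotion label to playlists or recommendation seeds. A minimal sketch of that glue logic might look like the following; the emotion labels and playlist names are hypothetical placeholders, not taken from any of the cited systems.

```python
# Minimal sketch of the emotion-to-playlist glue used by FER-driven music
# recommenders. Labels and playlist names are hypothetical placeholders.
from typing import Dict, List

EMOTION_TO_PLAYLISTS: Dict[str, List[str]] = {
    "happy":   ["Upbeat Pop", "Feel-Good Classics"],
    "sad":     ["Mellow Acoustic", "Rainy Day Piano"],
    "angry":   ["High-Energy Rock", "Workout Mix"],
    "neutral": ["Lo-fi Focus", "Ambient Background"],
}

def recommend_playlists(detected_emotion: str) -> List[str]:
    """Return candidate playlists for the emotion predicted by a FER model."""
    return EMOTION_TO_PLAYLISTS.get(detected_emotion, EMOTION_TO_PLAYLISTS["neutral"])

print(recommend_playlists("sad"))   # ['Mellow Acoustic', 'Rainy Day Piano']
```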
In fact, when researchers paralyzed participants' facial muscles and had them rate how funny they found a set of comics, they found not only that the paralyzed participants rated the comics as less funny, but also that their amygdalae genuinely showed a lower response. "Interestingly, functional neuroimaging with magnetic resonance (fMRI) showed that the deafferentation of frown muscles with BoNT-A selectively attenuates the intensity of contraction of the corrugator muscles, and underactivates the amygdalar response when participants are overtly asked to imitate expressions of anger and sadness. Thus the disruption created by deafferentation seems to directly modulate brain structures that help maintain and update the adaptive flow of affective experiences."
“And those that have the Botox paralyzing their frown muscles on average rated the same comics as less funny than those who didn't have the paralysis, and that's one. And then the real kicker is that those that had the actual botulinum toxin injections also had less amygdala activation.”