Deception Detection using Facial Action Coding System in Videos

Thesis by Hammad-ud-din Ahmed Khan

Supervised by Dr. Usama Ijaz Bajwa

Sound decisions depend on facts, which is why it is important to catch deceptive information before it can cause harm. While deceiving, most people unconsciously make facial expressions that last no more than half a second. These are known as micro-expressions, and they are encoded under the Facial Action Coding System (FACS). Detecting these micro-expressions has proven useful in training a system to determine whether a subject is being deceptive. The analysis can range from the individual occurrence of micro-expressions during deception to their frequency of occurrence. Using deep neural networks, a system can be designed that learns to detect deception from micro-expressions with the aid of FACS.

Introduction:

Human beings deceive one another in one form or another on a regular basis. Several systems and techniques have been designed to detect whether a person is being deceptive. Detecting deception is crucial in various fields, such as airport security, investigations, counter-terrorism, court trials, and job interviews. While it cannot be known at an early stage of an investigation whether a claim is true, the statement delivered by a subject can be examined or tested for truthfulness. Several techniques have been developed and are widely used for various tasks related to deception detection.

The most famous technique is the polygraph, which records several physiological processes and analyzes variations in them. The polygraph is used for multiple purposes, including investigations. Each purpose involves different kinds of input and output and has different implications. The examiner detects deception from changes in the plots produced in response to changes in physiological phenomena.

One of the most recent techniques for detecting deception is to analyze the facial expressions of a person in a video. There are two kinds of facial expressions, namely macro-expressions and micro-expressions. Macro-expressions are easy to recognize, such as anger, disgust, fear, happiness, surprise, sadness and contempt. They last between half a second and 5 seconds. Micro-expressions are facial expressions that occur unconsciously and display a concealed emotion. They last no more than half a second. These expressions include amusement, embarrassment, anxiety, shame, pleasure, contentment, relief and guilt. While we can easily detect and understand macro-expressions, micro-expressions occur so fast that they easily go unnoticed by the untrained eye. They arise as physiological responses that most people undergo when struggling to conceal an emotion or to deceive another person.
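The duration ranges quoted above can be written down as a small rule. The following is only a minimal sketch: the helper name `classify_expression` is hypothetical, and assigning an expression of exactly half a second to the micro class is an assumption, since the two ranges meet at that value.

```python
# Duration limits taken from the text above.
MICRO_MAX_S = 0.5  # micro-expressions last no more than half a second
MACRO_MAX_S = 5.0  # macro-expressions last up to 5 seconds


def classify_expression(duration_s: float) -> str:
    """Label an expression by its duration alone (hypothetical helper).

    Assumption: a duration of exactly 0.5 s is treated as "micro".
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s <= MICRO_MAX_S:
        return "micro"
    if duration_s <= MACRO_MAX_S:
        return "macro"
    return "longer than macro"


for seconds in (0.2, 0.5, 2.0, 6.0):
    print(seconds, classify_expression(seconds))
```

Duration is only one cue. It says nothing about which emotion is shown or whether the expression was involuntary, so a rule like this could at most pre-filter candidate segments.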

Neutralized expressions cannot be observed, and detecting a masked expression is, or may be, equivalent to detecting a false expression. Hence, we will be using simulated expressions for the project.

FACS is a system created by Carl-Herman Hjortsjö and later adopted by Paul Ekman and Wallace Friesen. In FACS, every facial muscle movement is encoded. It covers both macro-expressions and micro-expressions. Action Units (AUs) are the fundamental actions of individual facial muscles or groups of facial muscles.
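To make the idea of using AUs as input more concrete, the sketch below turns per-frame AU activations into two simple summaries per AU: whether it occurred at all, and how often it was switched on. This is an illustration under stated assumptions, not the method of the thesis. The function `au_features`, the input layout (one dict of booleans per frame) and the choice of AU04 and AU12 as example units are all hypothetical, and the activations are assumed to come from some upstream AU detector.

```python
from typing import Dict, List


def au_features(frames: List[Dict[str, bool]],
                au_names: List[str]) -> Dict[str, Dict[str, object]]:
    """Summarize per-frame AU activations (hypothetical helper).

    For each AU this returns:
      occurred     - True if the AU was active in at least one frame
      events       - number of onsets (inactive -> active transitions)
      active_ratio - fraction of frames in which the AU was active
    """
    features: Dict[str, Dict[str, object]] = {}
    for au in au_names:
        active = [bool(frame.get(au, False)) for frame in frames]
        # An onset is an active frame that is first or follows an inactive one.
        events = sum(
            1 for i, is_on in enumerate(active)
            if is_on and (i == 0 or not active[i - 1])
        )
        ratio = sum(active) / len(active) if active else 0.0
        features[au] = {
            "occurred": any(active),
            "events": events,
            "active_ratio": ratio,
        }
    return features


# Toy input: six frames, AU12 switches on twice, AU04 never appears.
example_frames = [
    {"AU12": False},
    {"AU12": True},
    {"AU12": True},
    {"AU12": False},
    {"AU12": True},
    {"AU12": False},
]
print(au_features(example_frames, ["AU04", "AU12"]))
```

Summaries of this kind could be fed to a classifier as a fixed-length vector per video. A deep network might instead consume the per-frame sequence directly.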

Several issues arise when decoding which AUs are shown. The most important of them is the quality of the video footage available. If the footage is hazy or highly pixelated, it is difficult to decide whether a certain AU was displayed or not. Sharpening a pixelated video can improve results but might not be enough, depending on the case. Another factor in detecting AUs is the facial pose of the subject. The highest accuracy can be achieved if the subject is looking directly at the camera, i.e. in a frontal pose. If a person is looking away from the camera at a 90-degree angle sideways, more than half of the information cannot be decoded. Even the information that can still be decoded might not be accurate.

Datasets:

Several studies have created datasets on which they tested their deception detection systems. Only a few of them are available for others to use. While it is understandable that private data should not be made public, shared datasets are important because they allow the effectiveness of different techniques to be compared and help in building systems that can handle a broader range of input. The impact of a dataset can be judged by the stakes of the confessor. The following are the three types of stakes that can be encountered in datasets related to deception detection:

Low stakes: The confessor will not be affected in any way after the recording session is over and hence will not be concerned about the outcome of their answers.
Medium stakes: The confessor may gain some form of advantage from their answers, but not in ways that will alter their life significantly, e.g., a friendly game in which someone has to guess whether the confessor is lying.
High stakes: The life of the confessor or of someone they know will be significantly altered by their confessions, e.g., a court trial that can lead to serious consequences for their own or someone else’s life.

Datasets that were compiled under laboratory conditions usually fall under low or medium stakes. Datasets in which the confessor is in a strict or life-altering situation (e.g., a court trial) fall under the category of high stakes. For our experiment, we gained access to three datasets:

The real-life trial dataset is a collection of real-life trial case videos downloaded from YouTube. The videos were classified as either truthful or deceitful according to the final verdict given by the judge on the evidence provided and, in some cases, according to exoneration. The dataset also provides annotations for a total of 39 verbal and non-verbal cues in each video, coded with the MUMIN coding scheme for both facial and hand movements. This is the most commonly used dataset related to deception detection, mostly due to ease of access.
The Silesian deception dataset consists of 101 videos, all of which were recorded at 100 FPS in a well-controlled laboratory environment with proper illumination. The videos have no audio, and no transcripts of the answers are provided. Instead of annotating facial AUs, the dataset provides annotations of what the author calls micro-tensions. To avoid introducing subtle changes in facial expressions and blink dynamics, the subjects were not informed that the main aim of the research was facial analysis in the context of deception detection.
The Bag-of-lies (BOL) dataset consists of video, audio, and eye gaze data from 35 unique subjects, collected using a carefully designed experiment. For the experiment, each subject was shown 6-10 selected images and was asked to describe each one deceptively or truthfully, as they chose. The frame rate of each video is 30 FPS.
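The frame rates matter because a micro-expression lasts no more than half a second, so the number of frames that can capture one depends directly on the FPS. A minimal sketch of that arithmetic follows, using the frame rates stated above; the `DatasetInfo` class and the `max_micro_frames` helper are hypothetical names introduced only for illustration, and the real-life trial dataset is left out because no frame rate is given for it here.

```python
import math
from dataclasses import dataclass

MICRO_MAX_S = 0.5  # upper bound on micro-expression duration, in seconds


@dataclass(frozen=True)
class DatasetInfo:
    """Minimal dataset record (hypothetical, for illustration only)."""
    name: str
    fps: float


def max_micro_frames(fps: float, max_duration_s: float = MICRO_MAX_S) -> int:
    """Largest whole number of frames that fits in max_duration_s."""
    return math.floor(fps * max_duration_s)


DATASETS = [
    DatasetInfo("Silesian deception dataset", 100.0),
    DatasetInfo("Bag-of-lies", 30.0),
]

frame_budget = {d.name: max_micro_frames(d.fps) for d in DATASETS}
print(frame_budget)
# {'Silesian deception dataset': 50, 'Bag-of-lies': 15}
```

At 30 FPS a micro-expression spans at most 15 frames, against 50 frames at 100 FPS, so any temporal window sized in frames would need to be scaled per dataset.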

Useful Material:

Ramachandran, Vilayanur, "Microexpression and macroexpression," in Encyclopedia of Human Behavior, Academic Press, 2012, pp. 173-183.

P. Ekman and W. V. Friesen, "Facial action coding system (FACS)," 2002.

Scheve, Tom, "What are microexpressions?," 15 December 2008. [Online]. Available: http://tlc.howstuffworks.com/family/microexpression.htm.

T. Pfister, Xiaobai Li, G. Zhao and M. Pietikäinen, "Recognising spontaneous facial micro-expressions," in International Conference on Computer Vision, Barcelona, 2011.

E. T. Rolls, D. I. Perrett, H. D. Ellis and Paul Ekman, "Facial Expressions of Emotion: An Old Controversy and New Findings [and Discussion]," Philosophical Transactions: Biological Sciences, vol. 335, pp. 63-69, 1992.

CH Hjortsjö, Man's face and mimic language, 1969.

Zhe Wu, Bharat Singh, Larry S. Davis, V. S. Subrahmanian, "Deception Detection in Videos," in The Thirty-Second AAAI Conference (AAAI-18), New Orleans, 2018.