Pushyami Reddy Ginnavaram

Data 606 Capstone Project - Spring 2021

Facial Expression Recognition using CNN

Overview:

Facial expressions are an essential factor in understanding a person's emotions, such as anger, love, and hatred, which many fields now need to study. Sometimes facial expressions are the key to communicating certain things, such as agreement and disagreement. In this project, I am trying to implement a model that understands facial expressions. Emotions are a powerful tool in communication, and one way humans show their emotions is through their facial expressions. Because facial expressions are key in non-verbal communication, facial expression recognition is one of the challenging and powerful tasks in social communication. [1]

The automated analysis of facial expressions is widely used in research areas such as biometrics and emotion analysis. Facial expressions are especially important in sign language, where they help form the grammatical structure of the language and disambiguate meaning; for this reason they are called Grammatical Facial Expressions. [3]

Objective:

The main objective of the project is to recognize the facial expressions of a human face. To do this, a deep learning model, a Convolutional Neural Network (CNN), is used to classify the expression accurately. The steps include identifying the face and extracting features, which translates the input data into a set of features so that large volumes of data can be segregated into relevant groups.

About the Data:

I am planning to use the data from https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset, which consists of images of faces showing different facial expressions, along with data-point files.

The data is divided into training and validation sets, each with 7 groups and more than 3000 images in each group; every image is less than 2 KB, and the total size of the data is 121 MB.

Related Work:

Agbolade, O., Nazri, A., Yaakob, R., Ghani, A. A., & Cheah, Y. K. (2019). 3-Dimensional facial expression recognition in human using multi-points warping. BMC Bioinformatics. Retrieved February 18, 2021, from https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3153-2

This research proposes identifying human facial expressions by applying multi-points warping to 3D facial landmarks. A template mesh is constructed, and the semi-landmarks are allowed to slide along tangents to the curves and surfaces until the bending energy between the template and the target form is minimal; localization error is assessed using Procrustes ANOVA. Principal Component Analysis (PCA) is used for feature selection, and classification is done with Linear Discriminant Analysis (LDA). [4]

GitHub:

https://github.com/pushyag1/Capstone_606

Presentation:

Slides

Video

UMBC presentation template

PHASE - 2

Data Loading and Preprocessing:

The images are segregated by emotion, and a convolutional neural network is trained to classify each image by the emotion it shows.
The dataset contains different facial expressions of different people.
The data was first downloaded and the images extracted; the images were then organized into folders and the model was trained on them.
The test predictions were then made and evaluated.
Destination folders were created following the directory convention: an outer folder (data) with the subfolders train, validation, and test, each of which contains one folder per emotion (happy, neutral, sad, anger, etc.).
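The folder convention above can be sketched with the standard library. This is a minimal illustration, not the project's script: the lowercase folder names come from the seven groups listed for the dataset, and `make_layout` and `place_image` are hypothetical helpers. With this layout, Keras' `flow_from_directory` infers each image's class from the name of the subfolder it sits in.

```python
import shutil
import tempfile
from pathlib import Path

# Folder names assumed from the seven groups in the dataset (lowercased here).
EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
SPLITS = ["train", "validation", "test"]


def make_layout(root: Path) -> None:
    """Hypothetical helper: create root/<split>/<emotion>/ for every split and emotion."""
    for split in SPLITS:
        for emotion in EMOTIONS:
            (root / split / emotion).mkdir(parents=True, exist_ok=True)


def place_image(src: Path, root: Path, split: str, emotion: str) -> Path:
    """Hypothetical helper: copy one labeled image into its folder and return the new path."""
    if split not in SPLITS or emotion not in EMOTIONS:
        raise ValueError(f"unknown split or emotion: {split}/{emotion}")
    dest = root / split / emotion / src.name
    shutil.copy2(src, dest)
    return dest


if __name__ == "__main__":
    # Demo in a throwaway directory with a fake image file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        make_layout(root)
        fake = Path(tmp) / "img001.jpg"
        fake.write_bytes(b"not really a jpeg")
        print(place_image(fake, root, "train", "happy"))
```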

Model Training :

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics. [8]
The model is created with the Keras framework and trained on the segregated dataset; the evaluation metrics are then calculated to measure its accuracy.
The model was trained for 20 epochs, with the learning rate decreasing at each epoch so that the updates settle closer and closer to the optimum.
ReLU speeds up training compared with saturating activations such as sigmoid and tanh. There is also evidence that keeping the mean activation close to 0 makes training faster.
The model that identifies facial expressions uses 4 convolutional layers, each with a ReLU activation function.
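The decreasing learning rate described above can be expressed as a schedule function. This is a sketch under assumptions: the text does not say which decay rule was used, so exponential decay is chosen here, and the initial rate and decay factor in the demo are hypothetical. The `(epoch, lr)` signature is the one `keras.callbacks.LearningRateScheduler` passes to its schedule function, so a function like this could be handed to that callback.

```python
from typing import Callable


def exponential_schedule(initial_lr: float, decay_rate: float) -> Callable[[int, float], float]:
    """Return schedule(epoch, lr) = initial_lr * decay_rate ** epoch.

    The current lr argument is ignored; the rate is computed in closed form
    from the (0-based) epoch index.
    """
    if not 0.0 < decay_rate < 1.0:
        raise ValueError("decay_rate must be in (0, 1) for the rate to decrease")

    def schedule(epoch: int, lr: float) -> float:
        return initial_lr * decay_rate ** epoch

    return schedule


if __name__ == "__main__":
    # Hypothetical values: the text gives neither the initial rate nor the decay factor.
    schedule = exponential_schedule(initial_lr=0.001, decay_rate=0.9)
    for epoch in range(20):
        print(epoch, round(schedule(epoch, 0.0), 6))
```

A closed-form schedule keeps the rate reproducible if training is resumed from a later epoch, unlike a rule that multiplies whatever rate it receives.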

Predictions on Test data:

The model is tested by feeding it the test data and evaluating its performance on the outputs it predicts.
Initially, the model produced the correct output for only a few of the images; the weights and a few activation functions were then adjusted.
After a few trials, the model reached 86% accuracy on the test data.
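Turning model outputs into emotion labels and a test accuracy can be sketched as follows. It assumes the model ends in a seven-way softmax and that its output columns follow the alphanumeric class order `flow_from_directory` assigns to subfolders; the probability rows in the test are made up. An accuracy of 86% means 86 of every 100 test images received the correct label.

```python
from typing import List, Sequence

# Assumption: output column i corresponds to the i-th folder name in sorted order.
EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def decode(probabilities: Sequence[Sequence[float]]) -> List[str]:
    """Map each row of class probabilities to the label with the highest score."""
    return [EMOTIONS[max(range(len(row)), key=row.__getitem__)] for row in probabilities]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of test images whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```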

Presentation:

Slides-Phase2

Video-Phase2

Capstone_606_Phase2

PHASE - 3

Face Detection using Cascade classifier:

A Haar cascade classifier is used to detect the face; its XML file stores the trained stages of Haar-like features, and the classifier returns a bounding box for each face it finds in an image.
Haar cascades are an effective object-detection method, proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features".
The pre-trained cascade detects faces with good accuracy and places a bounding box around each one for the detection of emotions.
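The Viola-Jones method is fast because Haar-like features are differences of rectangle sums, and an integral image gives any rectangle sum in four lookups. The sketch below illustrates only that idea on a tiny grayscale array; it is not OpenCV's implementation. In practice the cascade XML is typically loaded with OpenCV's `cv2.CascadeClassifier` and applied with `detectMultiScale`, although the text does not name the library, so that is an assumption.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def integral_image(img: Sequence[Sequence[int]]) -> Matrix:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Matrix, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Matrix, x: int, y: int, w: int, h: int) -> int:
    """One common two-rectangle Haar-like feature: left half minus right half."""
    if w % 2:
        raise ValueError("width must be even")
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A boosted cascade evaluates thousands of such features per window, which is only practical because each one costs a handful of additions once the integral image is built.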

Build a CNN Architecture:

The initial step involved in setting up the model is to initialize the training and validation generators.
The generator used for this data is Keras' ImageDataGenerator with rescale=1./255, which transforms every pixel value from the range [0, 255] to [0, 1]. Scaling every image to the same range [0, 1] makes the images contribute more evenly to the total loss.
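ImageDataGenerator's `rescale` argument multiplies every pixel by the given factor. The sketch below reproduces that arithmetic on a small 8-bit array in plain Python; the `rescale` helper is hypothetical and only illustrates what the generator does to each pixel.

```python
from typing import List, Sequence


def rescale(image: Sequence[Sequence[int]], factor: float = 1.0 / 255) -> List[List[float]]:
    """Multiply every 8-bit pixel by factor, mapping [0, 255] onto [0, 1]."""
    for row in image:
        for p in row:
            if not 0 <= p <= 255:
                raise ValueError(f"pixel {p} outside the 8-bit range")
    return [[p * factor for p in row] for row in image]
```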

Addition of more Activation Functions:

The model (named emotion model) is built as a Convolutional Neural Network whose architecture includes input and output layers with several hidden computational layers.
Each 2D convolutional layer uses a (3, 3) kernel and is followed by a ReLU (Rectified Linear Unit) activation layer.
The activation function decides whether, and how strongly, each hidden node fires, which is what lets the network produce an accurate output; activation functions are the key part of a neural network.
The choice of activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make. [7]
Which activation functions are added therefore depends on the type of data being predicted.
In this model, used for facial expression recognition, I have used ReLU as the activation function.
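The difference between hidden-layer and output-layer activations can be seen in a few lines of Python. This is an illustrative sketch, not the project's code: ReLU is the hidden-layer choice named above, sigmoid yields one independent probability (suited to a yes/no output), and softmax yields a distribution over mutually exclusive classes such as the seven emotions.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    """Hidden-layer activation: max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """One independent probability in [0, 1]; branches avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits: Sequence[float]) -> List[float]:
    """Probability distribution over mutually exclusive classes (sums to 1)."""
    m = max(logits)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```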

Model Summary:

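The model summary of the built CNN reports 4,967,623 parameters in total. A convolutional layer has one weight per kernel position and input channel for each filter, plus one bias per filter, so its parameter count is (kernel height × kernel width × input channels + 1) × filters. The figure of 896 parameters corresponds to a 3×3 convolution with 32 filters on a 3-channel input: each filter has 27 input weights plus 1 bias, and (27 + 1) × 32 = 896.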
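The per-layer counts in a Keras model summary follow simple formulas, sketched below. The full layer list of the emotion model is not given here, so this does not reproduce the 4,967,623 total; the layer sizes in the test are illustrative. Pooling, Flatten, and Dropout layers have no trainable parameters.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    """One weight per input plus one bias, for every output unit."""
    return (inputs + 1) * units
```

For example, the same 32-filter 3×3 layer on a single-channel (grayscale) input would have only 320 parameters, so whether images are loaded as RGB or grayscale changes the summary.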

The hidden layers and their bias terms all play an important role in producing significant results.
The activation function of this model is ReLU, which applies a threshold at zero: f(x) = max(0, x). If x > 0, the value passes through unchanged; if x < 0, it is set to 0, which cuts off unnecessary detail in the channel. [9]
The MaxPooling2D layer is a pooling operation for spatial data. A pool size of (2, 2) halves the input in both spatial dimensions. [9]
After three groups of layers, there are two fully connected layers. [9]
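The halving effect of a (2, 2) max-pooling layer is easy to check by hand. The sketch below is a plain-Python illustration of 2×2 max pooling with stride 2 (the Keras default stride equals the pool size); as with Keras' default "valid" padding, an odd last row or column is dropped.

```python
from typing import List, Sequence


def max_pool_2x2(fmap: Sequence[Sequence[float]]) -> List[List[float]]:
    """Keep the maximum of each non-overlapping 2x2 window (stride 2)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]
```

Applied to a 48×48 feature map, one pooling layer produces 24×24, so each group of layers quarters the number of spatial positions the next layer must process.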

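Flatten serves as the input to the fully connected part, converting the feature maps into a single vector. Next is Dense, a densely connected layer with an output space of 64 units and a ReLU activation function. It is followed by Dropout, which helps prevent overfitting: the phenomenon where the constructed model recognizes the examples from the training sample but works relatively poorly on the examples of the test sample. The dropout rate is a value between 0 and 1 giving the fraction of units dropped during training. In the reference architecture, the last fully connected layer has 1 output and a sigmoid activation function, which suits a two-class task; with seven emotion classes, the output layer instead needs seven units with a softmax activation. [9]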
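Dropout's behavior can be sketched in plain Python. This follows the "inverted dropout" scheme that Keras' Dropout layer documents: during training, each unit is zeroed with probability `rate` and survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged; at inference the layer passes inputs through untouched. The `dropout` helper and its seeded random generator are illustrative assumptions, and rejecting a rate of exactly 1 is a choice made here to avoid dividing by zero.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], rate: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: zero each unit with probability rate, scale survivors by 1/(1-rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```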

Model Performance:

The Cascade Classifier used to detect faces in the image is 'haarcascade_frontalface_default.xml'.
This classifier detects faces accurately, so that the emotion can then be recognized for each detected face.
The next step is compiling the model. Following [9], it uses a binary cross-entropy loss, which combines the individual per-sample losses into a single value; for seven mutually exclusive emotion classes, categorical cross-entropy is the matching loss. The optimizer is RMSprop, which is often recommended for recurrent neural networks, and the accuracy metric shows the performance of the model. [9]
The model is trained on the training dataset with a learning rate of 0.01 for 50 epochs.
The accuracy, precision, recall, and F1 scores are calculated and visualized in the Python GUI.
The model achieves an accuracy of 93.5%.
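Precision, recall, and F1 for a multi-class model are computed per class and then averaged. The sketch below shows the arithmetic with macro averaging (an unweighted mean over classes), which is an assumption since the text does not say how the per-class scores were combined; scikit-learn's `precision_recall_fscore_support` provides the same computation in practice. The labels in the test are made up.

```python
from typing import Dict, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, f1)


def per_class_scores(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> Dict[str, Scores]:
    """One-vs-rest precision, recall, and F1 for each label."""
    scores: Dict[str, Scores] = {}
    pairs = list(zip(actual, predicted))
    for label in labels:
        tp = sum(1 for a, p in pairs if a == label and p == label)
        fp = sum(1 for a, p in pairs if a != label and p == label)
        fn = sum(1 for a, p in pairs if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Scores]) -> Scores:
    """Unweighted mean of each metric across classes."""
    n = len(scores)
    p, r, f = (sum(s[i] for s in scores.values()) / n for i in range(3))
    return p, r, f
```

Macro averaging gives small classes the same weight as large ones, so a model that ignores a rare expression is penalized even when its overall accuracy stays high.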

Emotion Detection:

The training and test images are divided into seven groups: anger, surprise, sad, happy, disgust, fear, and neutral.
Each group contains the images for one expression.
The CNN model, named emotion model, performed with a good accuracy of 93%, and the results are integrated and displayed in a Python GUI.
The GUI was built with the Python library PyQt5.

Graphical User Interface using Python

PyQt5:

PyQt5 is a Python library for building graphical user interfaces. It is a set of Python bindings for Qt v5, a set of C++ libraries that implement high-level APIs for mobile and desktop applications. [10]
PyQt5 is based on Qt v5 and includes classes that cover Graphical User Interfaces as well as XML handling, network communication, regular expressions, threads, SQL databases, multimedia, web browsing, and other technologies available in Qt. PyQt5 implements over one thousand of these Qt classes in a set of Python modules, all of which are contained within a top-level Python package called PyQt5.[10]
PyQt5 is compatible with Windows, Unix, Linux, macOS, iOS, and Android. This can be an attractive feature if you’re looking for a library or framework to develop multi-platform applications with a native look and feel on each platform.[10]

GUI:

The GUI contains the text "Face Expression Recognition" and a push button labeled "Click here".
Clicking the push button opens a page with two more push buttons, one for Image and one for Webcam.

Image Emotion detection GUI :

When the Image push button is clicked, the image emotion detection window is displayed.
A blank box displays the image, next to a text bar labeled Image Path.
Clicking the Insert Path push button fills the Image Path text box with the path of the chosen image.
Two other push buttons are displayed: one clears the image path, and the other detects the emotion.
The Detect push button has the model integrated into it and detects the emotion of the picture at the given image path.

Results: (Image)

img.JPG

FER.JPG

Results: (Webcam)

camera2.JPG

camera.JPG

Presentation:

Slides

Video Presentation

UMBC presentation template

References:

1. Facial expression recognition with convolutional neural networks. (n.d.). Retrieved February 16, 2021, from https://ieeexplore.ieee.org/abstract/document/9031283
2. Dhami, D. (2018, December 21). Face Recognition/Special applications of CNN. Retrieved February 16, 2021, from https://medium.com/@dhartidhami/face-recognition-special-applications-of-cnn-51b928a3cd40
3. UCI Machine Learning Repository: Grammatical Facial Expressions data set. (n.d.). Retrieved February 18, 2021, from https://archive.ics.uci.edu/ml/datasets/Grammatical+Facial+Expressions
4. Agbolade, O., Nazri, A., Yaakob, R., Ghani, A. A., & Cheah, Y. K. (2019). 3-Dimensional facial expression recognition in human using multi-points warping. BMC Bioinformatics. Retrieved February 18, 2021, from https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3153-2
5. Ching, C. (2019, March 29). How to build an image classifier for waste sorting. Retrieved April 04, 2021, from https://towardsdatascience.com/how-to-build-an-image-classifier-for-waste-sorting-6d11d3c9c478
6. Saha, S. (2018, December 17). A comprehensive guide to convolutional neural networks - the eli5 way. Retrieved April 04, 2021, from https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
7. Singhal, A. (2021, January 12). Facial expression detection using Machine Learning in Python. Medium. https://medium.com/analytics-vidhya/facial-expression-detection-using-machine-learning-in-python-c6a188ac765f.

8. Hussain, S. (2020, April 6). Building a Convolutional Neural Network: Male 👨 vs Female 👩. Medium. https://towardsdatascience.com/building-a-convolutional-neural-network-male-vs-female-50347e2fa88b.

9. Sorokina, K. (2019, February 26). Image Classification with Convolutional Neural Networks. Medium. https://medium.com/@ksusorokina/image-classification-with-convolutional-neural-networks-496815db12a8.

10. Real Python. (2021, April 30). Python and PyQt: Building a GUI Desktop Calculator. Real Python. https://realpython.com/python-pyqt-gui-calculator/.

11. Image courtesy: Hassouneh, A., Mutawa, A. M., & Murugappan, M. (2020, June 12). Development of a Real-Time Emotion Recognition System Using Facial Expressions and EEG based on machine learning and deep neural network methods. Informatics in Medicine Unlocked. https://www.sciencedirect.com/science/article/pii/S235291482030201X.

12. How do I access my webcam in Python? (n.d.). Stack Overflow. https://stackoverflow.com/questions/604749/how-do-i-access-my-webcam-in-python.

13. YouTube. (2019, July 3). PyQt5 Tutorial - Setup and a Basic GUI Application. YouTube. https://www.youtube.com/watch?v=Vde5SH8e1OQ.

14. PyQt5 Tutorial 2021, Create Python GUIs with Qt. Martin Fitzpatrick. (n.d.). https://www.mfitzp.com/courses/pyqt/.

Code References:

amineHorseman. (n.d.). amineHorseman/facial-expression-recognition-using-cnn. GitHub. https://github.com/amineHorseman/facial-expression-recognition-using-cnn.

Oheix, J. (2019, January 3). Face expression recognition dataset. Kaggle. https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset/code.
