Pre-Defense
Contents
Introduction
Problem Statement
Motivation and Objective
Related Works
Comparison Between Existing Works
Gap Analysis
Proposed Methodology
Dataset
Result Analysis
Limitations
Future Scope
Conclusions
Introduction
This study focuses on the crucial role of facial expressions in conveying an individual's emotions. The aim is to enable machines in human-computer interaction (HCI) to perceive human emotions accurately. The core task is emotion detection from images using deep learning models. We present a robust method for detecting six fundamental emotional states through facial expressions: anger, neutral, happiness, sadness, disgust, and surprise.
Problem Statement
Machines must accurately interpret human emotions from facial expressions to advance Human-Computer Interaction (HCI). Without reliable emotion recognition, the creation of seamless, emotionally intelligent interfaces is constrained, and meaningful human-computer interactions remain limited. The central challenge is the need for advanced models, such as deep learning models, that can recognize and respond to facial expressions. Tackling this challenge is essential to close the gap and provide more meaningful, responsive, and intuitive HCI solutions.
Motivations
This study is motivated by a notable gap in the field of emotion detection technology, especially within HCI.
It aims to contribute to technologies adept at understanding and responding to human emotions.
The scarcity of practically applicable systems in this area is a further motivation for this research.
Objective
To improve human-computer interactions by accurately identifying and responding to emotions, making interactions more authentic and empathetic.
To contribute to mental health monitoring, where accurate emotion classification is crucial for assessing well-being.
To gain deeper customer insights by understanding emotional responses, enabling personalized user experiences for products and services.
To advance fields such as education, healthcare, and research through the application of accurate emotion identification.
Related Works
Dachapally et al. (2017) utilized autoencoders and an 8-layer CNN for emotion recognition, achieving superior results on the JAFFE dataset with 86.38% accuracy and 67.62% on LFW. The study suggests potential enhancements for autoencoders in emotion representation.
Syed et al. (2018) extended the Cohn-Kanade dataset, employing Fisher Face with 56% accuracy and a dual classifier (Fisher Face plus HOG) matching HOG's 81% accuracy in 20% less time. Their combination of Viola-Jones, Harris corners, PCA, LDA, HOG, and SVM effectively recognized seven facial emotions, including contempt.
Ko et al. (2018) explored traditional and deep-learning-based facial emotion recognition (FER) techniques, highlighting the hybrid CNN + LSTM method's impressive 78.61% accuracy. The study identifies challenges such as the need for large-scale datasets and addressing the complexity of micro-expressions, providing valuable insights for researchers in the field.
Avanija et al. (2022) employed varied datasets (CK+, JAFFE, FER2013), showcasing Viola-Jones + CNN (88.56% accuracy) and emphasizing the importance of optimal model combinations for improved FER in human-machine interaction.
Kanagaraju et al. (2022) used a VGG16-based CNN on FER-2013 for real-time emotion recognition (64.52% accuracy), addressing challenges in micro-expression identification, accuracy enhancement, and ethical considerations in intelligent human-computer interaction.
Comparison Between Existing Works
Gap Analysis
The challenge of collecting diverse, well-annotated facial expression data impacts FER model effectiveness.
The reviewed papers relied on online archived datasets, leading to polarized results: either very high or poor accuracy.
Discrepancies between experimental results and desired outcomes highlight challenges in model training and data quality.
A better understanding of the complex interplay between model training, data quality, and accurate emotion recognition is needed.
The lack of implementation of findings in real-world applicable fields such as HCI is a notable gap in the reviewed papers.
Proposed Methodology
Overall Work Procedure
Dataset
Distribution of Dataset
Sample of Dataset
Data Pre-processing
- Image resize (224×224)
- Gaussian smoothing
- Data augmentation (these steps are sketched in code after the figures below)
Before and After Image Resize
Before and After Image Smoothing
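A minimal sketch of the pre-processing steps listed above, assuming OpenCV and NumPy. The 5×5 smoothing kernel and the augmentation settings (random flip, small rotation) are illustrative assumptions, not the exact parameters used in this work.

```python
import cv2
import numpy as np

def preprocess(image_path):
    """Resize to 224x224 and apply Gaussian smoothing."""
    img = cv2.imread(image_path)                   # BGR uint8 array
    img = cv2.resize(img, (224, 224))              # match the ViT input size
    img = cv2.GaussianBlur(img, (5, 5), 0)         # assumed 5x5 kernel
    return img

def augment(img, rng=None):
    """Illustrative augmentation: random horizontal flip and small rotation."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        img = cv2.flip(img, 1)                     # horizontal flip
    angle = rng.uniform(-10, 10)                   # assumed rotation range (degrees)
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, M, (w, h))
```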
Model Description
Vision Transformer Model Architecture
Proposed Model Architecture Based On ViT
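As a hedged illustration of the ViT-based design, the sketch below fine-tunes a pre-trained Vision Transformer with a six-class classification head. The `timm` backbone name, optimizer, and learning rate are assumptions for illustration, not the exact configuration of the proposed model.

```python
import timm
import torch
import torch.nn as nn

NUM_CLASSES = 6  # anger, neutral, happiness, sadness, disgust, surprise

# Assumed backbone: ViT-Base/16 pre-trained on ImageNet; the variant
# used in this work may differ.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed settings

def train_step(images, labels):
    """One fine-tuning step on a batch of 224x224 face images."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)             # (batch_size, 6) class scores
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```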
Result Analysis
Prediction Table
Comparing the Precision, Recall, and F1-Score of the Vision Transformer and Four Other Transfer Learning Models
Accuracy Score
Confusion Matrix (ViT)
Comparison of Training and Validation Accuracy (ViT)
Comparison of Training and Validation Loss (ViT)
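A minimal sketch of how the reported evaluation metrics (accuracy, per-class precision, recall, F1-score, and the confusion matrix) could be computed with scikit-learn; `y_true` and `y_pred` are illustrative placeholders for the test labels and model predictions, not the study's actual results.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

LABELS = ["anger", "neutral", "happiness", "sadness", "disgust", "surprise"]

# Placeholder integer-encoded labels; in practice these come from the
# test set and the model's argmax predictions.
y_true = [0, 1, 2, 2, 3, 4, 5, 1]
y_pred = [0, 1, 2, 3, 3, 4, 5, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=LABELS))
print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted
```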
Limitations
Data Quality and Bias: Limited diversity in training data can lead to biases.
Generalization Issues: Models may not perform well in varied real-world conditions.
Ethical Concerns: Privacy and consent issues in facial data collection.
High Computational Demand: Requires substantial resources, limiting accessibility.
Real-Time Challenges: Difficult to achieve low latency on edge devices.
Interpretability: Model decisions aren't always transparent, which makes them harder to trust in sensitive areas.
Future Scope
Optimizing ViT: Enhancing emotion detection accuracy even further through ViT refinement.
Diverse Dataset Expansion: Including broader demographic and cultural data for improved model applicability.
Multimodal Integration: Combining facial expressions with voice tone and body language for richer emotion detection.
Ethical Considerations: Addressing privacy, security, biases, and human autonomy impacts.
Real-world Implementation: Deploying the model in practical, real-world scenarios.
HCI Advancements: Developing adaptive and ethical human-computer interaction systems.
Conclusions
This research explores emotion detection through image processing and deep learning, emphasizing the selection of the Vision Transformer (ViT) for its high accuracy of 82.96%. This marks a significant advancement in Human-Computer Interaction (HCI) and underscores the transformative potential of ViT in understanding human emotions from facial expressions. The study's findings indicate a promising future for integrating these technologies into practical applications, reshaping emotion detection and enhancing our understanding of human behavior through AI-driven insights.