This work investigates image data fusion techniques and video analytics that combine image and track data from multiple sensors to achieve higher accuracy and more specific inferences than could be achieved with any single sensor alone. Our aim is to explore state-of-the-art image processing and video analytics algorithms for effective enhancement, detection, tracking, and video summarization, as in:
Multimodal Emotion Recognition
1. Introduction
Multimodal Emotion Recognition refers to the classification of input video sequences into emotion labels based on multiple input modalities (usually video, audio, and text). In recent years, deep neural networks have shown remarkable performance in recognizing human emotions, reaching parity with human performance on this task. Despite these recent advances, emotion recognition systems have yet to be adopted in real-world settings, largely because their reasoning and decision-making processes remain opaque.
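To make the multimodal setup concrete, the sketch below shows one common way to combine modalities: late fusion, where separate classifiers score the video, audio, and text streams and their class probabilities are averaged. The emotion labels, logit values, and fusion weights here are illustrative assumptions, not taken from any specific system described in this document.

```python
import numpy as np

# Hypothetical emotion label set for illustration.
EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw classifier logits to a probability distribution."""
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

def late_fusion(modality_logits, weights=None) -> np.ndarray:
    """Weighted average of per-modality class probabilities (late fusion)."""
    probs = np.stack([softmax(l) for l in modality_logits])
    if weights is None:
        # Equal weight per modality unless specified otherwise.
        weights = np.full(len(modality_logits), 1.0 / len(modality_logits))
    return (probs * np.asarray(weights)[:, None]).sum(axis=0)

# Made-up logits standing in for the outputs of three unimodal classifiers.
video_logits = np.array([2.0, 0.5, 0.1, 0.2])
audio_logits = np.array([1.5, 1.0, 0.3, 0.4])
text_logits = np.array([0.2, 2.5, 0.1, 0.3])

fused = late_fusion([video_logits, audio_logits, text_logits])
predicted = EMOTIONS[int(np.argmax(fused))]
print(predicted, fused.round(3))
```

Late fusion is only one design point; early fusion (concatenating modality features before a joint classifier) and attention-based cross-modal fusion are common alternatives.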