Trimodal Fusion Approach

The objective of this project was to explore the then relatively underexplored territory of modeling truly multimodal data, i.e., data with more than two modalities to fuse. Our use case was predicting individuals' psychological distress levels from their verbal (text) and non-verbal (acoustic and visual) behavior. We designed a novel hierarchical classification technique: as shown in the figure, the first layer consists of classifiers trained on features of the individual modalities, while a second layer combines the posterior probabilities from the first layer using an Expectation-Maximization (EM) approach to generate the final predictions.
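A minimal sketch of how such a second-layer combination might look, framed as a mixture of the per-modality posteriors whose weights are fit by EM. All function names and the specific mixture-of-experts formulation here are illustrative assumptions, not the published method:

```python
import random

def em_fuse(posteriors, labels, n_iter=50):
    """Learn per-modality mixture weights by EM.

    posteriors: list of M lists (one per modality), each holding N
        per-class posterior distributions (lists of length C) produced
        by the first-layer classifiers.
    labels: list of N true class indices for the training samples.
    Returns a list of M weights summing to 1.
    """
    M, N = len(posteriors), len(labels)
    w = [1.0 / M] * M
    # Likelihood of each sample's true label under each modality.
    lik = [[posteriors[m][n][labels[n]] for n in range(N)] for m in range(M)]
    for _ in range(n_iter):
        # E-step: responsibility of each modality for each sample.
        resp = []
        for n in range(N):
            num = [w[m] * lik[m][n] for m in range(M)]
            z = sum(num)
            resp.append([x / z for x in num])
        # M-step: new weight = average responsibility of the modality.
        w = [sum(resp[n][m] for n in range(N)) / N for m in range(M)]
    return w

def fuse_predict(posterior_per_modality, w):
    """Weighted combination of one sample's per-modality posteriors."""
    C = len(posterior_per_modality[0])
    fused = [sum(w[m] * posterior_per_modality[m][c]
                 for m in range(len(w))) for c in range(C)]
    return max(range(C), key=fused.__getitem__)
```

Under this formulation, a modality whose classifier assigns high probability to the true labels accumulates more responsibility in the E-step and therefore a larger weight, so unreliable modalities are automatically down-weighted in the fused prediction.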

The research was conducted by a team of two, and the results were published at an international conference.

Publications:

Non-Peer Reviewed:

[* - indicates equal contribution]

Project Funding Support:

We are extremely grateful to DARPA (the US Defense Advanced Research Projects Agency) for funding this research project.