Multimodal Fusion Techniques for Determining Speaker Traits

In this project, the objective was to predict whether a movie reviewer's degree of passion and credibility is high or low from multimodal cues (text, audio, and video). We explored a novel ensemble-based technique that combines two recognition schemes for this prediction task. As shown in the figure, one scheme explicitly models the dependencies and correlations between the modalities (the bottom pipeline), while the other models a joint feature space under appropriate assumptions (the top pipeline).
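
The two-pipeline idea can be illustrated with a minimal sketch. It is not the project's actual models: the function names, the pairwise-product interaction terms, the fixed weights, and the equal-weight averaging rule are all assumptions chosen for illustration. The first function concatenates the modality features into one joint vector; the second uses cross-modal products as a simple stand-in for modeling dependencies between modalities; the third averages the two scores into a high/low decision.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def joint_space_score(text, audio, video, weights, bias=0.0):
    """Hypothetical stand-in for the joint-feature-space pipeline.

    Concatenates the three modality vectors into one joint vector and
    applies a single linear model (weights are assumed to be given).
    """
    joint = list(text) + list(audio) + list(video)
    if len(joint) != len(weights):
        raise ValueError("weights must match the joint feature length")
    return sigmoid(sum(w * x for w, x in zip(weights, joint)) + bias)


def dependency_aware_score(text, audio, video, pair_weights, bias=0.0):
    """Hypothetical stand-in for the dependency-modeling pipeline.

    Summarizes each non-empty modality by its mean and uses the three
    pairwise products as simple cross-modal interaction terms.
    """
    t, a, v = (sum(m) / len(m) for m in (text, audio, video))
    pairs = [t * a, t * v, a * v]
    if len(pairs) != len(pair_weights):
        raise ValueError("pair_weights must have exactly three entries")
    return sigmoid(sum(w * p for w, p in zip(pair_weights, pairs)) + bias)


def ensemble_label(p_joint, p_dependency, threshold=0.5):
    """Average the two pipeline probabilities (an assumed ensemble rule)."""
    p = 0.5 * (p_joint + p_dependency)
    return ("high" if p >= threshold else "low"), p


if __name__ == "__main__":
    # Toy feature vectors and weights, invented for this example only.
    text, audio, video = [0.8, 0.1], [0.5], [0.3, 0.9, 0.2]
    p1 = joint_space_score(text, audio, video, [1.0, -0.5, 0.7, 0.2, 0.4, -0.1])
    p2 = dependency_aware_score(text, audio, video, [1.5, 0.5, -0.3])
    print(ensemble_label(p1, p2))
```

In practice each pipeline would be a trained model; the averaging step is only one of several ways an ensemble could combine them.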

I led this project. The results have been published at a conference of international repute.

Publications:

Non-Peer Reviewed:


Project Funding Support:

We are extremely grateful to the US NSF for financially supporting this project.