Multimodal Fusion Techniques for Determining Speaker Traits

In this project, the objective was to predict whether a movie reviewer shows a high or low degree of passion and credibility, using multimodal cues (text, audio, and video). We explored a novel ensemble of two recognition schemes for the prediction task. As shown in the figure, one scheme explicitly models the dependencies and correlations between the modalities (the bottom pipeline), while the other models a joint feature space under appropriate assumptions (the top pipeline).
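To make the idea of combining two views of the same multimodal data concrete, here is a minimal Python sketch. It is not the method from the paper: the data are synthetic, the joint-feature-space pipeline is approximated by plain feature concatenation, and the dependency-modelling pipeline is replaced by a simple stand-in (a second-stage classifier trained on per-modality scores). All names, feature sizes, and hyperparameters are assumptions made for illustration.

```python
import math
import random

MODALITIES = ("text", "audio", "video")  # hypothetical modality keys

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    # w holds one weight per feature, with the bias stored last.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(w, xi) - yi  # gradient of the log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def make_sample(rng, label):
    # Synthetic stand-in data: each modality's features shift with the label.
    shift = 1.0 if label == 1 else -1.0
    sizes = {"text": 3, "audio": 2, "video": 2}  # assumed feature counts
    return {m: [rng.gauss(shift, 1.0) for _ in range(n)] for m, n in sizes.items()}

def joint_features(sample):
    # View 1: one joint feature space, here simple concatenation.
    return [v for m in MODALITIES for v in sample[m]]

def fit_joint(samples, labels):
    return train_logreg([joint_features(s) for s in samples], labels)

def unimodal_scores(unimodal, sample):
    return [predict_proba(unimodal[m], sample[m]) for m in MODALITIES]

def fit_stacked(samples, labels):
    # View 2 (stand-in): per-modality classifiers plus a combiner that
    # learns how the modality scores relate to each other and the label.
    unimodal = {m: train_logreg([s[m] for s in samples], labels) for m in MODALITIES}
    combiner = train_logreg([unimodal_scores(unimodal, s) for s in samples], labels)
    return unimodal, combiner

def ensemble_proba(joint_w, stacked, sample):
    # Ensemble: average the probabilities from the two views.
    unimodal, combiner = stacked
    p_joint = predict_proba(joint_w, joint_features(sample))
    p_stack = predict_proba(combiner, unimodal_scores(unimodal, sample))
    return 0.5 * (p_joint + p_stack)

rng = random.Random(0)
train_y = [i % 2 for i in range(200)]
train = [make_sample(rng, y) for y in train_y]
test_y = [i % 2 for i in range(100)]
test = [make_sample(rng, y) for y in test_y]

joint_w = fit_joint(train, train_y)
stacked = fit_stacked(train, train_y)
preds = [int(ensemble_proba(joint_w, stacked, s) >= 0.5) for s in test]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"ensemble accuracy on synthetic data: {accuracy:.2f}")
```

Averaging the two probabilities is only one possible way to combine the pipelines; weighted voting or a learned combiner would fit the same structure.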

I led this project, and the results were published at an international conference (ACM ICMI 2015).

Publications:

M. Chatterjee, S. Park, L. P. Morency, S. Scherer, "Combining Two Perspectives on Classifying Multimodal Data for Recognizing Speaker Traits", ACM Int’l Conf. on Multimodal Interaction 2015 (ACM ICMI 2015) (Oral) (Outstanding Paper Award).

Non-Peer-Reviewed:

M. Chatterjee, "Probabilistic Multimodal Fusion Approaches for Recognition Tasks", Technical Report, 2015.

Project Funding Support:

We are extremely grateful to the US NSF for financially supporting this project.
