Trimodal Fusion Approach
The objective of this project was to explore the then relatively uncharted territory of modeling truly multimodal data, i.e., data with more than two modalities to fuse. Our use case was predicting the psychological distress levels of individuals from their verbal (text) and non-verbal (acoustic and visual) behavior. We designed a novel hierarchical classification technique: as shown in the figure, the first layer consists of classifiers on features of the individual modalities, while a second layer combines the posterior probabilities from the first layer using an Expectation-Maximization (EM) approach to generate the final predictions.
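To make the two-layer idea concrete, here is a minimal sketch of one common way such a second layer can work: treat the fused posterior as a weighted mixture of the unimodal posteriors and learn the per-modality weights by EM on a held-out set. All names (`em_fusion_weights`, `fuse`), the array layout, and this particular mixture formulation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def em_fusion_weights(posteriors, labels, n_iter=100, tol=1e-6):
    """Learn per-modality mixture weights by EM (illustrative sketch).

    posteriors : array (M, N, C) -- posterior p_m(c | x_i) from each of
        M unimodal first-layer classifiers, for N samples and C classes
        (a hypothetical layout, not necessarily the paper's).
    labels : array (N,) of true class indices for a held-out set.
    """
    M, N, _ = posteriors.shape
    w = np.full(M, 1.0 / M)  # start from uniform modality weights
    # likelihood of the true label under each modality: shape (M, N)
    p_true = posteriors[:, np.arange(N), labels]
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of modality m for sample i
        joint = w[:, None] * p_true               # (M, N)
        total = joint.sum(axis=0, keepdims=True)  # (1, N)
        resp = joint / np.clip(total, 1e-12, None)
        # M-step: new weight = average responsibility
        w = resp.mean(axis=1)
        ll = np.log(np.clip(total, 1e-12, None)).sum()
        if ll - prev_ll < tol:  # log-likelihood has converged
            break
        prev_ll = ll
    return w

def fuse(posteriors, w):
    """Combine unimodal posteriors with learned weights -> (N, C)."""
    return np.tensordot(w, posteriors, axes=1)
```

In this toy formulation, a modality whose classifier assigns consistently higher probability to the true labels receives a larger mixture weight, so the fused prediction `fuse(posteriors, w).argmax(axis=1)` leans on the more reliable modalities.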
The research was conducted in a team of two, and the results were published at an international peer-reviewed conference.
Publications:
M. Chatterjee*, S. Ghosh*, L. P. Morency, "A Multimodal Context-Based Approach for Distress Assessment", ACM Int'l Conf. on Multimodal Interaction, 2014 (ACM ICMI 2014).
Non-Peer Reviewed:
M. Chatterjee, "Probabilistic Multimodal Fusion Approaches for Recognition Tasks", Technical Report, 2015.
[* - indicates equal contribution]
Project Funding Support:
We are extremely grateful to the US Defense Advanced Research Projects Agency (DARPA) for funding this research.