Multi-Modal Video Analysis

Aug. 23, 2020 | Virtual, in conjunction with ECCV 2020

The workshop was a great success! Thank you everyone. All the videos and slides are available on the program page.

Video understanding/analysis is a very active research area in the computer vision community. This workshop focuses on modeling, understanding, and leveraging the multi-modal nature of video. Recent research has amply demonstrated that in many scenarios multimodal video analysis is much richer than analysis based on any single modality. At the same time, multimodal analysis poses many challenges not encountered when modeling single modalities for video understanding (e.g., building complex models that can fuse spatial, temporal, and auditory information). The workshop will focus on video analysis/understanding related, but not limited, to the following topics:

- deep network architectures for multimodal learning.

- multimodal unsupervised or weakly supervised learning from video.

- multimodal emotion/affect modeling in video.

- multimodal action/scene recognition in video.

- multimodal video analysis applications, including but not limited to sports video understanding, entertainment video understanding, and healthcare.

- multimodal embodied perception for vision (e.g. modeling touch and video).

- multimodal video understanding datasets and benchmarks.


Link to the 2019 workshop held in conjunction with ICCV '19.

Contact: For any questions regarding the workshop, please contact Rameswar Panda at rpanda@ibm.com.