Multi-Modal Learning from Videos
Room 201B, June 17, 2019
CVPR 2019, Long Beach
"Multisensory Integration (also known as multimodal integration) describes a process by which information from different sensory systems is combined to influence perception, decisions, and overt behavior."
Video data is growing explosively as a result of ubiquitous acquisition capabilities. Videos captured by smartphones, ground surveillance systems, and body-worn cameras can easily reach the scale of gigabytes per day. While this "big video data" is a rich source for information discovery and extraction, the computational challenges are unparalleled. Intelligent algorithms for automatic video understanding, summarization, and retrieval have emerged as a pressing need in this context. Progress on these topics will enable autonomous systems to act quickly and decisively on the information in videos, which would otherwise not be possible.
This workshop takes place on Monday, June 17 in room 201B.
Video Understanding from a Sentence
Learning to Act by Watching Videos
Learning Visual Representation and Grounded Language Generation
Learning from First-Person Video
- Grounded Video Description. Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason Corso, Marcus Rohrbach.
- The Emotionally Intelligent Robot: Improving Socially-aware Human Prediction in Crowded Environments. Aniket Bera, Tanmay Randhavane, Dinesh Manocha.
- Continuous Hand Gesture Recognition Algorithm Based On Multimodal Feature Fusion. Hoang Nguyen, Guee-Sang Lee, Soo-Hyung Kim, Hyung-Jeong Yang.
- Self-Supervised Segmentation and Source Separation on Videos. Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba.
- Adversarial Inference for Multi-Sentence Video Description. Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach.
- 2.5D Visual Sound. Ruohan Gao, Kristen Grauman.