Accepted Papers
Oral Papers
1) Multimodal Pyramid Feature Combination for Human Action Recognition, Carlos Roig, David Varas (Vilynx Spain SLU).
2) Summarizing Long-Length Videos with GAN-Enhanced Audio/Visual Features, Hansol Lee, Gyemin Lee (Seoul National University of Science and Technology).
3) AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection, Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, Arkadiusz Stopczynski, Cordelia Schmid, Zhonghua Xi, Caroline Pantofaru (Google).
4) Learning to Detect and Retrieve Objects from Unlabeled Videos, Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein (IBM, Technion).
Poster Papers
1) FaceSyncNet: A Deep Learning-Based Approach for Non-linear Synchronization of Facial Performance Videos, Yoonjae Cho, Dohyeong Kim, Edwin Truman, Jean-Charles Bazin (KAIST).
2) A Tale of Two Modalities for Video Captioning, Pankaj Joshi, Chitwan Saharia, Vishwajeet Singh Bagdawat, Digvijay Singh Gautam, Ganesh Ramakrishnan, Preethi Jyothi (IIT Bombay).
3) Multi-Modal Domain Adaptation for Fine-grained Action Recognition, Jonathan Munro, Dima Damen (University of Bristol).
4) EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen (University of Bristol, University of Oxford).
5) IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition, Ke Yang, Peng Qiao, Xin Niu, Dongsheng Li, Yong Dou (National University of Defense Technology).
6) DIFRINT: Deep Iterative Frame Interpolation for Full-frame Video Stabilization, Jinsoo Choi, In So Kweon (KAIST).
7) Audio-Video based Emotion Recognition Using Minimum Cost Flow Algorithm, Bac Nguyen (JNU Multimedia and Image Processing Lab).