Accepted Papers

Oral Papers

1) Multimodal Pyramid Feature Combination for Human Action Recognition, Carlos Roig, David Varas (Vilynx Spain SLU).

2) Summarizing Long-Length Videos with GAN-Enhanced Audio/Visual Features, Hansol Lee, Gyemin Lee (Seoul National University of Science and Technology).

3) AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection, Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, Arkadiusz Stopczynski, Cordelia Schmid, Zhonghua Xi, Caroline Pantofaru (Google).

4) Learning to Detect and Retrieve Objects from Unlabeled Videos, Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein (IBM, Technion).


Poster Papers

1) FaceSyncNet: A Deep Learning-Based Approach for Non-linear Synchronization of Facial Performance Videos, Yoonjae Cho, Dohyeong Kim, Edwin Truman, Jean-Charles Bazin (KAIST).

2) A Tale of Two Modalities for Video Captioning, Pankaj Joshi, Chitwan Saharia, Vishwajeet Singh Bagdawat, Digvijay Singh Gautam, Ganesh Ramakrishnan, Preethi Jyothi (IIT Bombay).

3) Multi-Modal Domain Adaptation for Fine-grained Action Recognition, Jonathan Munro, Dima Damen (University of Bristol).

4) EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen (University of Bristol, University of Oxford).

5) IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition, Ke Yang, Peng Qiao, Xin Niu, Dongsheng Li, Yong Dou (National University of Defense Technology).

6) DIFRINT: Deep Iterative Frame Interpolation for Full-frame Video Stabilization, Jinsoo Choi, In So Kweon (KAIST).

7) Audio-Video based Emotion Recognition Using Minimum Cost Flow Algorithm, Bac Nguyen (JNU Multimedia and Image Processing Lab).