News & Updates
- Jun 19, 2020: The workshop was a great success! Thank you everyone.
- Jun 6, 2020: Workshop program is available. The workshop will be fully virtual with an assortment of live and pre-recorded talks.
- Apr 11, 2020: Due to COVID-19, we are extending the submission deadline to Friday May 1, 2020.
- Mar 15, 2020: CMT is open! https://cmt3.research.microsoft.com/LUV2020
- Feb 17, 2020: Site goes live! CMT submission will open soon.
- Dec 4, 2019: Site under construction
Workshop Program (Pacific Time Zone, UTC−07:00)
- 8:00 - 8:15 Welcome
- 8:15 - 9:00 Invited Speaker 1 : Learning Representations and Geometry from Unlabelled Videos. Andrea Vedaldi (University of Oxford/FAIR) [video]
- 9:00 - 9:50 Invited Speaker 2 : Learning Vision, Language and Control from Play. Pierre Sermanet (Google Research) [video]
- 10:00 - 10:50 Invited Speaker 3 : 3D People Watching. Jitendra Malik (UC Berkeley/FAIR) [video]
- 11:00 - 12:00 Oral Session 1
- 11:00 - 11:15 Evolving Losses for Unsupervised Video Representation Learning. AJ Piergiovanni, Anelia Angelova, Michael S Ryoo [video][slides][paper]
- 11:15 - 11:30 A Local-to-Global Approach to Multi-modal Movie Scene Segmentation. Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin [video][slides][paper]
- 11:30 - 11:45 Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video. Oier Mees, Markus Merklinger, Gabriel Kalweit, Wolfram Burgard [video][slides][paper]
- 11:45 - 12:00 Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos. Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova [video][slides][paper]
- 12:00 - 13:00 Lunch Break
- 13:00 - 13:50 Invited Speaker 4 : Dynamic Scenes Understanding: Where Are We and What Is Missing? Ivan Laptev (INRIA) [video]
- 14:00 - 14:50 Invited Speaker 5 : How Can Learning from Unlabeled Videos Help Neural Video Synthesis? Ming-Yu Liu (NVIDIA Research) [video]
- 15:00 - 15:50 Invited Speaker 6 : Space-Time Correspondence as a Contrastive Random Walk. Alyosha Efros (UC Berkeley) [video]
- 16:00 - 17:00 Oral Session 2
- 16:00 - 16:15 G3AN: Disentangling Appearance and Motion for Video Generation. Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva [video][slides][paper]
- 16:15 - 16:30 MAST: A Memory-Augmented Self-Supervised Tracker. Zihang Lai, Erika Lu, Weidi Xie [video][slides][paper]
- 16:30 - 16:45 Cleaning Label Noise With Clusters for Minimally Supervised Anomaly Detection. Muhammad Zaigham Zaheer, Jin-Ha Lee, Marcella Astrid, Arif Mahmood, Seung-Ik Lee [video][slides][paper]
- 16:45 - 16:50 Closing Remarks
Deep neural networks trained on large numbers of labeled images have recently led to breakthroughs in computer vision. However, we have yet to see a similar level of breakthrough in the video domain. Why is that? Should we invest more in supervised learning, or do we need a different learning paradigm?
Unlike images, videos carry extra dimensions of information such as motion and sound. Recent approaches leverage these signals to tackle challenging tasks in an unsupervised/self-supervised setting, e.g., learning to predict representations of future time steps in a video (the RGB frame, semantic segmentation map, optical flow, camera motion, or corresponding sound), learning spatio-temporal progression from image sequences, and learning audiovisual correspondences.
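To make the future-prediction idea concrete, here is a minimal, illustrative sketch (not any specific method from the program): the supervisory signal is the video itself, since the "label" for frame t is simply frame t+1. The `future_frame_mse` helper and the toy video below are hypothetical, for illustration only.

```python
import numpy as np

def future_frame_mse(frames, predictor):
    """Self-supervised future-prediction loss.

    frames: array of shape (T, D) -- T flattened video frames.
    predictor: callable mapping one frame to a predicted next frame.
    No external labels are needed: the targets are the video's
    own future frames, frames[1:].
    """
    preds = np.stack([predictor(f) for f in frames[:-1]])
    return float(np.mean((preds - frames[1:]) ** 2))

# Toy "video": every frame is the previous frame shifted by +1,
# so a predictor that adds 1 is perfect.
video = np.arange(5, dtype=float)[:, None] * np.ones((1, 3))

future_frame_mse(video, lambda f: f + 1.0)  # perfect predictor -> 0.0
future_frame_mse(video, lambda f: f)        # copy-last-frame baseline -> 1.0
```

In practice the predictor is a deep network and the loss may act on learned representations (or be contrastive) rather than raw pixels, but the structure is the same: the future of the video supervises the model for free.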
This workshop aims to promote comprehensive discussion around this emerging topic. We invite researchers to share their experience and knowledge in learning from unlabeled videos, and to brainstorm bold new ideas that may generate the next breakthrough in computer vision.