News & Updates
- Jun 19, 2020: The workshop was a great success! Thank you everyone.
- Jun 6, 2020: Workshop program is available. The workshop will be fully virtual with an assortment of live and pre-recorded talks.
- Apr 11, 2020: Due to COVID-19, we are extending the submission deadline to Friday May 1, 2020.
- Mar 15, 2020: CMT is open! https://cmt3.research.microsoft.com/LUV2020
- Feb 17, 2020: Site goes live! CMT submission will open soon.
- Dec 4, 2019: Site under construction
Workshop Program (Pacific Time Zone, UTC−07:00)
- 8:00 - 8:15 Welcome
- 8:15 - 9:00 Invited Speaker 1 : Learning Representations and Geometry from Unlabelled Videos. Andrea Vedaldi (University of Oxford/FAIR) [video]
- 9:00 - 9:50 Invited Speaker 2 : Learning Vision, Language and Control from Play. Pierre Sermanet (Google Research) [video]
- 10:00 - 10:50 Invited Speaker 3 : 3D People Watching. Jitendra Malik (UC Berkeley/FAIR) [video]
- 11:00 - 12:00 Oral Session 1
- 11:00 - 11:15 Evolving Losses for Unsupervised Video Representation Learning. AJ Piergiovanni, Anelia Angelova, Michael S Ryoo [video][slides][paper]
- 11:15 - 11:30 A Local-to-Global Approach to Multi-modal Movie Scene Segmentation. Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin [video][slides][paper]
- 11:30 - 11:45 Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video. Oier Mees, Markus Merklinger, Gabriel Kalweit, Wolfram Burgard [video][slides][paper]
- 11:45 - 12:00 Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos. Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova [video][slides][paper]
- 12:00 - 13:00 Lunch Break
- 13:00 - 13:50 Invited Speaker 4 : Dynamic Scenes Understanding: Where Are We and What Is Missing? Ivan Laptev (INRIA) [video]
- 14:00 - 14:50 Invited Speaker 5 : How Can Learning from Unlabeled Videos Help Neural Video Synthesis? Ming-Yu Liu (NVIDIA Research) [video]
- 15:00 - 15:50 Invited Speaker 6 : Space-Time Correspondence as a Contrastive Random Walk. Alyosha Efros (UC Berkeley) [video]
- 16:00 - 17:00 Oral Session 2
- 16:00 - 16:15 G3AN: Disentangling Appearance and Motion for Video Generation. Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva [video][slides][paper]
- 16:15 - 16:30 MAST: A Memory-Augmented Self-Supervised Tracker. Zihang Lai, Erika Lu, Weidi Xie [video][slides][paper]
- 16:30 - 16:45 Cleaning Label Noise With Clusters for Minimally Supervised Anomaly Detection. Muhammad Zaigham Zaheer, Jin-Ha Lee, Marcella Astrid, Arif Mahmood, Seung-Ik Lee [video][slides][paper]
- 16:45 - 16:50 Closing Remarks
Deep neural networks trained on large numbers of labeled images have recently led to breakthroughs in computer vision. However, we have yet to see a similar level of breakthrough in the video domain. Why is that? Should we invest more in supervised learning, or do we need a different learning paradigm?
Unlike images, videos carry extra dimensions of information such as motion and sound. Recent approaches leverage these signals to tackle challenging tasks in an unsupervised/self-supervised setting, e.g., learning to predict representations of future time steps in a video (the RGB frame, semantic segmentation map, optical flow, camera motion, or corresponding sound), learning spatio-temporal progression from image sequences, and learning audiovisual correspondences.
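To make the future-prediction idea concrete, here is a minimal, illustrative sketch (not any specific method from the program): the supervisory signal is the video itself, since the "label" for frame t is simply frame t+1. The `future_frame_mse` helper and the toy video below are hypothetical, for illustration only.

```python
import numpy as np

def future_frame_mse(frames, predictor):
    """Self-supervised future-prediction loss.

    frames: array of shape (T, D) -- T flattened video frames.
    predictor: callable mapping one frame to a predicted next frame.
    No external labels are needed: the targets are the video's
    own future frames, frames[1:].
    """
    preds = np.stack([predictor(f) for f in frames[:-1]])
    return float(np.mean((preds - frames[1:]) ** 2))

# Toy "video": every frame is the previous frame shifted by +1,
# so a predictor that adds 1 is perfect.
video = np.arange(5, dtype=float)[:, None] * np.ones((1, 3))

future_frame_mse(video, lambda f: f + 1.0)  # perfect predictor -> 0.0
future_frame_mse(video, lambda f: f)        # copy-last-frame baseline -> 1.0
```

In practice the predictor is a deep network and the loss may act on learned representations (or be contrastive) rather than raw pixels, but the structure is the same: the future of the video supervises the model for free.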
This workshop aims to promote comprehensive discussion around this emerging topic. We invite researchers to share their experience and knowledge in learning from unlabeled videos, and to brainstorm bold new ideas that may generate the next breakthrough in computer vision.