Learning From Unlabeled Videos
CVPR 2019 Workshop
Room E, Hyatt Regency
Long Beach, CA
Sunday June 16, 2019
All invited talks and oral sessions will be in Room E at the Hyatt Regency.
Morning posters will be in the Pacific Arena Ballroom (main convention center).
Afternoon posters will be in Room E at the Hyatt Regency.
Program Outline
9:00 - 9:15 Welcome
9:15 - 9:45 Invited Speaker 1 : Antonio Torralba
9:45 - 10:15 Invited Speaker 2 : Noah Snavely
10:15 - 10:45 Poster Session 1 & Coffee Break
[#65] Learning Human Pose from Unaligned Data through Image Translation. Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
[#66] You reap what you sow: Using Videos to Generate High Precision Object Proposals for Weakly-supervised Object Detection. Krishna Kumar Singh, Yong Jae Lee
[#67] Temporal Cycle-Consistency Learning. Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
[#68] SelFlow: Self-Supervised Learning of Optical Flow. Pengpeng Liu, Michael Lyu, Irwin King, Jia Xu
[#69] Temporal Attentive Alignment for Video Domain Adaptation. Min-Hung Chen, Zsolt Kira, Ghassan AlRegib
[#70] Automatic Adaptation of Object Detectors to New Domains Using Self-Training. Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller
[#71] DistInit: Learning Video Representations without a Single Labeled Video. Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan
[#72] Online Object Representations with Contrastive Learning in Videos. Soeren Pirk, Mohi Khansari, Corey Lynch, Pierre Sermanet
[#73] Event Segmentation in Streaming Videos without Labels using a Predictive Approach. Sathyanarayanan Aakur, Sudeep Sarkar
10:45 - 11:15 Invited Speaker 3 : Andrew Zisserman
11:15 - 12:15 Oral Session 1
Learning Human Pose from Unaligned Data through Image Translation. Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
You reap what you sow: Using Videos to Generate High Precision Object Proposals for Weakly-supervised Object Detection. Krishna Kumar Singh, Yong Jae Lee
Temporal Cycle-Consistency Learning. Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
SelFlow: Self-Supervised Learning of Optical Flow. Pengpeng Liu, Michael Lyu, Irwin King, Jia Xu
12:15 - 14:00 Lunch Break
14:00 - 14:30 Invited Speaker 4 : Bill Freeman
14:30 - 15:00 Invited Speaker 5 : Abhinav Gupta
15:00 - 15:30 Poster Session 2 & Coffee Break
[#77] 2.5D Visual Sound. Ruohan Gao, Kristen Grauman
[#76] Less is More: Learning Highlight Detection from Video Duration. Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman
[#75] Learning Individual Styles of Conversational Gesture. Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
[#74] Evolving Losses for Unlabeled Video Representation Learning. AJ Piergiovanni, Anelia Angelova, Michael Ryoo
[#73] Stochastic Dynamics for Video Infilling. Qiangeng Xu, Hanwang Zhang, Weiyue Wang, Peter Belhumeur, Ulrich Neumann
[#72] Semi-Supervised Temporal Action Proposals. Jingwei Ji, Kaidi Cao, Juan Carlos Niebles
[#71] Identity from here, Pose from there: Learning to Disentangle and Generate Objects using Unlabeled Videos. Fanyi Xiao, Yong Jae Lee
[#70] Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition. Rishi Madhok, Unaiza Ahsan, Irfan Essa
[#69] 4D Generic Video Object Proposals. Aljosa Osep, Paul Voigtlaender, Mark Weber, Jonathon Luiten, Bastian Leibe
15:30 - 16:00 Invited Speaker 6 : Kristen Grauman
16:00 - 17:00 Oral Session 2
2.5D Visual Sound. Ruohan Gao, Kristen Grauman
Less is More: Learning Highlight Detection from Video Duration. Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman
Learning Individual Styles of Conversational Gesture. Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
Evolving Losses for Unlabeled Video Representation Learning. AJ Piergiovanni, Anelia Angelova, Michael Ryoo
17:00 - 17:05 Closing Remarks
Overview
Deep neural networks trained on large collections of labeled images have recently led to breakthroughs in computer vision. However, we have yet to see a similar breakthrough in the video domain. Why is this? Should we invest more in supervised learning, or do we need a different learning paradigm?
Unlike images, videos contain extra dimensions of information such as motion and sound. Recent approaches leverage these signals to tackle challenging tasks in an unsupervised/self-supervised setting, e.g., learning to predict representations of future time steps in a video (the RGB frame, semantic segmentation map, optical flow, camera motion, and corresponding sound), learning spatio-temporal progression from image sequences, and learning audiovisual correspondences.
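As a concrete (if toy) illustration of the future-prediction pretext tasks mentioned above, the sketch below trains a small convolutional network to regress the next RGB frame of a clip from its preceding frames, so the supervision comes from the video itself rather than from labels. This is a minimal sketch in PyTorch; the model, the random stand-in data, and all names here are hypothetical, not drawn from any of the workshop papers.

    # Minimal sketch of a self-supervised "predict the next frame" pretext task.
    # Hypothetical model and data; illustrative only.
    import torch
    import torch.nn as nn

    class FramePredictor(nn.Module):
        """Encodes a stack of past frames and regresses the next frame."""
        def __init__(self, n_past=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3 * n_past, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 3, 3, padding=1),
            )

        def forward(self, past_frames):  # (B, n_past*3, H, W)
            return self.net(past_frames)

    model = FramePredictor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Stand-in for a loader over unlabeled clips of shape (B, T, 3, H, W).
    video_batches = [torch.rand(2, 5, 3, 32, 32) for _ in range(10)]

    for clip in video_batches:
        past = clip[:, :4].flatten(1, 2)   # stack 4 context frames along channels
        target = clip[:, 4]                # the frame to predict
        loss = nn.functional.l1_loss(model(past), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The hope in this family of methods is that the features the encoder learns in order to predict the future transfer to downstream tasks; other pretext signals (temporal ordering, cycle-consistency, audiovisual correspondence) plug into the same recipe with a different loss.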
This workshop aims to promote comprehensive discussion around this emerging topic. We invite researchers to share their experiences and knowledge in learning from unlabeled videos, and to brainstorm bold new ideas that could generate the next breakthrough in computer vision.
News and Updates
May 31, 2019: We received 29 submissions and accepted 18 papers; 8 will be presented orally.
March 14, 2019: Updated author guidelines. Papers should be at most 4 pages *including references*. Papers that exceed 4 pages will count as a publication and could violate dual-submission policies at other conferences.
March 9, 2019: Due to multiple requests, we are extending the paper submission deadline to April 15, 2019.
Feb 11, 2019: The CMT website is open for submissions: https://cmt3.research.microsoft.com/LUV2019
Invited Speakers
Abhinav Gupta, CMU / Facebook AI Research
Andrew Zisserman, University of Oxford / DeepMind
Antonio Torralba, MIT
Bill Freeman, MIT / Google AI
Kristen Grauman, UT Austin / Facebook AI Research
Noah Snavely, Cornell University / Google AI
Organizers
Yale Song, Microsoft Research
Carl Vondrick, Columbia University
Katerina Fragkiadaki, Carnegie Mellon University
Honglak Lee, University of Michigan / Google Brain
Rahul Sukthankar, Google AI