Structured Representations for Video Understanding
ICCV 2021 workshop
News and updates
November 26th 2021: The presentations, keynotes, and the panel are on the website now.
October 16th 2021: The papers are on the website. See you at the workshop!
September 24th 2021: The paper decisions have been made and the schedule will be finalized soon.
July 20th 2021: The submission deadline is August 27th. Check submission instructions here.
April 12th 2021: The workshop has been accepted for ICCV 2021.
Workshop schedule
9:00 - 9:10 | Opening
9:10 - 9:40 | Session 1: Learning person and object relations in videos
9:40 - 10:10 | Keynote 1: Trevor Darrell [video]
10:10 - 11:00 | Session 2: Self-supervised and unsupervised video representation learning
11:00 - 11:30 | Keynote 2: Kristen Grauman [video]
11:30 - 12:00 | Session 3: Advances in tracking and segmentation
12:00 - 13:50 | Lunch and discussion with authors
13:50 - 14:20 | Keynote 3: Josef Sivic [video]
14:20 - 14:40 | Session 4: Video understanding at small scales
14:40 - 15:10 | Keynote 4: Deva Ramanan [video]
15:10 - 15:30 | Session 5: Video understanding with multiple modalities
15:30 - 16:30 | Keynote panel with Deva Ramanan and Josef Sivic [video]
Overview
Finding the intrinsic structure of a video is an open research problem. Research in video understanding has found a great deal of success through convolutional solutions trained on large-scale datasets. Building on this success, a growing body of literature has extended convolutional approaches with additional structure for enhanced or more general understanding. Examples include discovering the scene graph of a video, embedding videos on non-Euclidean manifolds, learning representations from unlabeled videos, and incorporating prior knowledge in video representation learning to better infer seen and unseen action labels. This workshop seeks to open up the discussion on how to learn and impose structure in video understanding. Embedding structure will not only increase our fundamental understanding of videos, but will also benefit a wide range of downstream applications, from action recognition to precise localization and long-term reasoning or forecasting.
Invited speakers
Kristen Grauman
University of Texas at Austin
Deva Ramanan
Carnegie Mellon University
Josef Sivic
INRIA / Czech Technical University
Trevor Darrell
UC Berkeley
Organizers
Pascal Mettes
University of Amsterdam
Carl Vondrick
Columbia University
Dídac Surís
Columbia University
Hazel Doughty
University of Amsterdam
Mike Shou
NUS Singapore
Shih-Fu Chang
Columbia University
Cordelia Schmid
INRIA / Google