Temporal Cycle-Consistency Learning
Google Research and DeepMind
Paper accepted at CVPR '19
Abstract
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. The method trains a network using temporal cycle consistency (TCC), a differentiable cycle-consistency loss that can be used to find correspondences across time in multiple videos. The resulting per-frame embeddings can be used to align videos by simply matching frames using the nearest-neighbors in the learned embedding space.
To evaluate the power of the embeddings, we densely label the Pouring and Penn Action video datasets for action phases. We show that (i) the learned embeddings enable few-shot classification of these action phases, significantly reducing the supervised training requirements; and (ii) TCC is complementary to other methods of self-supervised learning in videos, such as Shuffle and Learn and Time-Contrastive Networks. The embeddings are also used for a number of applications based on alignment (dense temporal correspondence) between video pairs, including transfer of metadata of synchronized modalities between videos (sounds, temporal semantic labels), synchronized playback of multiple videos, and anomaly detection.
Approach
Minimize cycle-consistency error to learn representations useful for temporally fine-grained tasks.
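The idea can be sketched numerically: embed both videos, take a soft nearest neighbor of a frame of one video in the other, cycle it back, and penalize how far the cycled-back frame lands from where it started. Below is a minimal NumPy sketch of a cycle-back regression loss in this spirit; the function names are ours, and the paper's variance regularization term is omitted for brevity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def cycle_back_regression_loss(u, v):
    """Sketch of a TCC-style cycle-back loss (variance term omitted).

    u: (N, D) per-frame embeddings of video 1
    v: (M, D) per-frame embeddings of video 2

    For each frame i of u: find its soft nearest neighbor in v,
    cycle that point back to u, and penalize the squared distance
    between the expected cycled-back frame index and i.
    """
    n = len(u)
    frame_idx = np.arange(n)
    loss = 0.0
    for i in range(n):
        # similarity of u[i] to every frame of v (negative squared distance)
        sim_v = -np.sum((v - u[i]) ** 2, axis=1)
        alpha = softmax(sim_v)
        v_tilde = alpha @ v  # soft nearest neighbor in v
        # cycle back: soft assignment of v_tilde over the frames of u
        sim_u = -np.sum((u - v_tilde) ** 2, axis=1)
        beta = softmax(sim_u)
        mu = beta @ frame_idx  # expected cycled-back frame index
        loss += (mu - i) ** 2
    return loss / n
```

When a frame cycles back to itself the penalty vanishes, so minimizing this loss drives the network toward embeddings in which corresponding moments of different videos are mutual nearest neighbors.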
Applications using TCC Embeddings
Synchronous Playback
Align multiple videos using distance in the learned embedding space.
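Alignment by nearest neighbors can be sketched as follows (the function name is ours; per-frame embeddings are assumed to be precomputed by the trained network):

```python
import numpy as np

def align(emb_a, emb_b):
    """For every frame of video A, return the index of its nearest
    frame in video B under Euclidean distance in embedding space.

    emb_a: (Na, D) per-frame embeddings of video A
    emb_b: (Nb, D) per-frame embeddings of video B
    """
    # pairwise squared distances, shape (Na, Nb)
    d2 = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

Playing frame `align(a, b)[t]` of video B alongside frame `t` of video A gives synchronized playback even when the two videos progress through the action at different speeds.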
Sound Transfer
Fine-grained Retrieval
Anomaly Detection
Anomalous frames deviate too much from the ideal trajectory of an action in the embedding space.
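One way to read this quantitatively, sketched under the assumption that an "ideal" execution of the action is available as a reference sequence of embeddings: score each frame by its distance to the nearest point on the reference trajectory, and flag frames whose score exceeds a threshold.

```python
import numpy as np

def anomaly_scores(emb_query, emb_reference):
    """Distance of each query frame to the nearest frame of a
    reference ('ideal') execution in embedding space.

    emb_query:     (Nq, D) per-frame embeddings of the video to check
    emb_reference: (Nr, D) per-frame embeddings of an ideal execution
    """
    # pairwise squared distances, shape (Nq, Nr)
    d2 = ((emb_query[:, None, :] - emb_reference[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))
```

Frames with large scores lie far from every point of the ideal trajectory and are candidates for anomalies; the threshold is application-dependent.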
BibTeX
@InProceedings{Dwibedi_2019_CVPR,
  author    = {Dwibedi, Debidatta and Aytar, Yusuf and Tompson, Jonathan and Sermanet, Pierre and Zisserman, Andrew},
  title     = {Temporal Cycle-Consistency Learning},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019}
}