RepNet
Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Google Research and DeepMind
Abstract
We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new, challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos.
Approach
Detailed architecture of RepNet. The Temporal Self-similarity Matrix plays a key role in RepNet. The following animation shows how we construct it.
The TSM surfaces features that make it easy for neural networks to count the number of repetitions in a video.
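To make the construction concrete, here is a minimal sketch of how a Temporal Self-similarity Matrix can be computed from per-frame embeddings: pairwise negative squared L2 distances between all frame pairs, followed by a row-wise softmax. The function name and the temperature parameter are illustrative; they are not taken from the released code.

```python
import numpy as np

def temporal_self_similarity(embs, temperature=1.0):
    """Compute a Temporal Self-similarity Matrix (TSM) from frame embeddings.

    embs: (num_frames, dim) array of per-frame embeddings.
    Returns a (num_frames, num_frames) matrix where entry (i, j) is the
    softmax-normalized similarity -||x_i - x_j||^2 between frames i and j.
    """
    # Pairwise negative squared L2 distances via broadcasting.
    diffs = embs[:, None, :] - embs[None, :, :]
    sims = -np.sum(diffs ** 2, axis=-1)
    # Row-wise softmax with a temperature (value here is illustrative).
    sims = sims / temperature
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(sims)
    return exp / exp.sum(axis=1, keepdims=True)
```

A repeating action shows up in this matrix as a pattern of bright off-diagonal stripes, which is the structure the downstream period-prediction module reads off.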
Real-world TSMs reveal fascinating structure present in the world. Left: jumping jacks (constant period). Middle: bouncing ball (decreasing period). Right: mixing concrete (aperiodic segments present in the video).
Synthetic data generation pipeline
Left: An example of a synthetic repeating video generated from a random video. Right: An example of a video with camera motion augmentation, which is tougher for the model, but results in better generalization to real repeating videos.
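The core of the synthetic pipeline can be sketched as follows: sample a random segment from an arbitrary video, tile it a chosen number of times, and optionally keep some non-repeating context around it. This is a simplified sketch under those assumptions; the actual pipeline also includes reversal and camera-motion augmentation, which are omitted here.

```python
import numpy as np

def make_synthetic_repetition(frames, period, count, pad_before=0, pad_after=0):
    """Build a synthetic repeating video from an arbitrary clip.

    frames: (num_frames, H, W, C) array of video frames.
    period: length (in frames) of the repeating segment.
    count: number of times the segment is repeated.
    pad_before / pad_after: non-repeating context frames kept around
        the repetitions, so the model also sees aperiodic segments.
    """
    # Pick a random segment of the requested period length.
    start = np.random.randint(0, len(frames) - period + 1)
    segment = frames[start:start + period]
    # Repeat the segment `count` times along the time axis.
    repeated = np.tile(segment, (count, 1, 1, 1))
    before = frames[max(0, start - pad_before):start]
    after = frames[start + period:start + period + pad_after]
    return np.concatenate([before, repeated, after], axis=0)
```

Because any unlabeled video can be fed through this pipeline, the ground-truth period and count come for free, with no human annotation.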
Applications
One model that works across many domains can enable many applications.
Repetition Counting
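RepNet predicts a period length per frame; the overall count is obtained by aggregating these predictions, with each frame inside a periodic segment contributing the reciprocal of its predicted period. A minimal sketch of that counting rule (function name and inputs are illustrative):

```python
import numpy as np

def count_repetitions(per_frame_periods, periodicity):
    """Aggregate per-frame period predictions into a repetition count.

    per_frame_periods: (num_frames,) predicted period length (in frames).
    periodicity: (num_frames,) boolean mask marking frames that lie
        inside a periodic segment.
    Each periodic frame contributes 1 / period_i repetitions.
    """
    periods = np.asarray(per_frame_periods, dtype=float)
    mask = np.asarray(periodicity, dtype=bool)
    # Guard against zero periods (illustrative safeguard).
    return float(np.sum(mask / np.maximum(periods, 1e-6)))
```

For example, 30 frames that are all periodic with a predicted period of 10 frames yield a count of 3; frames classified as aperiodic contribute nothing, which is what makes the count robust to non-repeating stretches of video.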
Speed Change Detection
Paper Explained Review
This is a video review of our paper by Yannic Kilcher. It has nice visual explanations of different parts of RepNet.
BIBTEX
@InProceedings{Dwibedi_2020_CVPR,
author = {Dwibedi, Debidatta and Aytar, Yusuf and Tompson, Jonathan and Sermanet, Pierre and Zisserman, Andrew},
title = {Counting Out Time: Class Agnostic Video Repetition Counting in the Wild},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}