Counting Out Time: Class Agnostic Video Repetition Counting in the Wild


We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck, which allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially outperforms the state of the art on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos.


Detailed architecture of RepNet. The Temporal Self-similarity Matrix plays a key role in RepNet. The following animation shows how we construct it.

The TSM surfaces features that make it easy for neural networks to count the number of repetitions in a video.
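To make the construction concrete, here is a minimal NumPy sketch of a TSM in the spirit of RepNet: entry (i, j) holds the negative squared Euclidean distance between the embeddings of frames i and j, followed by a row-wise softmax. The function name and the toy embeddings are our own; the actual model computes this inside the network on learned per-frame embeddings.

```python
import numpy as np

def temporal_self_similarity(embs):
    """Temporal Self-similarity Matrix from per-frame embeddings.

    Sketch of the idea: similarity(i, j) = -||e_i - e_j||^2, then a
    row-wise softmax so each row is a distribution over all frames.
    """
    # Pairwise squared Euclidean distances between frame embeddings.
    sq_norms = np.sum(embs ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embs @ embs.T
    sims = -dists  # higher = more similar

    # Numerically stabilized row-wise softmax.
    sims = sims - sims.max(axis=1, keepdims=True)
    exp = np.exp(sims)
    return exp / exp.sum(axis=1, keepdims=True)

# A toy periodic sequence: 12 frames whose embeddings repeat every 4 steps.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 16))
embs = np.tile(base, (3, 1))
tsm = temporal_self_similarity(embs)  # shape (12, 12)
```

For a periodic input like this, the TSM shows bright diagonal stripes offset by the period length, which is exactly the structure the downstream period predictor can latch onto.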

TSMs of real-world videos reveal fascinating temporal structure. Left: jumping jacks (constant period); Middle: bouncing ball (decreasing period); Right: mixing concrete (aperiodic segments present in the video).

Synthetic data generation pipeline

Left: An example of a synthetic repeating video generated from a random video. Right: An example of a video with camera motion augmentation, which is tougher for the model, but results in better generalization to real repeating videos.
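The core of the generation pipeline above can be sketched in a few lines: sample a short clip from a random video, tile it a chosen number of times, and keep the surrounding non-repeating frames so the model also sees where repetitions begin and end. The function and label conventions below are our own simplification (per-frame period length, 0 outside the repeating span); the paper additionally applies augmentations such as camera motion.

```python
import numpy as np

def make_synthetic_repetition(video, start, clip_len, count):
    """Build a synthetic repeating video from an arbitrary one.

    video: array of shape (num_frames, feature_dim).
    Returns (frames, labels), where labels[i] is the period length at
    frame i inside the repeating span and 0 elsewhere (our convention).
    """
    clip = video[start:start + clip_len]
    repeated = np.concatenate([clip] * count, axis=0)
    frames = np.concatenate([video[:start], repeated,
                             video[start + clip_len:]], axis=0)
    labels = np.zeros(len(frames), dtype=np.int64)
    labels[start:start + clip_len * count] = clip_len
    return frames, labels

# 20 "frames" of 1 feature each; repeat frames 5..8 four times.
video = np.arange(20, dtype=np.float32)[:, None]
frames, labels = make_synthetic_repetition(video, start=5, clip_len=3, count=4)
```

Because the period and count are known by construction, supervision comes for free from unlabeled video.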


One model that works across many domains can enable many applications.

Repetition Counting
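A count can be read off from per-frame predictions by having each periodic frame contribute 1/period repetitions, with a periodicity score gating out aperiodic frames. This aggregation follows the paper's counting scheme, but the function and argument names below are our own:

```python
import numpy as np

def count_from_periods(period_len, periodicity, threshold=0.5):
    """Aggregate per-frame period predictions into a repetition count.

    period_len: predicted period length per frame.
    periodicity: per-frame score in [0, 1]; frames below `threshold`
    are treated as aperiodic and contribute nothing.
    """
    mask = periodicity >= threshold
    per_frame = np.where(mask, 1.0 / np.maximum(period_len, 1.0), 0.0)
    return per_frame.sum()

# 40 periodic frames with period 8 -> 5 repetitions.
count = count_from_periods(np.full(40, 8.0), np.ones(40))  # -> 5.0
```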

Speed Change Detection
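Since the model predicts a period length at every frame, a speed-up or slow-down of the action shows up as a jump in consecutive period estimates. A hypothetical detector (names and threshold are our own) could be as simple as:

```python
import numpy as np

def detect_speed_changes(period_len, min_jump=1.0):
    """Return frame indices where the predicted period changes abruptly."""
    diffs = np.abs(np.diff(period_len))
    return np.flatnonzero(diffs >= min_jump) + 1

# Action doubles in speed halfway through: period drops from 8 to 4.
periods = np.array([8.0] * 10 + [4.0] * 10)
changes = detect_speed_changes(periods)  # -> array([10])
```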

Paper Explained Review

This is a video review of our paper by Yannic Kilcher. It has nice visual explanations of different parts of RepNet.



@InProceedings{Dwibedi_2020_CVPR,
  author    = {Dwibedi, Debidatta and Aytar, Yusuf and Tompson, Jonathan and Sermanet, Pierre and Zisserman, Andrew},
  title     = {Counting Out Time: Class Agnostic Video Repetition Counting in the Wild},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}