Self-Supervised
Keypoint Discovery in
Behavioral Videos

CVPR 2022
[paper] [code]
We will run B-KinD on your videos by request! [Request form] 

Summary of our paper:

1. We propose a self-supervised method for discovering keypoints from real-world behavioral videos, based on spatiotemporal difference reconstruction. 

2. Experiments across a range of organisms (mice, flies, humans, jellyfish, and trees) demonstrate the generality of the method and show that the discovered keypoints are semantically meaningful. 

3. Quantitative benchmarks on downstream behavior analysis tasks show performance comparable to that of supervised keypoints. 

You can try training our model on any behavioral video without annotations! See our code: https://github.com/neuroethology/BKinD

Our code is best suited for videos with:

The key to our approach is to discover keypoints that can reconstruct the spatiotemporal difference between video frames at time t and time t+T, using appearance features from the image at time t and the discovered keypoints at times t and t+T. This encourages the keypoints to encode information that recovers agent movements.
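Below is a minimal sketch of this reconstruction objective, assuming PyTorch, tiny placeholder networks, and a simple absolute-difference target; the names (KeypointEncoder, gaussian_maps, appearance_net, decoder) and the architecture are illustrative assumptions, not the actual B-KinD implementation, whose difference measure and networks differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KeypointEncoder(nn.Module):
    """Predicts K heatmaps from an image and reduces them to (x, y) keypoints."""

    def __init__(self, num_keypoints=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_keypoints, 3, stride=2, padding=1),
        )

    def forward(self, image):
        heatmaps = self.backbone(image)                         # (B, K, H', W')
        b, k, h, w = heatmaps.shape
        probs = F.softmax(heatmaps.view(b, k, -1), -1).view(b, k, h, w)
        ys = torch.linspace(-1, 1, h, device=image.device)
        xs = torch.linspace(-1, 1, w, device=image.device)
        y = (probs.sum(dim=3) * ys).sum(dim=2)                  # soft-argmax over rows
        x = (probs.sum(dim=2) * xs).sum(dim=2)                  # soft-argmax over columns
        return torch.stack([x, y], dim=-1)                      # (B, K, 2) in [-1, 1]


def gaussian_maps(keypoints, size=64, sigma=0.1):
    """Renders keypoints back into Gaussian heatmaps for the decoder."""
    coords = torch.linspace(-1, 1, size, device=keypoints.device)
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    dx = xx[None, None] - keypoints[..., 0, None, None]
    dy = yy[None, None] - keypoints[..., 1, None, None]
    return torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))  # (B, K, size, size)


num_keypoints = 10
appearance_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
keypoint_net = KeypointEncoder(num_keypoints)
decoder = nn.Conv2d(16 + 2 * num_keypoints, 3, 3, padding=1)


def reconstruction_loss(frame_t, frame_tT):
    """Reconstruct the frame difference from appearance(t) and keypoints(t, t+T)."""
    target = (frame_tT - frame_t).abs()          # spatiotemporal difference target
    appearance = appearance_net(frame_t)         # appearance features from frame t only
    kp_t = gaussian_maps(keypoint_net(frame_t))
    kp_tT = gaussian_maps(keypoint_net(frame_tT))
    recon = decoder(torch.cat([appearance, kp_t, kp_tT], dim=1))
    return F.mse_loss(recon, target)


# Example forward/backward pass on random 64x64 frames.
frame_t = torch.rand(2, 3, 64, 64)
frame_tT = torch.rand(2, 3, 64, 64)
loss = reconstruction_loss(frame_t, frame_tT)
loss.backward()
```

Because the appearance features come only from frame t, movement between t and t+T can only be explained through the keypoints, which is what pushes them onto the moving agents.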

We found that our discovered keypoints perform comparably to supervised keypoints on downstream experiments, suggesting that discovered keypoints have the potential to reduce manual annotation effort for behavior analysis.
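As one illustration of such a downstream task, per-frame keypoints can be stacked into per-clip trajectory features and fed to a standard classifier. The sketch below uses random stand-in data and a scikit-learn MLP; the feature layout, classifier, and labels are assumptions for illustration, not the benchmark pipeline from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def trajectory_features(keypoints):
    """keypoints: (num_frames, num_keypoints, 2) array of discovered keypoints.
    Returns a flat feature vector of positions plus frame-to-frame velocities."""
    positions = keypoints.reshape(len(keypoints), -1)
    velocities = np.diff(positions, axis=0, prepend=positions[:1])
    return np.concatenate([positions, velocities], axis=1).ravel()


# Hypothetical data: 200 clips, 30 frames each, 10 keypoints per frame,
# with one binary behavior label per clip.
rng = np.random.default_rng(0)
clips = rng.normal(size=(200, 30, 10, 2))
labels = rng.integers(0, 2, size=200)

X = np.stack([trajectory_features(c) for c in clips])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X[:150], labels[:150])
print("held-out accuracy:", clf.score(X[150:], labels[150:]))
```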

Acknowledgements

This work was generously supported by the Simons Collaboration on the Global Brain grant 543025 (to PP and DJA), NIH Award #R00MH117264 (to AK), NSF Award #1918839 (to YY), NSF Award #2019712 (to JOD and RHG), NINDS Award #K99NS119749 (to BW), NIH Award #R01MH123612 (to DJA and PP), NSERC Award #PGSD3-532647-2019 (to JJS), as well as a gift from Charles and Lily Trimble (to PP).

Correspondence to jjsun (at) caltech (dot) edu.