Task Programming:
Learning Data Efficient Behavior Representations

CVPR 2021 (Oral)
Awarded Best Student Paper
[paper] [code] [video]

Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data.

To reduce annotation effort, we present TREBA: a method to learn an annotation-sample efficient trajectory embedding for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call “task programming”, which uses programs to explicitly encode the experts' structured knowledge. Total domain expert effort can be reduced by exchanging data annotation time for the construction of a small number of programmed tasks.

We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results on three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.

Our framework, Trajectory Embedding for Behavior Analysis (TREBA), takes trajectory data from one or more agents as input and produces an embedding that can be used for downstream tasks, including behavior classification. TREBA can be trained using self-supervision and programmatic supervision from expert-written programs produced by task programming.
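To make the interface concrete, below is a minimal sketch of a trajectory encoder of this kind. It is not the official implementation: the module name, dimensions, and GRU backbone are illustrative assumptions.

```python
# Minimal sketch of a trajectory encoder (illustrative, not the official code).
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Maps a window of per-frame keypoints to a fixed-size embedding."""
    def __init__(self, input_dim: int, hidden_dim: int = 128, embed_dim: int = 32):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.to_embedding = nn.Linear(hidden_dim, embed_dim)

    def forward(self, trajectories: torch.Tensor) -> torch.Tensor:
        # trajectories: (batch, frames, keypoint_dim)
        _, final_hidden = self.rnn(trajectories)    # (1, batch, hidden_dim)
        return self.to_embedding(final_hidden[-1])  # (batch, embed_dim)

encoder = TrajectoryEncoder(input_dim=28)  # e.g., 2 mice x 7 keypoints x (x, y)
window = torch.randn(4, 21, 28)            # a batch of 21-frame trajectory windows
embedding = encoder(window)                # shape: (4, 32)
```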

Domain experts can choose between task programming and data annotation, or combine the two. Task programming is the process by which domain experts engineer decoder tasks for representation learning. The resulting programs enable learning of annotation-sample efficient trajectory features, so that performance can be improved without additional annotations.

Sample programs compute behavior attributes such as the following (a sketch of two such programs appears after this list):

  • Distance between agents

  • Speed of each agent

  • Facing angle between agents

  • Relative position & motion of different parts (e.g., head-body angle of mice, wing angle of flies)
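As an illustration, here is a hedged sketch of what two such programs can look like: short functions that map raw trajectories to a per-frame behavior attribute. The array layout (frames × keypoints × 2) and the function names are our own assumptions, not the paper's exact interface.

```python
# Sketch of two task programs computing behavior attributes (illustrative).
import numpy as np

def distance_between_agents(agent1: np.ndarray, agent2: np.ndarray) -> np.ndarray:
    """Per-frame distance between the centroids of two tracked agents.

    agent1, agent2: keypoint trajectories of shape (frames, keypoints, 2).
    """
    centroid1 = agent1.mean(axis=1)  # (frames, 2)
    centroid2 = agent2.mean(axis=1)
    return np.linalg.norm(centroid1 - centroid2, axis=1)

def speed(agent: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """Per-frame centroid speed of one agent, in pixels per second."""
    centroid = agent.mean(axis=1)  # (frames, 2)
    step = np.linalg.norm(np.diff(centroid, axis=0), axis=1)
    return np.concatenate([[0.0], step]) * fps  # pad the first frame
```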

TREBA can be trained using different combinations of trajectory reconstruction loss, attribute consistency loss, attribute decoding loss, and contrastive loss. These losses are either self-supervised (e.g., trajectory reconstruction) or programmatically supervised (e.g., attribute decoding, where the values computed by the programs are decoded from the trajectory embedding). In our paper, we found that different combinations of programmatically supervised decoders perform similarly, and we use the combination of self-decoding + attribute consistency + contrastive loss in our experiments.
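The sketch below illustrates how such terms might be combined into a single training objective. The loss weights and the mean-squared-error forms are placeholders, not the paper's exact formulation.

```python
# Illustrative combination of TREBA-style training losses (placeholder forms).
import torch
import torch.nn.functional as F

def treba_loss(decoded_traj, target_traj,
               predicted_attrs, program_attrs,
               contrastive_term, weights=(1.0, 1.0, 1.0)):
    w_recon, w_attr, w_con = weights
    # Self-supervised: reconstruct the input trajectory from the embedding.
    recon = F.mse_loss(decoded_traj, target_traj)
    # Programmatic supervision: decode the program-computed attribute values.
    attr = F.mse_loss(predicted_attrs, program_attrs)
    return w_recon * recon + w_attr * attr + w_con * contrastive_term
```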

We experiment with two mouse datasets and one fly dataset (MARS, CRIM13, and Fly vs. Fly). TREBA is trained using 10 programs for the mouse datasets and 13 programs for the fly dataset.
Compared to using keypoints alone with the full training set, keypoints + TREBA achieves the same performance using only 1-2% of the training data on MARS, and 5-10% of the training data on CRIM13 and Fly vs. Fly.

Compared to using hand-designed features alone with the full training set, features + TREBA achieves the same performance using only 10% of the training data on MARS, and 50% of the training data on CRIM13 and Fly vs. Fly. Here, TREBA reduces annotation requirements by a factor of 10 on MARS and a factor of 2 on CRIM13 and Fly vs. Fly.
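For concreteness, the downstream evaluation can be thought of as frame-level behavior classification on the concatenation of input features and TREBA embeddings. The classifier below is a generic placeholder, not the one used in the paper.

```python
# Sketch of the downstream setup: classify behaviors from features + embeddings.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_behavior_classifier(features: np.ndarray,
                              treba_embeddings: np.ndarray,
                              labels: np.ndarray) -> MLPClassifier:
    """features: (frames, feat_dim); treba_embeddings: (frames, embed_dim)."""
    inputs = np.concatenate([features, treba_embeddings], axis=1)
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
    clf.fit(inputs, labels)
    return clf
```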

Discussion

Our experiments on three datasets (two in mice and one in fruit flies) suggest that our approach is effective across different domains. TREBA is not restricted to animal behavior and may be applied to other domains where tracking data is expensive to annotate, such as in sports analytics. Additionally, TREBA can be applied to other downstream tasks using trajectory data, such as clustering.

Our experiments highlight, and quantify, the trade-off between task programming and data annotation. Which is more effective will depend on the cost of annotation and the level of expert understanding in identifying behavior attributes. Future work on tools that facilitate program creation and data annotation will help further accelerate behavioral studies.

Acknowledgements

We would like to thank Tomomi Karigo at Caltech for providing the mouse dataset. This work was generously supported by the Simons Foundation (Global Brain grant 543025 to PP) and partially supported by NIH Award #K99MH117264 (to AK), NSF Award #1918839 (to YY), and NSERC Award #PGSD3-532647-2019 (to JJS).

Correspondence to jjsun (at) caltech (dot) edu.