QABVR 2024 Tutorial
Champalimaud Center for the Unknown in Lisbon, Portugal
Part 1: From pose to behavior
Learning objectives
Split your data into training, validation, and test sets. Don't look at your test set during development!
Use exploratory data analysis to gain an intuition for your data. Plot trajectories of pose data and rasters of behavior annotations.
Explore different coordinate frames for the same dataset: should we study our subject's behavior in allocentric or egocentric coordinates?
Compute pose features capturing the way the animal is positioned and how it's moving. Consider positional, angular, and derivative features, inter-animal features, and information about the animal's environment. Use ↪pandas DataFrames to keep track of pose features.
Train a behavior classifier using pose features, and evaluate its performance in terms of Precision, Recall, and F1 Score.
Use temporal windowing of features to improve classifier performance.
Key Concepts and Takeaways
Training, validation, and test sets
Use a validation set during model development, to prevent overfitting to your test set.
Messy and imperfect annotations are OK in your training and validation sets, as models can learn around them—but make sure you're happy with the quality of annotations in your test set.
⚠️ Leakage between datasets is the easiest way to misjudge your classifier's performance. Always split behavioral datasets at the level of videos, never at the level of frames. Consider an 80/20 train/test split: if frame k is in your test set, then there's a 96% chance that either frame k-1 or frame k+1 is in your training set. Neighboring frames of video are highly correlated, so this means your test set is not statistically independent of your training set!
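A video-level split can be done by hand in a few lines. This is a minimal sketch (the function name and the toy data are my own, not from the tutorial code): shuffle the unique video IDs, hold out a fraction of videos for testing, then derive frame-level masks from the video assignment.

```python
import numpy as np

def split_by_video(video_ids, test_frac=0.2, seed=0):
    """Split frame indices into train/test masks at the video level,
    so no video contributes frames to both sets."""
    rng = np.random.default_rng(seed)
    videos = np.unique(video_ids)
    rng.shuffle(videos)
    n_test = max(1, int(round(test_frac * len(videos))))
    test_videos = set(videos[:n_test])
    test_mask = np.array([v in test_videos for v in video_ids])
    return ~test_mask, test_mask

# toy example: 5 videos, 100 frames each
video_ids = np.repeat(np.arange(5), 100)
train_mask, test_mask = split_by_video(video_ids, test_frac=0.2)
```

If you already use scikit-learn, `GroupShuffleSplit` with video ID as the group does the same thing.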
Exploratory data analysis
Find out how much training data you have: how many frames, how many bouts, in how many videos; all of this will affect your classifier's performance.
Plot your data early and often, so you know if there are any issues that will affect model training.
Maybe one video was recorded at 300 fps instead of 30; maybe you loaded the wrong annotations for a video; maybe one of your videos was a side-view instead of a top-view and the pose estimate is garbage; maybe the annotations are at a different framerate from the pose for some videos; maybe the animal wasn't in the video for the first 30 seconds and the pose estimate is garbage.
(All of these are things that happened at one point or another while working on MARS.)
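Many of these problems can be caught with a quick sanity-check table before any model training. A minimal sketch, assuming you've tabulated per-video metadata into a pandas DataFrame (the column names and values here are hypothetical): flag videos where the pose and annotation lengths disagree, or where the framerate differs from the rest of the dataset.

```python
import pandas as pd

# hypothetical per-video metadata
meta = pd.DataFrame({
    "video": ["v1", "v2", "v3"],
    "n_pose_frames": [9000, 9000, 90000],
    "n_annot_frames": [9000, 9000, 9000],
    "fps": [30, 30, 300],
})

# flag videos whose pose and annotation lengths disagree
mismatched = meta[meta.n_pose_frames != meta.n_annot_frames]

# flag videos whose framerate differs from the most common one
odd_fps = meta[meta.fps != meta.fps.mode()[0]]
```

Anything that shows up in `mismatched` or `odd_fps` deserves a look at the raw video before it goes anywhere near your classifier.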
Coordinate frames
Egocentric coordinate frames center tracking on the animal; they're common in unsupervised methods like ↪MoSeq and ↪MotionMapper. But for supervised classification, including data in allocentric coordinates (those of the original video) can sometimes be helpful: for example mice are more likely to rear up when they're standing at the walls of the arena, so a rearing detector might work better if it knows where the mouse is and how close the walls are.
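The standard egocentrizing transform is a translation plus a rotation. A minimal sketch (keypoint indices and the choice of "center" and "front" keypoints are assumptions for illustration): move a reference keypoint to the origin, then rotate so the animal's body axis points along +x.

```python
import numpy as np

def to_egocentric(pose, center_idx=0, front_idx=1):
    """Translate so keypoint `center_idx` sits at the origin, then rotate
    so the center->front axis points along +x. pose: (n_keypoints, 2)."""
    centered = pose - pose[center_idx]
    dx, dy = centered[front_idx]
    theta = np.arctan2(dy, dx)          # current heading angle
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])     # rotation by -theta
    return centered @ R.T

# toy pose: e.g. neck, nose, tail base
pose = np.array([[2.0, 3.0], [3.0, 4.0], [1.0, 3.0]])
ego = to_egocentric(pose)
```

After the transform, the center keypoint is at the origin and the front keypoint lies on the positive x-axis; to keep allocentric information available to a classifier, you can simply concatenate both coordinate versions as features.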
Pose features
Coming up with a useful set of pose features for your behavior classifier can take some thinking. It helps to watch the raw videos of your animal and think about what cues help you recognize a behavior, then work out how to turn those into features.
Models can over-fit if you give them too many features, especially with limited training examples. Your models might also use features in unexpected ways. For example, an image recognition algorithm that pays too much attention to image background can lead to ↪invisible cows.
It's very easy to waste a lot of time hand-crafting features and data preprocessing steps to wring out tiny improvements in validation set performance, only to discover they don't actually help once you evaluate on your test set. (Ask me how I know.) So spend a little effort here, but don't go crazy.
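The positional/angular/derivative feature families mentioned above can be built up in a pandas DataFrame. A minimal sketch with hypothetical keypoint columns (the toy trajectory is mine, not tutorial data): one distance feature, one angle feature, and one per-frame derivative.

```python
import numpy as np
import pandas as pd

# hypothetical per-frame keypoints for one animal
df = pd.DataFrame({
    "nose_x": [0.0, 1.0, 2.0, 3.0],
    "nose_y": [0.0, 0.0, 0.0, 0.0],
    "tail_x": [-2.0, -1.0, 0.0, 1.0],
    "tail_y": [0.0, 0.0, 0.0, 0.0],
})

feats = pd.DataFrame(index=df.index)
# positional: body length (nose-tail distance)
feats["body_len"] = np.hypot(df.nose_x - df.tail_x, df.nose_y - df.tail_y)
# angular: heading of the tail->nose axis
feats["heading"] = np.arctan2(df.nose_y - df.tail_y, df.nose_x - df.tail_x)
# derivative: nose speed between consecutive frames (NaN on frame 0)
feats["nose_speed"] = np.hypot(df.nose_x.diff(), df.nose_y.diff())
```

Keeping features in a named DataFrame like this also makes it easy to inspect feature importances later and prune the ones your classifier never uses.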
Behavior classifiers
We went light on this section and stuck to XGBoost. In Part 2 you'll have the option to try out some neural-net-based strategies. These can save you some time struggling with feature engineering, at the cost of being a little more of a black box.
Precision and Recall tell you different things about how your classifier is performing. Depending on your scenario, you might prioritize maximizing one vs the other. For example, if your behavior is rare, tune your classifier for high Recall at the cost of low Precision, so that its errors are mostly false positives rather than false negatives. You can then run your classifier on a new batch of videos and manually delete the false positive frames, to create more training data for your models.
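It's worth being comfortable computing these metrics by hand. A minimal sketch for binary frame-wise labels (the toy labels are illustrative; in practice you'd use something like `sklearn.metrics.precision_recall_fscore_support`): Precision is the fraction of predicted behavior frames that are correct, Recall is the fraction of true behavior frames you caught.

```python
import numpy as np

def prf1(y_true, y_pred):
    """Precision, Recall, and F1 for a binary behavior classifier."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)    # behavior frames correctly detected
    fp = np.sum(y_pred & ~y_true)   # false alarms
    fn = np.sum(~y_pred & y_true)   # missed behavior frames
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy frame-wise labels: a classifier tuned for high Recall over-predicts
y_true = [0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 1, 1, 0, 1, 0]
p, r, f = prf1(y_true, y_pred)
```

Here the classifier catches every true behavior frame (Recall = 1.0) but pays for it with false positives (Precision = 0.6), which is exactly the trade-off you'd want for the rare-behavior workflow above.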
Temporal windowing of features
With few exceptions, it's impossible to tell what an animal is doing just from looking at its pose estimate on a single frame of video. Temporal windowing dramatically increases your feature count and classifier training time, but it's often worth it.
Put some thought into your choice of time window and your operation (mean, variance, max, min)—like pose feature engineering, your decisions make a difference for model performance.
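With features in a pandas DataFrame, temporal windowing is one call to `rolling`. A minimal sketch over a single hypothetical feature (window size and the centered window are illustrative choices): each operation turns one per-frame feature into an additional windowed feature column.

```python
import pandas as pd

# hypothetical per-frame pose feature (e.g. nose speed)
speed = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# 5-frame centered windows; min_periods=1 avoids NaNs at the edges
win = speed.rolling(5, center=True, min_periods=1)
windowed = pd.DataFrame({
    "speed": speed,
    "speed_mean_5": win.mean(),
    "speed_std_5": win.std(),
    "speed_max_5": win.max(),
    "speed_min_5": win.min(),
})
```

Note the multiplier: four operations over one window size already quintuples this feature's column count, which is why window choices deserve some thought.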
Part 2: Choose your own adventure
Now that you have some tools for processing pose and behavioral data, let's try them out in some real examples!