QABVR 2024 Tutorial
Champalimaud Center for the Unknown in Lisbon, Portugal
Improving performance of your classifier
How well can you detect social behaviors of interest?
Now that you've gotten the hang of classification, you can refine your classifier design and see how high you can push performance on the MABe Task 1 dataset. You can keep working in your notebook from Part 1, or you can use the official AICrowd Challenge page to see how well your approach performs on the official test set.
Steps
Head to the AICrowd MABe Challenge page and set up an account so that you can make submissions.
Use this Getting Started notebook to work with the full Task 1 dataset.
Train a classifier on your own data
Use Bento to annotate behavior, and train the tutorial classifier on that data
If you brought your own dataset (or have collected one here), you can use your time this afternoon to try training your own behavior classifier using the notebook from Part 1.
Steps
If you have access to MATLAB, you can download our data visualization interface Bento and start annotating behaviors in your videos.
There's a tutorial on annotating in Bento on the GitHub wiki.
Another nice free behavior annotation tool you could try is BORIS.
This Colab notebook has some helper functions for getting your annotation and pose data into the format used by the Part 1 notebook.
If you have annotations generated with software other than BORIS or Bento, we can set up a custom data loader so you can get them into the format our classifier expects. (I have some experimental code we can try refining for opening Ethovision data.)
Once you get your dataset into a JSON file, load it into the notebook from Part 1 and work from there; a minimal loading sketch follows these steps.
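For orientation, here is a rough sketch of what loading such a JSON file might look like. The field names used below ('sequences', 'keypoints', 'annotations') are assumptions for illustration; check the helper Colab notebook for the exact schema it writes, and inspect the keys of your own file before relying on any of them.

```python
# Hedged sketch: load a dataset JSON and peek at its structure.
# Key names ("sequences", "keypoints", "annotations") are assumed, not guaranteed.
import json

with open('my_dataset.json', 'r') as f:
    dataset = json.load(f)

print(dataset.keys())  # confirm the actual top-level fields first

for trial_name, trial in dataset.get('sequences', {}).items():
    keypoints = trial['keypoints']      # per-frame pose estimates (assumed key)
    annotations = trial['annotations']  # per-frame behavior labels (assumed key)
    print(trial_name, len(annotations), 'annotated frames')
```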
Try training a neural network for classification
Explore the four baseline models from the MABe dataset.
Rather than using our pandas-based filters to characterize the dynamics of our pose features, we can use neural networks to classify behavior based on feature values within a window of frames around the current frame.
In the MABe 2021 paper we tested out four different neural network architectures to see how well they perform as behavior classifiers. The winning solution to the competition also used a neural network based approach.
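To make the windowing idea concrete, here is a minimal sketch (not one of the MABe baseline architectures): per-frame features from a window around each frame are stacked into a single vector and fed to a small Keras network. The feature array, labels, window size, and number of classes below are placeholders.

```python
# Hedged sketch: window the per-frame features and train a small neural network.
# Feature values, labels, and architecture here are placeholders for illustration.
import numpy as np
import tensorflow as tf

def make_windows(features, window=5):
    """Stack features from +/- `window` frames around each frame (edge-padded)."""
    padded = np.pad(features, ((window, window), (0, 0)), mode='edge')
    return np.stack([padded[t:t + 2 * window + 1].ravel()
                     for t in range(len(features))])

features = np.random.rand(1000, 28).astype(np.float32)  # placeholder pose features
labels = np.random.randint(0, 4, size=1000)             # placeholder behavior labels

X = make_windows(features, window=5)                    # shape: (1000, 11 * 28)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax'),     # e.g. 3 behaviors + "other"
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, labels, epochs=3, batch_size=64)
```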
Options
To run the MABe Baselines locally, clone the MABe Baselines repository from the AICrowd Gitlab and follow the instructions in the README.
You can also check out the winning Challenge Solution repository by Ben Wild. Ben's strategy was to learn an embedding of pose trajectories leveraging ALL the data we made available (all three challenge tasks plus the "task 0" trajectories with no accompanying annotations), and then train classifiers that read out from that embedding. The code is quite involved, but you can use his notebook exploring Task 2 annotation styles to see some of the learned embeddings from the model.
Inter-annotator style differences
Study a dataset of 10 videos, each scored by 8 annotators
Annotating behavior can sometimes feel a bit subjective, and different people (even in the same lab) often settle on different mental rules for what counts as the start and stop of a behavior. While working on MARS, we ran a study in which eight labmates all annotated the same set of videos for attack, mounting, and close investigation, and looked at some of the differences in annotations that emerged.
Can we use pose features or classifiers to explain where differences in annotation style come from?
Steps
Download multi_annotator_data.json from the Google Drive folder.
Note: the original videos and annotations of the published multi-annotator dataset are available here; since that release doesn't include pose estimates, we'll use a prepared version instead.
This file is slightly modified from the .json format used in the tutorial: each trial has one field for pose, and one field for each annotator (see the loading sketch after these steps).
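As a starting point, here is a rough sketch of opening this file. It assumes the top level is a dictionary of trials and that the pose field is literally named 'pose'; neither is guaranteed, so print the keys and adjust to whatever the file actually contains.

```python
# Hedged sketch: open multi_annotator_data.json and list the annotator fields per trial.
# The top-level layout (dict of trials) and the 'pose' key name are assumptions.
import json

with open('multi_annotator_data.json', 'r') as f:
    data = json.load(f)

for trial_name, trial in data.items():
    annotator_keys = [k for k in trial if k != 'pose']
    print(trial_name, '- annotator fields:', annotator_keys)
```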
Questions
Can you modify the annotation visualization code from the tutorial notebook to compare annotations from different annotators?
To find where two annotators disagree, you could either train a classifier for each annotator and compare their predictions, or train a single classifier to directly predict the frames on which two annotators disagree.
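Either way, a useful first step is a frame-by-frame disagreement mask. The sketch below assumes each annotator field is a per-frame list of behavior labels of equal length; the key names 'annotator_1' and 'annotator_2' are placeholders for whatever keys the file actually uses.

```python
# Hedged sketch: frame-wise disagreement between two annotators on one trial.
# Annotator key names and the per-frame label format are assumptions.
import json
import numpy as np

with open('multi_annotator_data.json', 'r') as f:
    data = json.load(f)

trial = next(iter(data.values()))            # pick one trial to inspect

ann1 = np.array(trial['annotator_1'])        # placeholder key name
ann2 = np.array(trial['annotator_2'])        # placeholder key name

disagree = ann1 != ann2                      # boolean mask of disagreement frames
print(f'{disagree.mean():.1%} of frames differ between these two annotators')
```

Those disagreement frames could be overlaid on the annotation raster plots from Part 1, or used as labels for a classifier trained on the pose features.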