Neural Representations of Dynamic Visual Stimuli
Sample Animated Videos
Voxel-wise Prediction Performance of Encoding Models on Inflated Cortex
Here we show voxel-wise fMRI prediction performance, quantified as the Pearson correlation (r) between measured and predicted responses, for the remaining visual encoding models, listed in alphabetical order. The HCP-MMP Parcellation Map on an inflated cortex is included for reference.
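As an illustration of the metric, the following is a minimal sketch of a voxel-wise Pearson correlation, assuming the measured and predicted responses are stored as arrays of shape (n_samples, n_voxels). The function name `voxelwise_pearson` and the array layout are assumptions made for this example, not the pipeline used to produce the maps.

```python
import numpy as np

def voxelwise_pearson(measured, predicted):
    """Return one Pearson r per voxel.

    Both inputs are assumed to have shape (n_samples, n_voxels);
    this layout is hypothetical and chosen for illustration only.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if measured.shape != predicted.shape:
        raise ValueError("measured and predicted must have the same shape")

    # Center each voxel's time course.
    m = measured - measured.mean(axis=0)
    p = predicted - predicted.mean(axis=0)

    # Covariance term and product of the two norms, per voxel.
    num = (m * p).sum(axis=0)
    den = np.sqrt((m ** 2).sum(axis=0) * (p ** 2).sum(axis=0))

    # A voxel with zero variance has an undefined correlation; report NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(den > 0, num / den, np.nan)
    return r
```

Computing all voxels in one vectorized pass avoids a Python loop over voxels, which matters when the cortical surface contains many thousands of them.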
Note that all of the following encoding-accuracy maps on the inflated cortex use the same scale for ease of comparison.
AF (averaged frames) models use model features averaged over the frames spanning each video.
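The averaging step can be sketched as follows, assuming the per-frame features of one video are available as an array of shape (n_frames, n_features). The function name `average_frame_features` and the input layout are hypothetical. How frames are sampled from each video is not specified here and is left to the caller.

```python
import numpy as np

def average_frame_features(frame_features):
    """Collapse per-frame features into a single video-level vector.

    frame_features is assumed to have shape (n_frames, n_features);
    this layout is an assumption made for illustration.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    if frame_features.ndim != 2 or frame_features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, n_features) array")

    # Mean over the frame axis yields one feature vector per video.
    return frame_features.mean(axis=0)
```

For example, two frames with features [1, 2] and [3, 4] would yield the video-level vector [2, 3].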
CLIP
CLIP AF (averaged frames)
CLIP ConvNeXt (Best Performing Image Encoder)
CLIP ConvNeXt AF (averaged frames)
DINOv1
DINOv2
Hiera Base Plus
Hiera Huge
R3M
R3M AF (averaged frames)
ResNet-50
VC-1 (Best Performing Embodied AI Model)
VC-1 AF (averaged frames)
VideoMAE
VideoMAE Large (Best Performing Model)
VIP
VIP AF (averaged frames)
XCLIP
HCP-MMP Parcellation Map
Source: Rolls et al., "The human language effective connectome," NeuroImage, 2022. DOI.