Harnessing small projectors and multiple views for efficient vision pretraining
Arna Ghosh*, Kumar Krishna Agrawal*, Shagun Sodhani, Adam Oberman, Blake Richards
We pretrain on ImageNet-100 for 100 epochs with a multi-view Barlow Twins learning objective and a varying number of augmentations. Notably, more augmentations markedly improve convergence: the model needs far fewer gradient steps to reach the same performance as the 2-augmentation case.
With more augmentations, our approach recovers the same performance while using significantly fewer unique 'real' samples. In particular, pretraining with 4 augmentations on 50% of the dataset matches the performance of pretraining with 2 augmentations on the full dataset. All evaluations use linear probes with ResNet-18 backbones.
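To make the multi-view setup concrete, below is a minimal PyTorch sketch of a Barlow Twins loss averaged over all pairs of augmented views. The names (barlow_twins_loss, multiview_barlow_twins_loss, encoder, projector), the all-pairs aggregation, and the default beta value are illustrative assumptions, not the paper's exact implementation.

import torch
from itertools import combinations

def barlow_twins_loss(z_a, z_b, beta=5e-3):
    # z_a, z_b: (batch, dim) projector outputs for two views of the same images.
    # beta weights the off-diagonal (redundancy-reduction / orthogonalization) term.
    n = z_a.shape[0]
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                                   # (dim, dim) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()          # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + beta * off_diag

def multiview_barlow_twins_loss(views, encoder, projector, beta=5e-3):
    # views: list of k augmented batches of the same images.
    # One assumed multi-view extension: average the pairwise loss over all view pairs.
    zs = [projector(encoder(v)) for v in views]
    pair_losses = [barlow_twins_loss(za, zb, beta) for za, zb in combinations(zs, 2)]
    return torch.stack(pair_losses).mean()

With k = 2 this reduces to the standard two-augmentation Barlow Twins objective; larger k simply adds more pairwise terms per batch of unique images.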
Comparing the learning dynamics of the Barlow Twins objective at two ends of the spectrum: low beta (weaker orthogonalization constraint, left) and high beta (strong orthogonalization constraint, right). At a high level, this verifies the claims of Theorem 3.2.
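For reference, beta enters the objective as the weight on the off-diagonal term of the cross-correlation matrix. The formulation below is the standard Barlow Twins loss; the exact notation used in Theorem 3.2 may differ.

\[
\mathcal{L}_{\mathrm{BT}} \;=\; \underbrace{\sum_{i}\bigl(1-\mathcal{C}_{ii}\bigr)^{2}}_{\text{invariance}} \;+\; \beta \underbrace{\sum_{i}\sum_{j\neq i}\mathcal{C}_{ij}^{2}}_{\text{orthogonalization}}
\]

where \(\mathcal{C}_{ij}\) is the cross-correlation, computed over the batch, between feature i of one view's embeddings and feature j of the other view's. Small beta mostly enforces invariance across views, while large beta strongly decorrelates (orthogonalizes) the embedding dimensions.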