Harnessing small projectors and multiple views for efficient vision pretraining
Arna Ghosh*, Kumar Krishna Agrawal*, Shagun Sodhani, Adam Oberman, Blake Richards
We pretrain on ImageNet-100 for 100 epochs with a multi-view Barlow Twins learning objective and a varying number of augmentations. Notably, more augmentations markedly improve convergence: the model needs far fewer gradient steps to reach the same performance as the 2-augmentation case.
With more augmentations, our approach recovers the same performance while using significantly fewer unique 'real' samples. In particular, pretraining with 4 augmentations on 50% of the dataset matches the performance of pretraining with 2 augmentations on the full dataset. All evaluations use linear probes with ResNet-18 backbones.
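To make the multi-view setup concrete, below is a minimal PyTorch sketch of a Barlow Twins loss averaged over all pairs of augmented views. The names (barlow_twins_loss, multiview_barlow_twins_loss, encoder, projector), the all-pairs aggregation, and the default beta value are illustrative assumptions, not the paper's exact implementation.

import torch
from itertools import combinations

def barlow_twins_loss(z_a, z_b, beta=5e-3):
    # z_a, z_b: (batch, dim) projector outputs for two views of the same images.
    # beta weights the off-diagonal (redundancy-reduction / orthogonalization) term.
    n = z_a.shape[0]
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                                   # (dim, dim) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()          # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + beta * off_diag

def multiview_barlow_twins_loss(views, encoder, projector, beta=5e-3):
    # views: list of k augmented batches of the same images.
    # One assumed multi-view extension: average the pairwise loss over all view pairs.
    zs = [projector(encoder(v)) for v in views]
    pair_losses = [barlow_twins_loss(za, zb, beta) for za, zb in combinations(zs, 2)]
    return torch.stack(pair_losses).mean()

With k = 2 this reduces to the standard two-augmentation Barlow Twins objective; larger k simply adds more pairwise terms per batch of unique images.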
Comparing the learning dynamics of the Barlow Twins objective at two ends of the spectrum: low beta (weaker orthogonalization constraint, left) and high beta (strong orthogonalization constraint, right). At a high level, this verifies the claims of Theorem 3.2.
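For reference, beta enters the objective as the weight on the off-diagonal term of the cross-correlation matrix. The formulation below is the standard Barlow Twins loss; the exact notation used in Theorem 3.2 may differ.

\[
\mathcal{L}_{\mathrm{BT}} \;=\; \underbrace{\sum_{i}\bigl(1-\mathcal{C}_{ii}\bigr)^{2}}_{\text{invariance}} \;+\; \beta \underbrace{\sum_{i}\sum_{j\neq i}\mathcal{C}_{ij}^{2}}_{\text{orthogonalization}}
\]

where \(\mathcal{C}_{ij}\) is the cross-correlation, computed over the batch, between feature i of one view's embeddings and feature j of the other view's. Small beta mostly enforces invariance across views, while large beta strongly decorrelates (orthogonalizes) the embedding dimensions.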