IFT 6268 - Self-supervised Representation Learning
Fall 2020, a course offered by the Université de Montréal
This page is a work in progress.
Historical antecedents
Schmidhuber (1990) Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments. Technical report. This paper is more about intrinsic curiosity and exploration in RL than the kind of self-supervised representation learning we focus on in this course.
Vincent et al. (2010) Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. JMLR 2010 (a minimal sketch of the denoising criterion follows this list)
(many more)
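For reference, the denoising criterion in Vincent et al. boils down to: corrupt the input, then train an encoder/decoder to reconstruct the clean input from the corrupted one. A rough PyTorch sketch (the layer sizes and the additive Gaussian corruption here are illustrative choices, not the paper's exact setup, which mainly used masking noise and stacked the layers greedily):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenoisingAutoencoder(nn.Module):
        def __init__(self, dim_in=784, dim_hidden=256):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
            self.decoder = nn.Linear(dim_hidden, dim_in)

        def forward(self, x, noise_std=0.3):
            x_noisy = x + noise_std * torch.randn_like(x)   # corrupt the input
            h = self.encoder(x_noisy)                        # the learned representation
            return self.decoder(h), h

    model = DenoisingAutoencoder()
    x = torch.rand(32, 784)                                  # toy unlabeled batch
    x_hat, h = model(x)
    loss = F.mse_loss(x_hat, x)                              # reconstruct the *clean* input
    loss.backward()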
Self-supervised Learning: Engineering tasks for Computer Vision
Dosovitskiy et al. (2014) Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks. IEEE Trans. Pattern Analysis and Machine Intelligence
Doersch et al (2015) Unsupervised Visual Representation Learning by Context Prediction ICCV 2015
Pathak et al (2016) Context Encoders: Feature Learning by Inpainting. CVPR 2016
Noroozi and Favaro (2016) Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles (Jigsaw). ECCV 2016
Zhang et al (2016) Colorful Image Colorization ECCV 2016
Gidaris, Singh and Komodakis (2018) Unsupervised Representation Learning by Predicting Image Rotations ICLR 2018
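Most of these pretext tasks fit in a few lines of code. As one example, a rough sketch of the rotation-prediction task from Gidaris et al. (the tiny backbone below is a placeholder; the paper trains standard ConvNet architectures):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_rotation_batch(images):
        """Rotate each image by 0/90/180/270 degrees; the rotation index is the label."""
        rotations = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
        labels = torch.arange(4).repeat_interleave(images.size(0))
        return torch.cat(rotations, dim=0), labels

    backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                             nn.Linear(16, 4))               # 4-way rotation classifier

    images = torch.rand(8, 3, 32, 32)                        # toy unlabeled batch
    rotated, labels = make_rotation_batch(images)
    loss = F.cross_entropy(backbone(rotated), labels)
    loss.backward()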
Analysis of Self-Supervised Methods
Kolesnikov, Zhai and Beyer (2019) Revisiting Self-Supervised Visual Representation Learning. CVPR 2019
Zhai et al (2019) A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark (A GLUE-like benchmark for images) ArXiv 2019
Asano et al. (2020) A critical analysis of self-supervision, or what we can learn from a single image. ICLR 2020
Contrastive Methods
van den Oord et al. (2018) Representation Learning with Contrastive Predictive Coding (CPC), ArXiv 2018
Hjelm et al. (2019) Learning deep representations by mutual information estimation and maximization (DIM) ICLR 2019
Tian et al. (2019) Contrastive Multiview Coding (CMC) ArXiv 2019
Hénaff et al. (2019) Data-Efficient Image Recognition with Contrastive Predictive Coding (CPC v2: Improved CPC evaluated on limited labelled data) ArXiv 2019
He et al (2020) Momentum Contrast for Unsupervised Visual Representation Learning (MoCo, see also MoCo v2). CVPR 2020
Chen T et al (2020) A Simple Framework for Contrastive Learning of Visual Representations (SimCLR). ICML 2020 (a minimal sketch of the contrastive loss follows this list)
Chen T et al (2020) Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLRv2) ArXiv 2020
Caron et al (2020) Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV) ArXiv 2020
Xiao et al (2020) What Should Not Be Contrastive in Contrastive Learning ArXiv 2020
Misra and van der Maaten (2020) Self-Supervised Learning of Pretext-Invariant Representations. CVPR 2020
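Most of the methods above share the same contrastive core: embed two augmented views of each image, then pull matching views together and push non-matching ones apart with an InfoNCE-style loss. A rough sketch of that loss (the temperature is illustrative; SimCLR's NT-Xent additionally uses the other same-view embeddings as negatives, and MoCo draws negatives from a momentum-encoded queue):

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, temperature=0.1):
        """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature          # (N, N) scaled cosine similarities
        targets = torch.arange(z1.size(0))          # positives sit on the diagonal
        # symmetrized cross-entropy: each view must pick out its partner among the batch
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    z1, z2 = torch.randn(256, 128), torch.randn(256, 128)    # toy projection-head outputs
    loss = info_nce(z1, z2)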
Generative Methods
Dumoulin et al (2017) Adversarially Learned Inference (ALI) ICLR 2017
Donahue, Krähenbühl and Darrell Adversarial Feature Learning (BiGAN, concurrent and similar to ALI) ICLR 2017
Donahue and Simonyan (2019) Large Scale Adversarial Representation Learning (Big BiGAN) ArXiv 2019
Chen et al (2020) Generative Pretraining from Pixels (iGPT) ICML 2020
Bootstrap Your Own Latent (BYOL)
Tarvainen and Valpola (2017) Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 2017
Grill et al (2020) Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL). ArXiv 2020 (a minimal sketch of the EMA target update follows this list)
Fetterman and Albrecht (2020) Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL). Blog post
Schwarzer and Anand et al. (2020) Data-Efficient Reinforcement Learning with Momentum Predictive Representations. ArXiv 2020
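The shared ingredient in this section is a target network whose weights are an exponential moving average (EMA) of the online network and which supplies regression targets without any negatives. A rough sketch of the EMA update and a BYOL-style loss (the small networks below are placeholders; BYOL uses a ResNet backbone plus projector and predictor MLPs and symmetrizes the loss over the two views):

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    online = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128))
    target = copy.deepcopy(online)                 # EMA copy, never updated by gradients
    for p in target.parameters():
        p.requires_grad_(False)
    predictor = nn.Linear(128, 128)                # only the online branch predicts

    @torch.no_grad()
    def ema_update(online, target, tau=0.996):
        for p_o, p_t in zip(online.parameters(), target.parameters()):
            p_t.mul_(tau).add_((1.0 - tau) * p_o)

    view1, view2 = torch.rand(32, 784), torch.rand(32, 784)   # two augmented views
    p = F.normalize(predictor(online(view1)), dim=1)
    with torch.no_grad():
        z = F.normalize(target(view2), dim=1)
    loss = 2 - 2 * (p * z).sum(dim=1).mean()       # negative cosine similarity, no negatives
    loss.backward()
    ema_update(online, target)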
Self-distillation Methods
Furlanello et al. (2018) Born Again Neural Networks. ICML 2018 (a minimal soft-target distillation sketch follows this list)
Yang et al. (2019) Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students. AAAI 2019
Ahn et al (2019) Variational information distillation for knowledge transfer. CVPR 2019
Zhang et al (2019) Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation ICCV 2019
Müller et al (2019) When Does Label Smoothing Help? NeurIPS 2019
Yuan et al. (2020) Revisiting Knowledge Distillation via Label Smoothing Regularization. CVPR 2020
Zhang and Sabuncu (2020) Self-Distillation as Instance-Specific Label Smoothing ArXiv 2020
Mobahi et al. (2020) Self-Distillation Amplifies Regularization in Hilbert Space. ArXiv 2020
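The common mechanism here is that a trained copy of a model (the teacher) provides soft targets for a fresh copy of the same architecture (the student). A rough sketch of the usual soft-target loss with temperature (the temperature and mixing weight are illustrative; the papers above differ in whether and how the hard-label term is kept):

    import torch
    import torch.nn.functional as F

    def self_distillation_loss(student_logits, teacher_logits, labels,
                               temperature=2.0, alpha=0.5):
        """Mix the usual cross-entropy with a KL term toward the teacher's soft targets."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                        F.softmax(teacher_logits / temperature, dim=1),
                        reduction="batchmean") * temperature ** 2
        return alpha * hard + (1 - alpha) * soft

    student_logits = torch.randn(16, 10, requires_grad=True)
    teacher_logits = torch.randn(16, 10)            # frozen teacher predictions
    labels = torch.randint(0, 10, (16,))
    loss = self_distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()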
Self-training / Pseudo-labeling Methods
Xie et al (2020) Self-training with Noisy Student improves ImageNet classification. CVPR 2020
Sohn and Berthelot et al. (2020) FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. ArXiv 2020 (a minimal sketch of the pseudo-labeling step follows this list)
Chen et al. (2020) Self-training Avoids Using Spurious Features Under Domain Shift. ArXiv 2020
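A rough sketch of the FixMatch-style step on unlabeled data: pseudo-label the weakly augmented view wherever the model is confident, then train on the strongly augmented view (the placeholder classifier, the noise standing in for strong augmentation, and the 0.95 threshold are illustrative; FixMatch uses RandAugment/CTAugment and adds a supervised loss on the labeled batch):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Linear(784, 10)                      # placeholder classifier

    def fixmatch_unlabeled_loss(weak_view, strong_view, threshold=0.95):
        """Pseudo-label confident predictions on the weak view, train on the strong view."""
        with torch.no_grad():
            probs = F.softmax(model(weak_view), dim=1)
            confidence, pseudo_labels = probs.max(dim=1)
            mask = (confidence >= threshold).float()          # keep only confident examples
        loss = F.cross_entropy(model(strong_view), pseudo_labels, reduction="none")
        return (loss * mask).mean()

    weak = torch.rand(64, 784)                      # weakly augmented unlabeled batch
    strong = weak + 0.3 * torch.randn_like(weak)    # stand-in for strong augmentation
    loss = fixmatch_unlabeled_loss(weak, strong)
    loss.backward()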
Iterated Learning / Emergence of Compositional Structure
Ren et al. (2020) Compositional languages emerge in a neural iterated learning model. ICLR 2020
Guo, S. et al. (2019) The emergence of compositional languages for numeric concepts through iterated learning in neural agents. ArXiv 2019
Cogswell et al. (2019) Emergence of Compositional Language with Deep Generational Transmission. ArXiv 2019
Kharitonov and Baroni (2020) Emergent Language Generalization and Acquisition Speed are not tied to Compositionality ArXiv 2020
Natural Language Processing
Peters et al (2018) Deep contextualized word representations (ELMo). NAACL 2018
Devlin et al (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019 (a minimal masked-token-prediction sketch follows this list)
Brown et al (2020) Language Models are Few-Shot Learners (GPT-3, see also GPT-1 and 2 for more context) ArXiv 2020
Clark et al (2020) ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ICLR 2020
He and Gu et al. (2020) Revisiting Self-Training for Neural Sequence Generation (Unsupervised NMT). ICLR 2020
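The dominant pretext task in this section is masked token prediction: hide a random subset of tokens and recover them from context. A rough sketch of the masking step and loss (the toy embedding-plus-linear "model", vocabulary size and mask id are stand-ins; BERT uses a deep Transformer and an 80/10/10 mix of [MASK]/random/unchanged replacements among the selected positions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, mask_id = 1000, 0                   # assume token id 0 plays the role of [MASK]
    model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))

    def masked_lm_step(tokens, mask_prob=0.15):
        """Replace ~15% of tokens with the mask id and score the model only on those positions."""
        mask = torch.rand(tokens.shape) < mask_prob
        corrupted = tokens.masked_fill(mask, mask_id)
        logits = model(corrupted)                   # (batch, seq_len, vocab)
        return F.cross_entropy(logits[mask], tokens[mask])

    tokens = torch.randint(1, vocab_size, (8, 128))           # toy token ids
    loss = masked_lm_step(tokens)
    loss.backward()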
Video / Multi-modal data
Wang and Gupta (2015) Unsupervised Learning of Visual Representations using Videos ICCV 2015
Misra, Zitnick and Hebert (2016) Shuffle and Learn: Unsupervised Learning using Temporal Order Verification ECCV 2016
Lu et al (2019) ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, NeurIPS 2019
Hjelm and Bachman (2020) Representation Learning with Video Deep InfoMax (VDIM). ArXiv 2020
The role of noise in representation learning
Bachman, Alsharif and Precup (2014) Learning with Pseudo-Ensembles NeurIPS 2014
Bojanowski and Joulin (2017) Unsupervised Learning by Predicting Noise. ICML 2017
Self-supervised learning for RL, control and planning
Pathak et al. (2017) Curiosity-driven Exploration by Self-supervised Prediction (see also a large-scale follow-up). ICML 2017 (a minimal intrinsic-reward sketch follows this list)
Aytar et al. (2018) Playing hard exploration games by watching YouTube (TDC) NeurIPS 2018
Anand et al. (2019) Unsupervised State Representation Learning in Atari (ST-DIM) NeurIPS 2019
Sekar and Rybkin et al. (2020) Planning to Explore via Self-Supervised World Models. ICML 2020
Schwarzer and Anand et al. (2020) Data-Efficient Reinforcement Learning with Momentum Predictive Representations. ArXiv 2020
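A recurring idea here is to turn self-supervised prediction error into an intrinsic reward: a forward model predicts the next state representation, and the agent is rewarded where the prediction fails. A rough sketch (the linear encoder and forward model are placeholders; Pathak et al. additionally train an inverse-dynamics model to shape the feature space, and the forward model itself is trained to minimize this same error):

    import torch
    import torch.nn as nn

    encoder = nn.Linear(64, 32)                     # maps raw observations to features
    forward_model = nn.Linear(32 + 4, 32)           # predicts next features from (features, action)

    def intrinsic_reward(obs, action_onehot, next_obs):
        """Curiosity reward = error of the forward model in feature space."""
        phi, phi_next = encoder(obs), encoder(next_obs)
        phi_pred = forward_model(torch.cat([phi, action_onehot], dim=1))
        return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=1)   # one reward per transition

    obs, next_obs = torch.rand(16, 64), torch.rand(16, 64)     # toy transitions
    actions = torch.eye(4)[torch.randint(0, 4, (16,))]          # one-hot actions
    r_int = intrinsic_reward(obs, actions, next_obs)            # added to the task reward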
SSL Theory
Arora et al (2019) A Theoretical Analysis of Contrastive Unsupervised Representation Learning. ICML 2019
Lee et al (2020) Predicting What You Already Know Helps: Provable Self-Supervised Learning ArXiv 2020
Tschannen et al. (2019) On mutual information maximization for representation learning. ArXiv 2019
Unsupervised Domain Adaptation
Shu et al (2018) A DIRT-T Approach to Unsupervised Domain Adaptation. ICLR 2018
Wilson and Cook (2020) A Survey of Unsupervised Deep Domain Adaptation. ACM Transactions on Intelligent Systems and Technology 2020
Mao et al. (2019) Virtual Mixup Training for Unsupervised Domain Adaptation. CVPR 2019
Vu et al. (2019) ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation. CVPR 2019
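Several of these methods (DIRT-T, ADVENT) include an entropy-minimization term on unlabeled target-domain predictions, which pushes decision boundaries away from dense regions of the target data. A rough sketch of that term alone (the linear classifier is a placeholder; ADVENT additionally trains a discriminator on the entropy maps, and DIRT-T adds a virtual adversarial training term):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    classifier = nn.Linear(784, 10)                 # trained with labels on the source domain

    def target_entropy_loss(target_batch):
        """Mean prediction entropy on unlabeled target-domain inputs."""
        probs = F.softmax(classifier(target_batch), dim=1)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)
        return entropy.mean()

    target_batch = torch.rand(32, 784)              # unlabeled target-domain batch
    loss = target_entropy_loss(target_batch)        # added to the supervised source loss
    loss.backward()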
Scaling
Kaplan et al (2020) Scaling Laws for Neural Language Models. ArXiv 2020
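The scaling laws in Kaplan et al. are power laws of the form L(x) ~ (x / x_c)^(-alpha) in model size, dataset size or compute, which show up as straight lines on log-log axes. A rough sketch of estimating such an exponent from (size, loss) pairs with a log-log linear fit (the data points below are made up purely for illustration):

    import numpy as np

    # hypothetical (parameter count, validation loss) pairs -- illustrative numbers only
    n_params = np.array([1e6, 1e7, 1e8, 1e9])
    loss = np.array([5.0, 3.9, 3.0, 2.3])

    # fit log(loss) = -alpha * log(n) + c, i.e. a straight line in log-log space
    slope, intercept = np.polyfit(np.log(n_params), np.log(loss), deg=1)
    alpha = -slope
    print(f"fitted exponent alpha ~ {alpha:.3f}")

    # predicted loss at a larger scale, under the fitted power law
    n_new = 1e10
    print(f"extrapolated loss at 1e10 params ~ {np.exp(intercept) * n_new ** (-alpha):.2f}")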