Recent advancements in machine learning (ML) have been significantly propelled by large-scale foundation models across various domains. At the heart of these developments is self-supervised learning (SSL), where models are pretrained on vast amounts of unlabeled data, endowing them with essential capabilities. This pretraining-finetuning paradigm represents a fundamental shift in ML, diverging from the traditional empirical risk minimization (ERM) framework that assumes identical training and test data distributions. This distribution mismatch calls for new ML theories to understand and enhance foundation models.
This tutorial aims to bridge this gap by providing a comprehensive overview of the SSL principles and methodologies used in foundation models. It covers 1) representative SSL methodologies in foundation models, 2) theoretical principles and frameworks for analyzing SSL methods, and 3) advanced SSL topics and phenomena, such as equivariant SSL, in-context learning, scaling laws, and feature interpretability. The tutorial concludes with a panel discussion featuring prominent researchers on the future directions and theoretical underpinnings of SSL in foundation models. This systematic tutorial aims to equip attendees with a solid understanding of modern SSL techniques and their foundational principles, fostering further advances in the field.
Following the overview in Part I, the main content of this tutorial has three parts: 1) methods, 2) principles, and 3) frontiers. [Materials will be released soon]
Part I: Overview of the roles of SSL in foundation models
Foundation models and SSL: An overview of existing foundation models across multiple domains and the central role of self-supervised learning in them.
A historical remark on the paradigm shift: A brief overview of unsupervised learning in traditional ML (pre-2012) and deep learning (2012-now), followed by a comparison between traditional ML and the foundation model paradigm and its impact on ML theory.
Part II: Basic SSL Methodologies in Foundation Models
Joint embedding methods: We will cover three types of joint embedding methods: 1) contrastive methods (InfoNCE, SimCLR, MoCo, CLIP), 2) non-contrastive methods (BYOL / SimSiam, SwAV / DINO), and 3) regularization methods (Barlow Twins / VICReg). Afterward, we will introduce their theoretical connections and give a unified comparison.
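To make the contrastive family concrete, here is a minimal numpy sketch of the InfoNCE objective shared by SimCLR, MoCo, and CLIP: each row of `z1` is pulled toward its paired row of `z2`, with the other in-batch rows acting as negatives. (This is an illustrative sketch, not any one paper's exact implementation.)

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss over a batch of positive pairs (z1[i], z2[i]);
    all other in-batch pairings serve as negatives."""
    # Normalize embeddings so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    # Softmax cross-entropy with the diagonal (true pairs) as the target class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()
```

Correctly matched pairs yield a much lower loss than mismatched ones, which is exactly the signal that aligns the two views.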
Generative methods: We will introduce two major types of generative methods: 1) reconstruction methods (e.g., BERT, MAE, BEiT, data2vec, MAGE) and 2) autoregressive methods (GPT, XLNet, DALL-E, LVM). We will discuss their connections to the joint embedding methods above, as well as their pros and cons.
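The two generative objectives can be contrasted in a few lines; the sketch below (illustrative, with hypothetical function names, not any specific model's code) shows a masked-reconstruction loss in the BERT/MAE style next to a next-token negative log-likelihood in the GPT style.

```python
import numpy as np

def masked_mse(x, x_hat, mask):
    """Reconstruction objective (BERT/MAE style): the model is scored
    only on the positions that were masked out of its input."""
    return ((x_hat - x) ** 2)[mask].mean()

def next_token_nll(logits, tokens):
    """Autoregressive objective (GPT style): logits[t] is the model's
    prediction for tokens[t + 1]; returns the mean negative log-likelihood."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    steps = np.arange(len(tokens) - 1)
    return -log_probs[steps, tokens[1:]].mean()
```

Note the structural difference: reconstruction predicts held-out parts of a corrupted input all at once, while the autoregressive loss factorizes the sequence left to right.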
Part III: Basic SSL Principles in Foundation Models
Information theory: We will introduce the connection between SSL and information theory, including the information maximization principle and variational bounds (InfoNCE, InfoMax, MINE, etc.), and their connections to various SSL methods.
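As one concrete instance of these variational bounds, the InfoNCE objective over a batch of $N$ pairs lower-bounds the mutual information between the two views (Oord et al., 2018), which is what licenses reading contrastive learning as information maximization:

```latex
I(X; Y) \;\ge\; \log N \;-\; \mathcal{L}_{\mathrm{InfoNCE}},
\qquad
\mathcal{L}_{\mathrm{InfoNCE}}
= -\,\mathbb{E}\!\left[
\log \frac{e^{f(x_i, y_i)}}{\sum_{j=1}^{N} e^{f(x_i, y_j)}}
\right],
```

where $f$ is the learned similarity (critic) function. Minimizing the loss therefore tightens a lower bound on $I(X;Y)$, capped at $\log N$.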
Spectral graph theory: We will introduce the augmentation graph framework and its connection to spectral graph theory. We will then show how to analyze joint embedding and generative SSL methods in a unified augmentation graph framework.
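To make the spectral view concrete, the following sketch computes the embeddings that spectral analyses of contrastive learning (e.g., the augmentation graph framework of HaoChen et al.) show the learned features align with, up to rotation. It assumes a small explicit adjacency matrix, which in practice is never materialized; this is an illustrative toy, not the actual training procedure.

```python
import numpy as np

def spectral_embedding(adjacency, k):
    """Top-k eigenvectors of the normalized adjacency D^{-1/2} A D^{-1/2}
    of an augmentation graph, used as node (sample) embeddings."""
    d = adjacency.sum(axis=1)                       # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    a_norm = d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(a_norm)             # ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]                # indices of top-k
    return vecs[:, top]
```

On a toy augmentation graph made of two disconnected cliques (two "classes" whose augmentations never overlap), nodes in the same component receive identical embeddings while the components are separated, mirroring the linear-probing guarantees in this line of theory.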
Part IV: Advanced SSL Frontiers in the Foundation Model Era
Equivariant SSL (10 min): An introduction to recent progress in equivariant SSL, which extends the invariance principle to be aware of input transformations.
In-context SSL (10 min): An introduction to in-context learning as a core emergent property of SSL pretraining in foundation models, along with representative theories on this topic.
Scaling law of SSL (10 min): Scaling laws are empirical regularities of SSL that are important for developing foundation models. We will briefly introduce them and the related theories.
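The empirical form of such laws is a power law, e.g. loss $L(n) = a\,n^{-\alpha}$ in data or compute $n$, which becomes a straight line in log-log space. A minimal sketch of how such a fit is done (illustrative; real scaling-law fits typically add an irreducible-loss offset term):

```python
import numpy as np

def fit_power_law(n, loss):
    """Fit the scaling-law form L(n) = a * n**(-alpha) by linear least
    squares in log-log space; returns (a, alpha)."""
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    return np.exp(intercept), -slope
```

Given losses measured at a few scales, the fitted exponent `alpha` is what extrapolation to larger models or datasets relies on.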
Yisen Wang is an Assistant Professor at Peking University. His research interest is broadly representation learning, focusing on extracting meaningful representations from various data types, including unlabeled, noisy, adversarial, and graph data. He has received the Best Paper Award of ECML-PKDD 2021 and a Silver Best Paper Award at an ICML 2021 workshop. He has served as a Senior Area Chair for NeurIPS and as an Area Chair for NeurIPS, ICML, ICLR, and CVPR.
Yifei Wang is a postdoc at MIT CSAIL, working with Prof. Stefanie Jegelka. His research focuses on bringing theoretical perspectives to understanding foundation models (such as LLMs) and to designing theory-driven algorithms. His work has been recognized with three best paper awards at ECML and ICML workshops, along with multiple oral and spotlight presentations at NeurIPS, ICLR, and AAAI. His PhD thesis, centered on the theory and methods of self-supervised learning, was honored with the CAAI Outstanding Ph.D. Dissertation Runner-Up Award.
Qi Zhang is a PhD student at Peking University. His research interests include revealing the mechanisms of self-supervised learning through theoretical analysis and designing new self-supervised methods based on theoretical insights. He has published and presented seven papers at NeurIPS, ICLR, and ICML, where he established the first theoretical guarantees for the generalization of MAE methods and multi-modal contrastive learning. He has served as a reviewer for NeurIPS, ICLR, and ICML.