Semi-supervised learning is a machine learning paradigm that leverages both labeled and unlabeled data during the training process. This approach aims to combine the benefits of supervised learning, where models are trained on labeled data, with the advantages of unsupervised learning, where models learn patterns from unlabeled data. Semi-supervised learning is particularly useful when obtaining labeled data is expensive or time-consuming.
Self-Training - Iteratively trains a model on the labeled data, then uses the model's most confident predictions on unlabeled data as new labels, gradually expanding the labeled dataset.
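A minimal self-training sketch using scikit-learn's SelfTrainingClassifier; the dataset, base model, and 0.8 confidence threshold are illustrative choices, not recommendations. Unlabeled points are marked with -1, following scikit-learn's convention:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_train = y.copy()
y_train[rng.random(len(y)) < 0.9] = -1   # hide ~90% of the labels

# The wrapper refits the base model, pseudo-labeling any unlabeled point
# whose predicted probability exceeds the threshold, until none qualify.
model = SelfTrainingClassifier(SVC(probability=True), threshold=0.8)
model.fit(X, y_train)
print("accuracy on all points:", model.score(X, y))
```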
Co-Training - Trains two (or more) models on different feature subsets or representations ("views") of the same data; each model passes its confident predictions on unlabeled data to the other, so the models learn from each other on both labeled and unlabeled data.
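A bare-bones co-training sketch, assuming the two "views" are simply the first and second halves of the feature columns; the 0.95 threshold and round count are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=20, random_state=1)
views = [X[:, :10], X[:, 10:]]              # two feature "views" per point

# one pseudo-label array per model; -1 marks points it has no label for
pseudo = [np.full(len(y), -1), np.full(len(y), -1)]
for p in pseudo:
    p[:60] = y[:60]                         # 60 genuinely labeled points

models = [LogisticRegression(max_iter=1000) for _ in range(2)]

for _ in range(5):                          # a few co-training rounds
    for i, m in enumerate(models):
        mask = pseudo[i] != -1
        m.fit(views[i][mask], pseudo[i][mask])
    for i, m in enumerate(models):
        other = 1 - i
        unk = np.where(pseudo[other] == -1)[0]
        if len(unk) == 0:
            continue
        proba = m.predict_proba(views[i][unk])
        sure = proba.max(axis=1) > 0.95     # confident predictions only
        # model i teaches the *other* model its confident pseudo-labels
        pseudo[other][unk[sure]] = m.predict(views[i][unk])[sure]

print("co-trained accuracy:", np.mean(models[0].predict(views[0]) == y))
```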
Multi-View Learning - Considers different views or representations of the data, training the model on diverse perspectives to improve generalization performance.
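A minimal taste of the multi-view idea (showing only the view-combination step, not the semi-supervised part): one model per view, with class probabilities averaged at prediction time. The split of the features into two views is purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=20, random_state=2)
views = [X[:, :10], X[:, 10:]]

# one classifier per view, trained on the first 300 points
models = [LogisticRegression(max_iter=1000).fit(v[:300], y[:300]) for v in views]

# consensus prediction: average the per-view class probabilities
avg = np.mean([m.predict_proba(v[300:]) for m, v in zip(models, views)], axis=0)
print("multi-view accuracy:", np.mean(avg.argmax(axis=1) == y[300:]))
```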
Tri-Training - Extends co-training to three models: an unlabeled example is added to one model's training set when the other two models agree on its label, facilitating mutual learning.
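A compact tri-training sketch, assuming bootstrap resamples for the initial models; in each round a model is retrained on the unlabeled points the other two agree on (sizes and round count are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=3)
rng = np.random.default_rng(3)
lab, unl = np.arange(60), np.arange(60, 600)   # labeled / unlabeled indices

# each model starts from a bootstrap resample of the labeled data
models = []
for i in range(3):
    boot = rng.choice(lab, size=len(lab))
    models.append(DecisionTreeClassifier(random_state=i).fit(X[boot], y[boot]))

for _ in range(3):                             # a few tri-training rounds
    preds = [m.predict(X[unl]) for m in models]
    new_models = []
    for i in range(3):
        j, k = [t for t in range(3) if t != i]
        agree = preds[j] == preds[k]           # the other two models agree
        Xi = np.vstack([X[lab], X[unl][agree]])
        yi = np.concatenate([y[lab], preds[j][agree]])
        new_models.append(DecisionTreeClassifier(random_state=i).fit(Xi, yi))
    models = new_models
```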
Generative Models - Generate synthetic labeled data, expanding the labeled dataset for training (e.g., Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs)).
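A deliberately simple generative stand-in for the GANs/VAEs named above: fit one Gaussian mixture per class on the labeled data, then sample synthetic labeled points to enlarge the training set. Component and sample counts are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=4)

synth_X, synth_y = [], []
for c in np.unique(y):
    # fit a small generative model to the labeled points of class c
    gm = GaussianMixture(n_components=2, random_state=4).fit(X[y == c])
    samples, _ = gm.sample(300)          # draw synthetic points for class c
    synth_X.append(samples)
    synth_y.append(np.full(300, c))

# train on real + synthetic labeled data
X_aug = np.vstack([X] + synth_X)
y_aug = np.concatenate([y] + synth_y)
clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print("trained on", len(y_aug), "real + synthetic points")
```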
Bootstrapping - Assigns pseudo-labels to unlabeled data based on the model's confidence, treating these pseudo-labeled instances as additional labeled data during training.
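A hand-rolled bootstrapping (pseudo-labeling) loop; the 0.9 confidence threshold and the number of rounds are illustrative, untuned values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=5)
y_obs = np.full(len(y), -1)
y_obs[:50] = y[:50]                       # only 50 labels observed

clf = LogisticRegression(max_iter=1000)
for _ in range(5):                        # a few bootstrapping rounds
    mask = y_obs != -1
    clf.fit(X[mask], y_obs[mask])
    unk = np.where(~mask)[0]
    if len(unk) == 0:
        break
    proba = clf.predict_proba(X[unk])
    sure = proba.max(axis=1) > 0.9        # keep only confident predictions
    y_obs[unk[sure]] = clf.predict(X[unk])[sure]

print("final accuracy:", clf.score(X, y))
```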
Graph-Based Methods - Represent labeled and unlabeled points as nodes in a graph whose edges encode similarity between data points:
Label Propagation - Propagates labels from labeled nodes to unlabeled nodes in a graph, following the connectivity between data points (see the first sketch below).
Manifold Regularization - Exploits the assumption that points near each other on the data manifold tend to share the same label, adding a graph-based smoothness penalty to the training objective (see the second sketch below).
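First, a label-propagation sketch using scikit-learn's LabelPropagation with a kNN graph; the two-moons data and k = 7 are illustrative:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=300, noise=0.1, random_state=6)
y_obs = np.full(len(y), -1)               # -1 marks unlabeled nodes
y_obs[:10] = y[:10]                       # only 10 labeled points

# the kNN kernel defines the graph edges labels propagate along
lp = LabelPropagation(kernel="knn", n_neighbors=7)
lp.fit(X, y_obs)
print("propagated accuracy:", lp.score(X, y))
```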
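And a tiny manifold-regularization sketch, assuming Laplacian-regularized least squares with binary labels coded as -1/+1; the graph construction and the lam value are illustrative, not a tuned implementation:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, y = make_moons(n_samples=300, noise=0.1, random_state=7)
y_pm = 2 * y - 1                              # labels coded as -1 / +1
labeled = np.zeros(len(y), dtype=bool)
labeled[:10] = True                           # only 10 observed labels

# symmetric kNN graph and its Laplacian L = D - W
W = kneighbors_graph(X, n_neighbors=7, mode="connectivity").toarray()
W = np.maximum(W, W.T)
L = np.diag(W.sum(axis=1)) - W

# solve (J + lam * L) f = y_obs, where J zeroes out unlabeled targets;
# the lam * L term pushes f to vary smoothly along the graph
J = np.diag(labeled.astype(float))
lam = 0.1
ridge = 1e-6 * np.eye(len(y))                 # small ridge for invertibility
f = np.linalg.solve(J + lam * L + ridge, y_pm * labeled)

print("manifold-regularized accuracy:", np.mean((f > 0).astype(int) == y))
```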
Active Learning - Incorporates human feedback by selecting the most informative instances for labeling, improving model performance with a limited number of labeled examples.
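An uncertainty-sampling sketch of active learning: the model repeatedly queries the pool point it is least confident about. The "oracle" here simply reads off the true label, standing in for a human annotator; seed size and query budget are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=8)
labeled = list(range(10))                  # tiny seed set
pool = list(range(10, 500))                # unlabeled pool to query from

clf = LogisticRegression(max_iter=1000)
for _ in range(30):                        # 30 labeling queries
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    query = int(np.argmin(proba.max(axis=1)))   # least confident point
    labeled.append(pool.pop(query))        # the "oracle" labels it

clf.fit(X[labeled], y[labeled])
print("accuracy after querying:", clf.score(X, y))
```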