Before machine learning models can perform classification tasks, they must be trained on many annotated examples. Data annotation is a slow, manual process in which humans review training examples one by one and assign each one the correct label.
Fortunately, for some classification tasks, you don't need to label every training example. Instead, you can use semi-supervised learning, a machine learning technique that automates most of the data-labeling process with only a small amount of human input.
Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training.
One way to do semi-supervised learning is to combine clustering and classification algorithms: a clustering algorithm groups all the data, each cluster is labeled using the few labeled examples that fall into it, and the resulting pseudo-labeled data is then used to train a classifier.
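Here is a minimal sketch of that cluster-then-label idea using scikit-learn. The synthetic `make_blobs` data set, the choice of KMeans and logistic regression, and the majority-vote cluster labeling are all illustrative assumptions, not a prescribed recipe:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Mostly unlabeled data: pretend only 20 of 1,000 points carry true labels.
X, y_true = make_blobs(n_samples=1000, centers=3, random_state=0)
rng = np.random.RandomState(0)
labeled_idx = rng.choice(len(X), size=20, replace=False)

# Step 1: cluster all points, labeled and unlabeled alike.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: label each cluster with the majority class among the few
# labeled points inside it, then propagate that label to the whole
# cluster (assumes every cluster received at least one labeled point).
pseudo_labels = np.empty(len(X), dtype=int)
for c in range(3):
    members = labeled_idx[clusters[labeled_idx] == c]
    pseudo_labels[clusters == c] = np.bincount(y_true[members]).argmax()

# Step 3: train an ordinary classifier on the pseudo-labeled data.
clf = LogisticRegression(max_iter=1000).fit(X, pseudo_labels)
```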
An alternative approach, often called self-training, is to train a model on the labeled portion of your data set, use that same model to generate labels for the unlabeled portion, and then train a new model on the complete data set.
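scikit-learn ships this technique as `SelfTrainingClassifier`, where unlabeled examples are marked with the label `-1`. A small sketch, using the Iris data set and an SVM base model purely for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hide ~70% of the labels to simulate a mostly unlabeled data set.
rng = np.random.RandomState(42)
y_partial = np.where(rng.rand(len(y)) < 0.3, y, -1)

# The base model is fit on the labeled portion, then repeatedly labels
# the unlabeled points it is most confident about and retrains itself.
model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_partial)

print(model.score(X, y))  # evaluate against the full ground truth
```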
One of the main reasons for BERT's strong performance across different NLP tasks is its use of semi-supervised learning: the model is first pre-trained on huge amounts of unlabeled text, then fine-tuned on a comparatively small labeled data set for the target task. BERT is a neural-network-based ML algorithm; the IEEE paper in the references below discusses the use of semi-supervised learning in convolutional neural networks (CNNs).
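The fine-tuning half of that recipe looks roughly like the following sketch. It assumes the Hugging Face `transformers` and `torch` packages (not mentioned in the articles above), uses the public `bert-base-uncased` checkpoint, and runs just one training step on a tiny two-example batch for illustration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A model pre-trained on unlabeled text, with a fresh classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled set; in practice you would use thousands of examples.
texts = ["great movie", "terrible movie"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative fine-tuning step; the loss is computed internally.
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```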
References:
https://en.wikipedia.org/wiki/Semi-supervised_learning
https://algorithmia.com/blog/semi-supervised-learning
https://bdtechtalks.com/2021/01/04/semi-supervised-machine-learning/
https://www.geeksforgeeks.org/explanation-of-bert-model-nlp/
https://ieeexplore.ieee.org/document/8545709