Layman's explanation
Batch normalization is a technique for training very deep neural networks that normalizes the inputs to a layer for every mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep neural networks.
If you would like to know more about it, this document should help.
Batch Normalization is a technique that converts the inter-layer outputs of a neural network into a standard format, a process called normalizing. This effectively 'resets' the distribution of the previous layer's output so that it can be processed more efficiently by the subsequent layer.
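To make this concrete, here is a rough sketch in Python/NumPy of what the transform does for one mini-batch. The names gamma, beta and eps, and the toy data, are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Per-feature statistics computed over the current mini-batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Standardize each feature to zero mean and unit variance.
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta) let the network adjust or
    # even undo the normalization if that is what training prefers.
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 10 + 3                # toy mini-batch, 4 features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0 and 1
```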
In neural networks, the output of the first layer feeds into the second layer, the output of the second layer feeds into the third, and so on. When the parameters of a layer change, so does the distribution of inputs to subsequent layers.
This change in the distributions is called internal covariate shift.
Normalization is a procedure that rescales the numeric variables in a dataset to a common scale without distorting the differences in their ranges of values.
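For example, standardizing two made-up columns that live on very different scales (the values below are invented purely for illustration):

```python
import numpy as np

ages = np.array([22.0, 35.0, 58.0, 41.0])        # one scale
incomes = np.array([30e3, 72e3, 120e3, 55e3])    # a very different scale

def standardize(v):
    # Zero mean, unit variance: a common scale, but the relative
    # differences between the values are preserved.
    return (v - v.mean()) / v.std()

print(standardize(ages))
print(standardize(incomes))
```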
In a 2015 paper, Sergey Ioffe and Christian Szegedy proposed a technique called Batch Normalization (BN) to address the vanishing/exploding gradient problems.
It allows the use of higher learning rates, since normalization ensures no activation value becomes too high or too low, and it lets each layer learn more independently of the others. When applied to a state-of-the-art image classification model, Batch Normalization achieved the same accuracy with 14 times fewer training steps and beat the original model by a significant margin.
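In practice you rarely implement this by hand; deep learning frameworks ship it as a layer. A minimal sketch with tf.keras (the layer sizes, the 28x28 input shape and the 0.1 learning rate are illustrative choices, not values from the paper):

```python
from tensorflow import keras

# Illustrative model: a BatchNormalization layer after each hidden layer.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])

# Networks with batch normalization usually tolerate a larger learning
# rate than the same network without it (the exact value is problem-dependent).
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=0.1),
              metrics=["accuracy"])
```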
Covariate shift within the neural network is reduced.
It removes the need to normalize the input data yourself, since the first hidden layer will take care of that, provided it is batch-normalized (see the sketch below).
There is a runtime penalty: the neural network makes slower predictions because of the extra computations required at each layer. So if you need predictions to be lightning fast, evaluate alternative approaches and weigh this tradeoff.
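For the input-normalization point above, a minimal sketch of placing a batch-normalization layer first in the network (the 8-feature input and layer sizes are placeholders):

```python
from tensorflow import keras

# Placing BatchNormalization first means the raw, unscaled inputs are
# standardized inside the network instead of in a preprocessing step.
model = keras.Sequential([
    keras.Input(shape=(8,)),              # e.g. 8 raw, unscaled numeric features
    keras.layers.BatchNormalization(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="sgd")
```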
References:
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://towardsdatascience.com/batch-normalisation-in-deep-neural-network-ce65dd9e8dbf
https://machinelearning.wtf/terms/internal-covariate-shift/
https://images.app.goo.gl/CvzhdTMbrfTwDXPT6
https://images.app.goo.gl/1cyFxUf58mqCxrRCA
https://deepai.org/machine-learning-glossary-and-terms/batch-normalization
https://towardsdatascience.com/batch-normalization-explained-algorithm-breakdown-23d2794511c
https://www.linkedin.com/posts/dpkumar_batchnormalisation-neuralnetworks-machinelearning-activity-6780291149061074944-O-Sy