Layman's explanation
Batch normalization is a technique for training very deep neural networks that normalizes the inputs to a layer for every mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep neural networks.
If you would like to know more about it, this document should help.
Batch Normalization is a technique that converts the inter-layer outputs of a neural network into a standard format, a process called normalizing. This effectively 'resets' the distribution of the previous layer's output so that it can be processed more efficiently by the subsequent layer.
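To make this concrete, here is a rough sketch in Python/NumPy of what the transform does for one mini-batch. The names gamma, beta and eps, and the toy data, are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Per-feature statistics computed over the current mini-batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Standardize each feature to zero mean and unit variance.
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta) let the network adjust or
    # even undo the normalization if that is what training prefers.
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 10 + 3                # toy mini-batch, 4 features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0 and 1
```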
In neural networks, the output of the first layer feeds into the second layer, the output of the second layer feeds into the third, and so on. When the parameters of a layer change, so does the distribution of inputs to subsequent layers.
This change in the distributions is called internal covariate shift.
Normalization is a procedure that rescales the numeric variables in a dataset to a common scale without distorting the differences in their ranges of values.
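For example, standardizing two made-up columns that live on very different scales (the values below are invented purely for illustration):

```python
import numpy as np

ages = np.array([22.0, 35.0, 58.0, 41.0])        # one scale
incomes = np.array([30e3, 72e3, 120e3, 55e3])    # a very different scale

def standardize(v):
    # Zero mean, unit variance: a common scale, but the relative
    # differences between the values are preserved.
    return (v - v.mean()) / v.std()

print(standardize(ages))
print(standardize(incomes))
```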
In a 2015 paper, Sergey Ioffe and Christian Szegedy proposed a technique called Batch Normalization (BN) to address the vanishing/exploding gradient problems.
It allows the use of higher learning rates, since normalization ensures no activation value becomes too high or too low, and it lets each layer learn more independently of the others. When applied to a state-of-the-art image classification model, Batch Normalization achieved the same accuracy with 14 times fewer training steps and beat the original model by a significant margin.
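In practice you rarely implement this by hand; deep learning frameworks ship it as a layer. A minimal sketch with tf.keras (the layer sizes, the 28x28 input shape and the 0.1 learning rate are illustrative choices, not values from the paper):

```python
from tensorflow import keras

# Illustrative model: a BatchNormalization layer after each hidden layer.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])

# Networks with batch normalization usually tolerate a larger learning
# rate than the same network without it (the exact value is problem-dependent).
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=0.1),
              metrics=["accuracy"])
```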
Covariate shift within the neural network is reduced.
It removes the need to normalize the input data yourself, since the first hidden layer will take care of that, provided it is batch-normalized (see the sketch below).
There is a runtime penalty: the neural network makes slower predictions because of the extra computations required at each layer. So if you need predictions to be lightning fast, evaluate alternative approaches and weigh this tradeoff.
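For the input-normalization point above, a minimal sketch of placing a batch-normalization layer first in the network (the 8-feature input and layer sizes are placeholders):

```python
from tensorflow import keras

# Placing BatchNormalization first means the raw, unscaled inputs are
# standardized inside the network instead of in a preprocessing step.
model = keras.Sequential([
    keras.Input(shape=(8,)),              # e.g. 8 raw, unscaled numeric features
    keras.layers.BatchNormalization(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="sgd")
```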
References:
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://towardsdatascience.com/batch-normalisation-in-deep-neural-network-ce65dd9e8dbf
https://machinelearning.wtf/terms/internal-covariate-shift/
https://images.app.goo.gl/CvzhdTMbrfTwDXPT6
https://images.app.goo.gl/1cyFxUf58mqCxrRCA
https://deepai.org/machine-learning-glossary-and-terms/batch-normalization
https://towardsdatascience.com/batch-normalization-explained-algorithm-breakdown-23d2794511c
https://www.linkedin.com/posts/dpkumar_batchnormalisation-neuralnetworks-machinelearning-activity-6780291149061074944-O-Sy