Assume your input dataset contains one column with values ranging from 0 to 1, and another column with values ranging from 10,000 to 100,000. The large difference in scale can cause problems when you combine these values as features during modeling. If you want to understand why, this document will help.
Feature scaling is a technique often applied as part of data preparation for machine learning. The goal of feature scaling is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.
When all features are on a similar scale, training of the ML algorithm converges faster.
The result of standardization is that the features are rescaled so that they have the properties of a standard normal distribution with μ = 0 and σ = 1, where μ is the mean (average) and σ is the standard deviation. Each value x is transformed as z = (x - μ) / σ.
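As a minimal sketch, assuming scikit-learn is available and using made-up two-column data like the example above, standardization can be applied per column with StandardScaler:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: one column in [0, 1], another in [10,000, 100,000]
X = np.array([[0.2, 15000.0],
              [0.5, 42000.0],
              [0.9, 98000.0]])

# StandardScaler computes z = (x - mean) / std independently for each column
X_std = StandardScaler().fit_transform(X)

print(X_std.mean(axis=0))  # approximately 0 for each column
print(X_std.std(axis=0))   # approximately 1 for each column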
In min-max scaling, the data is scaled to a fixed range, usually 0 to 1. In contrast to standardization, the cost of this bounded range is that we end up with smaller standard deviations, and a single extreme value determines the minimum or maximum, compressing the remaining values into a narrow band. Thus the Min-Max Scaler is sensitive to outliers.
Min-Max scaling is typically done via the following equation: X_scaled = (X - X_min) / (X_max - X_min).
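As a minimal sketch, again assuming scikit-learn and the same made-up two-column data, MinMaxScaler rescales each column to the [0, 1] range:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same illustrative two-column data as above
X = np.array([[0.2, 15000.0],
              [0.5, 42000.0],
              [0.9, 98000.0]])

# MinMaxScaler applies x' = (x - x_min) / (x_max - x_min) per column
X_mm = MinMaxScaler().fit_transform(X)

print(X_mm.min(axis=0))  # 0.0 for each column
print(X_mm.max(axis=0))  # 1.0 for each column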
Machine learning models require all input and output variables to be numeric. So if your data contains non-numeric features (for example, pincode or address), you first need to encode them as numbers. One-hot encoding is one such approach. Please refer here for more detail.
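As a minimal sketch (the pincode values are made up), pandas.get_dummies is one way to one-hot encode such a column:

import pandas as pd

# Illustrative categorical column; the pincode values are made up
df = pd.DataFrame({"pincode": ["560001", "110001", "560001"]})

# One-hot encoding creates one binary column per distinct pincode
encoded = pd.get_dummies(df, columns=["pincode"])
print(encoded)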
Feature scaling alters the mean and variance of the dataset. Similarly, normalisation can change the covariance between features, since not all features are normalised on the same scale. In cases where the mean and variance of the dataset must be preserved, feature scaling should be avoided. This paper talks about this.
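As a minimal sketch on randomly generated data, the snippet below illustrates how min-max scaling changes the per-feature mean and variance as well as the covariance between the two features:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two synthetic features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(100, 2))

X_scaled = MinMaxScaler().fit_transform(X)

print(X.mean(axis=0), X.var(axis=0))                 # original mean / variance
print(X_scaled.mean(axis=0), X_scaled.var(axis=0))   # changed after scaling
print(np.cov(X.T)[0, 1], np.cov(X_scaled.T)[0, 1])   # covariance changes as well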
Some algorithms take care of scaling themselves. For example, Random Forests don't require feature scaling, since tree splits depend on the ordering of values rather than their absolute magnitude. Similarly, linear regression takes care of scale via coefficient adjustment (Ref: colab example).
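As a minimal sketch (the dataset and hyperparameters are arbitrary), the snippet below suggests why this holds for Random Forests: min-max scaling is a monotonic per-feature transform, so the tree splits, and hence the predictions, should be unchanged:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = MinMaxScaler().fit_transform(X)  # monotonic, per-feature rescaling

# Same random_state so both forests draw the same bootstraps and feature subsets
rf_raw = RandomForestClassifier(random_state=42).fit(X, y)
rf_scaled = RandomForestClassifier(random_state=42).fit(X_scaled, y)

# Splits depend only on the ordering of values, so the predictions should match
print(np.array_equal(rf_raw.predict(X), rf_scaled.predict(X_scaled)))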
If you select the right distance metric for a K-NN classifier, then scaling is not needed. For example, the Mahalanobis distance metric takes care of scaling automatically (note the covariance matrix Σ in its formula). Refer to the example code here.
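As a minimal sketch (the dataset and neighbour count are arbitrary), scikit-learn's KNeighborsClassifier can use the Mahalanobis metric by passing the inverse covariance matrix, so no separate scaling step is applied to the raw features:

import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# The Mahalanobis distance uses the inverse covariance matrix (VI) of the features,
# which rescales and decorrelates them, so no separate feature scaling is needed
VI = np.linalg.inv(np.cov(X.T))

knn = KNeighborsClassifier(
    n_neighbors=5,
    metric="mahalanobis",
    metric_params={"VI": VI},
    algorithm="brute",  # brute-force search works with this metric
)
knn.fit(X, y)
print(knn.score(X, y))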
Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks. Refer here for details.
Input feature scaling is also needed to avoid under-saturation/over-saturation problems caused by the activation functions in neural networks. For example, the ReLU activation function outputs 0 for negative values, so input normalisation should be done in such a way that it avoids feeding mostly negative values into the ReLU. In a 2015 paper, Sergey Ioffe and Christian Szegedy proposed a technique called Batch Normalization (BN) to address the vanishing/exploding gradients problems.
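As a minimal sketch using Keras (the layer sizes and input dimension are illustrative), Batch Normalization is inserted between a Dense layer and its ReLU activation so that each mini-batch's pre-activations are standardized:

import tensorflow as tf
from tensorflow.keras import layers

# Dense -> BatchNormalization -> ReLU, following the placement in the BN paper
model = tf.keras.Sequential([
    layers.Input(shape=(20,)),         # 20 input features (illustrative)
    layers.Dense(64, use_bias=False),  # BN's learned offset makes the bias redundant
    layers.BatchNormalization(),       # standardizes pre-activations per mini-batch
    layers.Activation("relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()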
Some algorithms require that data be normalized before training a model; others take care of it themselves. Therefore, when you choose a machine learning algorithm for building a predictive model, be sure to review its data requirements before applying normalization to the training data. Refer here for a case where pre-normalisation is not needed.
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/normalize-data
https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
https://www.youtube.com/watch?v=DtEq44FTPM4
https://youtu.be/0HOqOcln3Z4?t=534
https://youtu.be/0HOqOcln3Z4?t=548
http://theprofessionalspoint.blogspot.com/2019/02/which-machine-learning-algorithms.html
https://datascience.stackexchange.com/questions/62031/normalize-standardize-in-a-random-forest
https://stats.stackexchange.com/questions/41704/how-and-why-do-normalization-and-feature-scaling-work
https://images.app.goo.gl/7Vr3Di2T2dsoVPns9
https://en.wikipedia.org/wiki/Mahalanobis_distance
https://datascience.stackexchange.com/questions/25832/input-normalization-for-relu
https://gist.github.com/ajeyjoshi/e74e8c7f8bd389195efe163d1ab5bdc4
https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-batch-normalisation-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-batch-normalisation-in-machine-learning#TOC-Role-in-machine-learning
https://www.linkedin.com/posts/dpkumar_machinelearning-datascience-features-activity-6760734531889831936-ZiID