Bias-Variance Trade-Off

04/08/2018

In statistical learning, two of the most important concepts are underfitting and overfitting. They matter because they describe the state of a model based on how it performs. The best way to understand these terms is to see them as two sides of a trade-off between the bias and the variance of the model.

The term overfitting refers to a model that fits the data it was trained on very well but generalizes poorly: when faced with values other than the training ones, its predictions have low accuracy.

Underfitting, on the other hand, refers to the opposite state: the model does not fit well even the data it was trained on.

Img. 1: Illustration of underfitting and overfitting. From Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc.

The figure above shows both terms in action. For the 10 points in the plane, the degree-0 (constant) line is an example of underfitting. The degree-9 line passes exactly through all the points and is an example of overfitting. Finally, the degree-1 line would be an example of a model that generalizes the data correctly.
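As a quick sketch of the same idea (using a made-up 10-point dataset rather than the figure's actual data), np.polyfit can reproduce the effect: the higher the degree, the lower the training error, even though the degree-9 fit is clearly memorizing the points.

```python
import numpy as np

# Hypothetical 10-point dataset: a noisy linear relationship,
# similar in spirit to the figure above (not the book's exact data).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=10)

for degree in (0, 1, 9):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)       # predictions on the training points
    train_mse = np.mean((y - y_hat) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.4f}")

# The degree-9 polynomial passes through every point (training MSE ~ 0),
# while the degree-0 constant has the largest training error.
```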

Bias-Variance

The best way to understand the problem of underfitting and overfitting is to express it in terms of bias and variance.

A model is said to have high bias when its structure cannot describe the underlying pattern of the data. A linear model applied to a non-linear dataset will always perform poorly, no matter how much data is used. Bias is the error introduced by approximating the behavior of the problem's data with a model that is too simple.

A model with high variance, on the other hand, describes the training data very well, but when trained on a different dataset it produces a very different model and, as a result, poor predictions. Variance is the amount by which the fitted model would change if it were estimated on a different training set.
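A small simulation makes the variance part concrete. The setup below is an assumption made purely for illustration (a sine curve as the "true" function and 12-point training sets): a degree-1 and a degree-9 polynomial are refit on many independent training sets, and we look at how much their predictions at a fixed point move around.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Assumed "true" relationship, used only for this illustration.
    return np.sin(2 * np.pi * x)

def sample_training_set(n=12):
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(scale=0.2, size=n)
    return x, y

x0 = 0.25  # fixed test point where predictions are compared

for degree in (1, 9):
    preds = []
    for _ in range(500):  # many independent training sets
        x_train, y_train = sample_training_set()
        coeffs = np.polyfit(x_train, y_train, degree)
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    print(f"degree {degree}: variance of predictions across training sets = {preds.var():.3f}")

# The flexible degree-9 model changes far more from one training set to the
# next than the rigid degree-1 model: that spread is its variance.
```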

Img. 2: Graphical illustration of bias and variance. From Understanding the Bias-Variance Tradeoff, by Scott Fortmann-Roe.

A model with low variance and low bias is the ideal model (the degree-1 line).

A model with low bias and high variance is an overfitting model (the degree-9 line).

A model with high bias and low variance is usually an underfitting model (the degree-0 line).

A model with high bias and high variance is the worst case scenario, as it is a model that produces the greatest possible prediction error.

The mathematical equation that explains this relationship is:

Img. 3: Bias-variance trade-off. From James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer.
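Written out, the decomposition illustrated above (as given in [2]) is:

$$
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)
$$

where $\hat{f}$ is the model fitted on the training data, $x_0$ is a test point, and $\varepsilon$ is the irreducible error.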

The expression on the left is the expected mean squared error (MSE) of the model's predictions at a test point.

The expression on the right shows that this error is explained by three components: the square of the model's bias, the variance of the model, and the variance of the irreducible error.
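The identity can also be checked numerically. The sketch below is a Monte-Carlo approximation under assumed settings (the same hypothetical sine curve as the true function, Gaussian noise with known variance): it estimates the left-hand side and each term of the right-hand side for a degree-1 polynomial at a single test point, and the two sides should roughly agree.

```python
import numpy as np

rng = np.random.default_rng(2)
noise_sd = 0.2   # assumed standard deviation of the irreducible error
x0 = 0.25        # fixed test point

def true_f(x):
    # Assumed "true" function, chosen only for this simulation.
    return np.sin(2 * np.pi * x)

preds, sq_errors = [], []
for _ in range(5000):
    # Draw a fresh training set and fit a degree-1 polynomial.
    x_train = rng.uniform(0, 1, 12)
    y_train = true_f(x_train) + rng.normal(scale=noise_sd, size=12)
    fit = np.polyfit(x_train, y_train, 1)
    pred = np.polyval(fit, x0)

    # Draw a fresh noisy test response at x0 and record the squared error.
    y0 = true_f(x0) + rng.normal(scale=noise_sd)
    preds.append(pred)
    sq_errors.append((y0 - pred) ** 2)

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(x0)) ** 2   # squared bias
variance = preds.var()                        # variance of the fitted model
mse = np.mean(sq_errors)                      # left-hand side of the identity

print(f"MSE                  = {mse:.3f}")
print(f"bias^2 + var + noise = {bias_sq + variance + noise_sd**2:.3f}")
```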

Conclusion

In conclusion, the bias-variance trade-off lets us understand why a model behaves the way it does and apply corrective actions. When a model has high bias, it is usually too simple, and adding more features should improve it. For high-variance models, one option is to reduce the number of features, but including more training data is also viable.

As a general rule, the more flexible a model is, the higher its variance and the lower its bias. The less flexible a model is, the lower its variance and the higher its bias.

References

[1] Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc.

[2] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer.