
Bias-Variance Dilemma

Regression is one of the most frequently encountered problems in engineering: there is a true but unknown function F: X -> Y that we are expected to recover, given some data (x, y) that satisfy y = F(x). In linear regression we find the straight line (or, in general, the hyperplane) f that minimizes the squared error[1]. We could instead fit a quadratic or higher-degree polynomial, which may reduce the squared error further and might even drive it to zero on the training data.

The real question, however, is whether this estimated function f approximates the true function F well: for new data, how close is our prediction f(x) to the true value F(x)? The bias measures this mismatch or misalignment, i.e., the accuracy or quality of the match: a high bias implies a poor match. Another way of measuring the "match" is the variance; the variance of our predictions (over different training sets) measures the precision or specificity of the match: a high variance implies a weak match between f and F.

We can adjust the bias and the variance of f through our choice of the form of f, but the important bias-variance relation shows that the two terms are not independent. Averaged over all training sets S of a given size n, where S = {(x, y) : x is drawn randomly from X and y = F(x)}, they obey a kind of "conservation law":

    mse = bias^2 + variance.

This is the bias-variance dilemma, or bias-variance trade-off, and it is a general phenomenon: the more complex the form of f we assume (the more free parameters it has), the better it can adapt to the training set, so the bias falls, but the variance rises. Choosing the right balance between bias and variance is a hard problem.
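As a rough illustration of this trade-off, here is a minimal sketch (assuming NumPy, a made-up true function F(x) = sin(2*pi*x), training sets of size 20, and polynomial fits of degree 1 and 9; none of these choices come from the text above). It draws many independent training sets, fits f to each, and estimates bias^2, variance, and mse on a fixed test grid; the two terms add up to the mse, and the higher-degree fit shows lower bias but higher variance.

import numpy as np

rng = np.random.default_rng(0)

def F(x):
    # The "true" unknown function; sin is an arbitrary stand-in.
    return np.sin(2 * np.pi * x)

n = 20                                # training-set size |S|
n_sets = 500                          # number of independent training sets
x_test = np.linspace(0.0, 1.0, 50)    # fixed test inputs

for degree in (1, 9):                 # simple (line) vs. complex (degree-9) f
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # Draw a fresh training set S = {(x, F(x))} of size n (noise-free, as in the text).
        x_train = rng.uniform(0.0, 1.0, n)
        y_train = F(x_train)
        coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit of f
        preds[i] = np.polyval(coeffs, x_test)

    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - F(x_test)) ** 2)        # squared bias
    variance = np.mean(preds.var(axis=0))                # variance of f over training sets
    mse = np.mean((preds - F(x_test)) ** 2)              # mean-squared error

    print(f"degree {degree}: bias^2={bias2:.4f}  variance={variance:.4f}  "
          f"bias^2+variance={bias2 + variance:.4f}  mse={mse:.4f}")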

NOTE

[1] Squared error here is (f(x) - F(x))^2, measured vertically (parallel to the Y axis), unlike in principal component analysis, where the error is the squared magnitude of the perpendicular drop of the data point onto the principal components.
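To make the distinction concrete, here is a small sketch (assuming NumPy, a made-up point (1, 3), and a made-up fitted line y = 2x through the origin): the regression error of the point is its vertical squared distance to the line, while the PCA-style error is the squared length of the perpendicular from the point to the line.

import numpy as np

point = np.array([1.0, 3.0])        # a data point (x, y)
slope = 2.0                         # fitted line y = slope * x through the origin

# Regression error: vertical distance (parallel to the Y axis), squared.
vertical_err = (point[1] - slope * point[0]) ** 2

# PCA-style error: squared length of the perpendicular drop onto the line.
direction = np.array([1.0, slope]) / np.hypot(1.0, slope)   # unit vector along the line
projection = np.dot(point, direction) * direction
perpendicular_err = np.sum((point - projection) ** 2)

print(f"vertical squared error      = {vertical_err:.4f}")       # 1.0
print(f"perpendicular squared error = {perpendicular_err:.4f}")  # 0.2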

References:

1. Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification, 2nd ed. John Wiley & Sons.

If you have any corrections, suggestions, or missed references, please don't hesitate to mail me.
