Chapter 4: the 'unbiased-estimator' fallacy
It is sometimes suggested that estimators should be chosen to be unbiased. The concept of an unbiased estimator is taken from frequentist statistical practice; unbiased in this context simply means that best-guess estimates of this type will be, in the long run, neither consistently larger nor smaller than the true value you are attempting to estimate.
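In symbols, writing \theta for the true parameter value and \hat{\theta} for the estimator (a function of the data), the definition reads

\operatorname{bias}(\hat{\theta}) \;=\; \mathbb{E}\bigl[\hat{\theta}\bigr] - \theta, \qquad \hat{\theta}\ \text{unbiased} \;\iff\; \mathbb{E}\bigl[\hat{\theta}\bigr] = \theta,

where the expectation runs over the hypothetical infinite set of repeated experiments.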
Indeed, frequentist texts provide something akin to the following prescription:
For practical statistics problems, it is important to determine the unbiased estimator if one exists, since less-than-optimal procedures would naturally be avoided.
This suggestion occurs despite the obvious fact that, if an estimate is derived from your loss function, it will by construction satisfy the criteria you most value (i.e., those encoded in your loss function).
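For completeness (a standard decision-theoretic result, not specific to this chapter): the optimal estimate is the value that minimizes the expected loss,

\hat{\theta} \;=\; \arg\min_{a}\, \mathbb{E}\bigl[L(\theta, a)\bigr],

where the expectation is taken with respect to one's probability distribution over \theta. Squared-error loss L(\theta, a) = (\theta - a)^2, for example, is minimized by the mean of that distribution, and absolute-error loss by its median.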
First, note that the very definition of unbiasedness relies on the frequentist notion of an infinite set of repetitions (the underlying meaning of 'in the long run'), which we are likely to reject as unimportant when faced with our current experiment and our current data. After all, should we care that the current estimate, when combined with an infinity of other estimates from similar experiments, would yield errors that average to zero around the true value? Nothing in the definition of an unbiased estimator prevents it from being as far as you please from the true parameter value for the single dataset we care about: our current data. It tells us only that in the unattainable infinite future, were we to repeat our experiment indefinitely, the distribution of results would be centered on the true value.

An example of an unbiased estimator is the mean of a sample drawn from a Gaussian, which provides an estimate of the location of the probability distribution describing that data, and has the further property that the distribution of sample means (the estimator), computed from many repeated draws from the same Gaussian, will not be systematically offset from the true mean of that distribution.
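A short simulation makes the distinction concrete (a minimal sketch; the true mean of 5, the sample size of 10, and the repetition count are illustrative choices, not values from the text):

% Unbiasedness is a long-run property: the average of many repeated
% estimates matches the true mean, yet any single estimate can be far off.
mu = 5; sd = 3; n = 10; reps = 1e5;    % hypothetical true parameters
ests = mean(mu + sd*randn(n,reps));    % one sample mean per repeated experiment
fprintf('average of %d estimates: %.3f (true mean %g)\n', reps, mean(ests), mu)
[~,i] = max(abs(ests - mu));
fprintf('most extreme single estimate: %.3f\n', ests(i))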
Now, we have certainly used the mean as an estimate, and it works well, so why all the fuss? The problem is that there is nothing primary about the property of being unbiased: despite the quote at the opening of this chapter, there is no reason to expect an unbiased estimator to be optimal. Indeed, even within the frequentist regime, the unbiased estimator is demonstrably not optimal. To see why, first note that bias is only one of the two components of the long-run error: there is the expected constant-offset (bias) component and the expected variability (variance, or spread) component.
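These two components combine in the standard frequentist measure of long-run error, the mean squared error (MSE):

\operatorname{MSE}(\hat{\theta}) \;=\; \mathbb{E}\bigl[(\hat{\theta} - \theta)^2\bigr] \;=\; \operatorname{bias}(\hat{\theta})^2 + \operatorname{Var}(\hat{\theta}).

An estimator may therefore accept a small bias in exchange for a large reduction in variance, and in doing so achieve a smaller overall error than its unbiased competitor.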
Here we demonstrate a case in which a biased estimator is clearly superior to its unbiased counterpart: estimating the variance of a Gaussian, where the biased maximum-likelihood estimate (divisor N) beats the familiar unbiased sample variance (divisor N − 1). The code below plots the mean squared error of both, first as a function of σ at fixed N = 2, and then as surfaces over N and σ:
sig = linspace(0,100,202)'; sig = sig(2:end);   % true sigma values (drop sigma = 0)
N = 2;                                          % sample size for the 2-D plot
mse = (2*sig.^4)*(N-1).^-1;                     % unbiased estimator: MSE = 2*sigma^4/(N-1)
mse(:,2) = sig.^4*(2*N-1)./N.^2;                % biased MLE: MSE = (2N-1)*sigma^4/N^2
figure; plot(sig,mse(:,2),'k-','LineWidth',3)   % biased (black)
hold on; plot(sig,mse(:,1),'b-','LineWidth',3)  % unbiased (blue)
N = 2:20;                                       % sweep sample sizes (N = 1 leaves the N-1 divisor undefined)
mse1 = (2*sig.^4)*(N-1).^-1;                    % unbiased MSE surface over (N, sigma)
mse2 = sig.^4*(2*N-1)./N.^2;                    % biased MSE surface
figure; hold on; meshc(N,sig,mse1); meshc(N,sig,mse2); colormap bone
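In both figures the biased estimator (black) lies below the unbiased one (blue) everywhere: algebraically, (2N − 1)/N² < 2/(N − 1) for every N ≥ 2, so the maximum-likelihood estimator's small bias buys a more-than-compensating reduction in variance. As a sanity check, here is a minimal Monte Carlo sketch (the choice of σ = 1 and the repetition count are illustrative, not taken from the text above) that recovers the same ordering empirically:

% Empirical MSE of the two variance estimators for Gaussian samples of
% size N = 2 with true sigma = 1 (so the true variance is also 1).
N = 2; reps = 1e6;
x = randn(N,reps);        % reps independent samples, one per column
vU = var(x,0);            % unbiased sample variance (divisor N-1)
vB = var(x,1);            % biased maximum-likelihood variance (divisor N)
fprintf('unbiased: MSE = %.3f (theory %.3f)\n', mean((vU-1).^2), 2/(N-1))
fprintf('biased:   MSE = %.3f (theory %.3f)\n', mean((vB-1).^2), (2*N-1)/N^2)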