Spring is Here - we can finally talk about empirical Bayesian estimators

Batting Averages

This is the data from the classic James-Stein baseball example (table 1 from Brad Efron's excellent book). I legitimately only recognized one name in this list - Frank Robinson was the manager for the Washington Nationals. I once joked about my lack of baseball knowledge by saying that I didn't know whether Unser was Junior or Senior. I completely forgot that Americans don't know anything about car racing.

The first and second columns of numbers are actually the same: the maximum likelihood estimator is simply the analog estimator. The next column, "Actual", refers to the player's batting average for the remainder of the season - remember, they play something like 160 games a season! The last two columns are two alternative estimators: "JS" refers to the James-Stein estimator and "EB" refers to the non-parametric empirical Bayesian estimator.

James-Stein Estimator

First, the analog estimator, or maximum likelihood estimator, is simply the observed batting average for each player. The James-Stein estimator was one of the first to acknowledge that there is a relationship between the batting averages of all the players. Let's assume that the observed batting average for each player is normally distributed with mean equal to that player's true batting average and with the same variance for every player (each player is observed over the same number of early-season at-bats). On top of that, let's assume the true batting averages themselves come from one common distribution.
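
Written out (the notation here is mine, not Efron's), with $p_i$ the observed average and $\theta_i$ the true average for player $i$, the model is

$$p_i \mid \theta_i \sim \mathcal{N}(\theta_i, \sigma_0^2), \qquad \theta_i \sim \mathcal{N}(\mu, \tau^2), \qquad i = 1, \dots, 18,$$

where $\sigma_0^2$ is the common sampling variance and the second piece says the true averages share one prior with mean $\mu$ and variance $\tau^2$.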

Now we have two pieces of information. We know the observed average and we know that ALL the true batting averages come from the same distribution. In addition, we know the relationship between the observed batting average and the true batting average.

Efron illustrates that, via Bayes' rule, the posterior mean of the true batting average equals the observed batting average pulled toward the overall mean by a "shrinking" factor determined by the variance of the prior distribution of true batting averages relative to the sampling noise. The smaller the variance of the prior - that is, the more tightly the true averages are bunched together - the more we need to shrink each observed batting average.
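
Concretely, in the model above Bayes' rule gives

$$E[\theta_i \mid p_i] = \mu + \frac{\tau^2}{\tau^2 + \sigma_0^2}(p_i - \mu) = p_i - \frac{\sigma_0^2}{\tau^2 + \sigma_0^2}(p_i - \mu),$$

so the shrinking factor $\sigma_0^2/(\tau^2 + \sigma_0^2)$ pulls each observed average toward the prior mean $\mu$, and it grows as the prior variance $\tau^2$ shrinks.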

Putting the Empirical in Bayes

We don't know the prior distribution. But we already assumed it has just two parameters, the mean and the variance. And - and this is an important and - we observe 18 draws from that distribution. So we have an observed mean of the means and an observed variance of the means. The only thing we need to finish the job is the variance of the observed average conditional on the true average. This we can determine from the binomial distribution (assuming each at-bat is an independent event). Putting all this together gives us the weird James-Stein formula below.
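
In the standard form (plugging in the sample quantities just described, and assuming the 45 early-season at-bats per player of the classic version of this data), that formula is

$$\hat{\theta}_i^{JS} = \bar{p} + \left(1 - \frac{(N-3)\,\hat{\sigma}_0^2}{\sum_{j=1}^{N}(p_j - \bar{p})^2}\right)(p_i - \bar{p}), \qquad \hat{\sigma}_0^2 = \frac{\bar{p}(1-\bar{p})}{45},$$

where $N = 18$ players and $\bar{p}$ is the grand mean of the observed averages.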

Robbins' Proposal

In his amazing 1956 paper, Herbert Robbins points out that we may be able to estimate the prior non-parametrically. At the very end of the paper he states that all we need to do is solve a mixture model. Unfortunately, he did not know how to do that. But many people have since determined the conditions necessary to solve the mixture model and estimate the prior.
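
In symbols (again my notation), the density of an observation $x$ (hits in the early-season at-bats) is a mixture over the unknown prior $g$, and once we have $g$ the empirical Bayes estimate is just the posterior mean:

$$f(x) = \int f(x \mid \theta)\, g(\theta)\, d\theta, \qquad E[\theta \mid x] = \frac{\int \theta\, f(x \mid \theta)\, g(\theta)\, d\theta}{\int f(x \mid \theta)\, g(\theta)\, d\theta}.$$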

All we need to solve Robbins' mixture model is to know the likelihood function. That is, we need to know the probability of observing the batting average we observe given the true batting average. This is a binomial likelihood (again assuming independence across at-bats). We also need to know the set of possible true batting averages. This is a big set. In fact it is infinite. But it is also closed. A batting average is a probability, so it lies between 0 and 1 (or 0 and 1000 for some reason).
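
Concretely, with $n$ at-bats and $h$ hits (so the observed average is $h/n$), the likelihood is

$$f(h \mid \theta) = \binom{n}{h}\,\theta^{h}(1-\theta)^{n-h}, \qquad \theta \in [0, 1],$$

and in practice (as in the sketch further down) the closed interval $[0,1]$ can be approximated by a fine grid of candidate values for $\theta$.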

We ARE!

Thanks to members of the Penn State statistics department we have a nice estimator for solving mixture models. I repurpose their non-parametric mixtools estimator.

Using this non-parametric empirical Bayesian estimator via the Penn State algorithm leads to the estimates presented in the "EB" column above.
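
The exact mixtools call isn't reproduced here, but if you want the idea stripped down, here is a minimal self-contained sketch in R: put the candidate true averages on a grid, start from a uniform prior, and run the EM updates for the mixture weights. The vector obs_avg (the 18 observed averages) and the 45 at-bats per player are assumptions of the sketch, not anything taken from mixtools.

# Sketch only: non-parametric estimate of the prior on a grid via EM,
# then posterior means. obs_avg is assumed to hold the 18 observed
# early-season averages; 45 at-bats per player is also an assumption.
n_ab  <- 45
hits  <- round(obs_avg * n_ab)                  # back out the hit counts
theta <- seq(0.005, 0.995, by = 0.005)          # grid of candidate true averages
g     <- rep(1 / length(theta), length(theta))  # start from a uniform prior
lik   <- outer(hits, theta, function(h, t) dbinom(h, n_ab, t))  # binomial likelihoods
for (iter in 1:500) {
  post <- sweep(lik, 2, g, "*")                 # E-step: prior-weighted likelihoods
  post <- post / rowSums(post)                  #         normalized posterior weights
  g    <- colMeans(post)                        # M-step: refit the discrete prior
}
mu_eb <- as.vector(post %*% theta)              # posterior mean for each player

The only real design choice is the grid: a batting average lives in the closed interval [0,1], so a fine grid loses essentially nothing, and the EM update for the weights is just "average the posterior weights across players."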

So how do these methods compare? Here is the root-mean-squared error of each set of estimates against the "Actual" column - the MLE first, then James-Stein, then the non-parametric empirical Bayes estimate:

> sqrt(mean((mu - x$Actual[1:18])^2))

[1] 0.06478383

> sqrt(mean((mu_js - x$Actual[1:18])^2))

[1] 0.03446235

> sqrt(mean((mu_eb - x$Actual[1:18])^2))

[1] 0.03419436

We have a winner!!!!! The non-parametric empirical Bayesian estimator is the best predictor of the future - though only by a hair over James-Stein.