B. Poisson parameters
In the following example we illustrate the process discussed in part A, using data to estimate the parameter of a discrete Poisson distribution model. Such data might cover, for example, home runs in a baseball game, fights in a bar on Saturday night, or red Volkswagens at a stoplight, i.e. basically anything with many chances to happen but only a small number of happenings on average.
I spent quite a bit of time with this in my search for solar flare tracks in interplanetary dust particles collected in the earth's stratosphere. In that application the selection of a prior-probability was especially important.
For instance, how would the expected track density be affected if, after years of work on a dozen nanogram-sized particles with two whole square microns of crystalline area, you managed to unambiguously detect a grand total of no tracks? Aside: This story had a happy ending, because further work resulted in the unambiguous identification of such tracks.
steps to turn a distribution on its ear
First, come up with a likelihood model, i.e. a model for the probability of your data being generated by a parameterized model-system, e.g. a binomial-process with probability p or a Poisson-process with mean value μ.
Poisson-processes (i.e. processes which generate an occasional event that might have happened at many different points) are ubiquitous. For a Poisson-process H with mean-value (or event-rate per unit-interval) μ, the likelihood of measuring m events in that interval is:
$$p[m\,|\,\mu,H] \;=\; e^{-\mu}\,\frac{\mu^{m}}{m!}.$$
You can always check if this likelihood is well-behaved by making sure that it adds up to 1 when summed (or integrated) over all possible observation values (in this case all non-negative integer values of m). Summing event-number m (and m²) times this probability over the range of m-values shows that a rate of μ events per interval predicts a detected-event average±std-dev of <m>±σm = μ±√μ.
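As a quick sanity check of those claims, here is a minimal Python sketch (an addition to the original text; the rate value μ = 2.5, the truncation point m_max, and the function name likelihood are arbitrary illustrative choices) that sums the Poisson likelihood over m to confirm normalization and the μ±√μ moments:

```python
import math

mu = 2.5     # hypothetical event rate per interval (arbitrary test value)
m_max = 100  # truncation point; Poisson terms beyond this are negligible here

def likelihood(m, mu):
    """Poisson likelihood p[m|mu,H] = exp(-mu) * mu^m / m!"""
    return math.exp(-mu) * mu**m / math.factorial(m)

p = [likelihood(m, mu) for m in range(m_max)]
total = sum(p)                                  # should be ~1
mean = sum(m * p[m] for m in range(m_max))      # should be ~mu
var = sum(m * m * p[m] for m in range(m_max)) - mean**2  # should be ~mu

print(total, mean, var**0.5)  # expect ~1.0, ~2.5, ~sqrt(2.5) ~ 1.58
```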
Secondly, use any pre-observation information that you have on model-parameters to write out a prior-probability (or weight) for the various parameter-values. If you don't have prior information, a natural alternative is to use symmetry[8][9][10], e.g. for offset parameters like x that can run between -∞ and +∞ you might use a uniform prior-probability weight of 1, while for scale-parameters like rate μ that only run between 0 and +∞ you might use a Jeffreys prior-probability weight of 1/μ. These weights often work, but are improper (i.e. not normalizable as probabilities) unless they are set to zero beyond a certain point at both ends of their range.
Symmetry of our always positive mean event-rate μ suggests that for our Poisson-process example we might adopt an improper Jeffreys-prior weight[11] of 1/μ. In other words:
$$p[\mu\,|\,H]\,d\mu \;\propto\; \frac{d\mu}{\mu}.$$
As long as at least one event has been detected, range-limits to make this prior-probability assignment normalizable may not be needed, as the integrals below illustrate.
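To spell out that reasoning step (an addition here, using only the formulas above): the bare 1/μ weight diverges at both ends of its range, but a single detected event already tames both divergences, since

$$\int_a^b \frac{d\mu}{\mu} \;=\; \ln\frac{b}{a} \;\to\; \infty \;\;(a\to 0 \text{ or } b\to\infty),
\qquad
\int_0^\infty \frac{e^{-\mu}\,\mu^{m}}{\mu}\,d\mu \;=\; \Gamma(m) \;=\; (m-1)! \;\;(m\ge 1).$$

The Γ(m) integral here is also the normalization constant that shows up in the posterior of the third step below.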
Thirdly, integrate (or sum) your prior-weight times that likelihood function over all possible parameter values. This global-likelihood is (by Bayes' theorem) a normalization-constant that, divided into your product of prior × likelihood, yields the posterior-probability (or differential probability for continuously-distributed parameters) of each possible parameter-set value for a given set of observational-data.
For our Poisson-process example, this says that given an interval observation of m counts, the differential-probability (per unit μ) for a mean interval count-rate μ is plausibly:
$$p[\mu\,|\,m,H] \;=\; e^{-\mu}\,\frac{\mu^{m-1}}{(m-1)!}.$$
Integrating rate-parameter μ (and μ²) times this differential probability over the range of μ-values shows that m observed events predict a rate-parameter average±std-dev of <μ>±σμ = m±√m.
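Here is a minimal numerical sketch of that integration (again an addition, not from the original; the observed count m = 5, the grid spacing and cutoff, and the function name posterior are arbitrary illustrative choices), using a midpoint-rule sum over the posterior above:

```python
import math

m = 5                      # hypothetical count observed in one interval
dmu, mu_max = 1e-3, 60.0   # integration step and cutoff (arbitrary choices)

def posterior(mu, m):
    """Posterior p[mu|m,H] = exp(-mu) * mu^(m-1) / (m-1)!  (needs m >= 1)."""
    return math.exp(-mu) * mu**(m - 1) / math.factorial(m - 1)

grid = [dmu * (i + 0.5) for i in range(int(mu_max / dmu))]
norm = dmu * sum(posterior(mu, m) for mu in grid)        # should be ~1
mean = dmu * sum(mu * posterior(mu, m) for mu in grid)   # should be ~m
var = dmu * sum(mu**2 * posterior(mu, m) for mu in grid) - mean**2

print(norm, mean, var**0.5)  # expect ~1.0, ~5.0, ~sqrt(5) ~ 2.24
```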
Finally, this posterior-probability for model fit-parameter values can be used both to directly predict the outcome of future observations, and to serve as a more informed prior for incorporating future observations into the parameter-determination process. Predicting future observations, for instance, may be done by marginalizing the original likelihood over the posterior-distribution of possible parameter values.
For our Poisson-model example, the probability of observing m₂ counts in the next interval, given an observation of m₁ counts in the first, is therefore (provided m₁ > 0):
$$p[m_2\,|\,m_1,H] \;=\; \int_0^\infty p[m_2\,|\,\mu,H]\;p[\mu\,|\,m_1,H]\,d\mu \;=\; \frac{(m_1+m_2-1)!}{m_2!\,(m_1-1)!}\,\frac{1}{2^{\,m_1+m_2}}.$$
In other words, the model-probability of seeing 3 events in the second interval after seeing 5 events in the first is p[3|5,H] = 35/256 ≈ 0.136719. If no objects were detected in the first interval, i.e. if m₁ = 0, a more careful look at plausible upper/lower limits on the Jeffreys prior may be warranted.
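A short sketch (an addition; the function name predictive is an illustrative choice) that evaluates the closed form above, reproducing the 35/256 figure and checking that the predictive probabilities sum to one:

```python
import math

def predictive(m2, m1):
    """p[m2|m1,H] = C(m1+m2-1, m2) / 2^(m1+m2), from marginalizing the
    Poisson likelihood over the posterior above (requires m1 >= 1)."""
    return math.comb(m1 + m2 - 1, m2) / 2**(m1 + m2)

print(predictive(3, 5), 35 / 256)                  # both 0.13671875
print(sum(predictive(k, 5) for k in range(500)))   # should be ~1.0
```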
Should continued observations not track such expectations, a new model may need to be selected. Quantitative approaches to model-selection (in life as well as in physical sciences), as distinct from parameter-estimation, are discussed elsewhere in this web space.
Related references:
P. Fraundorf (1981) "Interplanetary dust in the transmission electron microscope: Diverse materials from the early solar system", Geochim. Cosmochim. Acta 45, 915-943 (Review in Nature about naming rocks, small minds, etc.).
J. P. Bradley, D. E. Brownlee and P. Fraundorf (1984) "Discovery of nuclear tracks in interplanetary dust", Science 226, 1432-1434.
Harold Jeffreys (1946) "An Invariant Form for the Prior Probability in Estimation Problems", Proceedings of the Royal Society of London A 186, 453-461.
E. T. Jaynes (1968) "Prior Probabilities", IEEE Trans. on Systems Science and Cybernetics SSC-4, 227.
E. T. Jaynes (1973) "The Well-Posed Problem", Foundations of Physics 3, 477.
Sir Harold Jeffreys may actually cite an earlier source for the dσ/σ approach to scale-parameters, although it is commonly referred to as a Jeffreys prior in the maximum-entropy community, e.g. by text author Phil Gregory.