Simple (fixed effects) binomial model

We start with a simple binomial model in which we have a series of data outcomes (the number of successes x in each of a series of samples, each sample consisting of n independent Bernoulli trials). Examples from wildlife and fisheries might include several samples in which we capture n animals at random and record the number of males and females in each, or mark n animals with radio transmitters and count how many (x) survive 90 days.

In this example we are going to simulate n.samples <- 10 binomial samples, each of which involves 100 Bernoulli trials.  In words, we're going to simulate under the following assumptions, any of which we can change if we like. The first two just describe our data but don't really say anything about model assumptions:

- We have 10 binomial samples.
- Each sample consists of n = 100 Bernoulli trials.

Where we get into the model assumptions are:

- Within each sample, the trials are independent, each with the same probability of success p.
- The probability p is the same across all of the samples.

These last assumptions allow us to describe the data very simply by a series of binomial distributions, sharing the same parameter p.  As we get into more complex models we will change these assumptions, especially the second one.
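
As a concrete illustration, here is a minimal R sketch of that simulation step. The true value p = 0.6 and the seed are arbitrary choices for illustration, not necessarily the values used in the attached code.

# Simulate 10 binomial samples of 100 Bernoulli trials each,
# all sharing the same success probability
set.seed(123)                            # for reproducibility
n.samples <- 10                          # number of binomial samples
n <- rep(100, n.samples)                 # Bernoulli trials per sample
p.true <- 0.6                            # true success probability (illustrative)
x <- rbinom(n.samples, size = n, prob = p.true)   # observed successes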

In OpenBUGS we can describe this model in very simple terms.  First, we will need to put a prior distribution on the parameter p.  A standard choice is the uniform distribution on (0, 1):

p ~ dunif(0, 1)

The sampling model or likelihood now describes the sample outcomes x[i] by binomial distributions:

for(i in 1:n.samples){
  x[i] ~ dbin(p, n[i])
}
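
Putting the prior and the likelihood together in the single model block that OpenBUGS expects, the model file would look something like this (a sketch; the attached code may organize it differently):

model {
  p ~ dunif(0, 1)                  # prior on the common success probability
  for(i in 1:n.samples) {
    x[i] ~ dbin(p, n[i])           # binomial likelihood for each sample
  }
}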

A simple generalization allows p to vary among the samples, so we index p by i as p[i].  This is a fixed effects model, in that we are treating each of the p[i] parameters as a separate effect, unlinked to the other effects.  We will change this shortly when we deal with random effects, in which the p[i] come from a common distribution whose parameters we in turn specify.
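
In BUGS terms, this generalization amounts to moving the prior inside the loop, something like:

model {
  for(i in 1:n.samples) {
    p[i] ~ dunif(0, 1)             # a separate, unlinked prior for each sample
    x[i] ~ dbin(p[i], n[i])
  }
}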

That's about it.  We just need to specify some initial values, then draw samples from the posterior distribution using MCMC. The attached code performs the simulation, conducts the MCMC analysis for the fixed effects models (constant vs. time-varying p), and then compares the results with those from the corresponding ML model fit using the glm procedure.  Compare the two and you should see them line up pretty well.
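
For a sense of what that workflow looks like, here is a hedged sketch of the constant-p analysis using the R2OpenBUGS package. The model file name, iteration settings, and data values are placeholders, not necessarily those in the attached code, and running it requires an OpenBUGS installation.

library(R2OpenBUGS)

# Simulated data as above (regenerated here so the sketch is self-contained)
n.samples <- 10; n <- rep(100, n.samples)
set.seed(123); x <- rbinom(n.samples, size = n, prob = 0.6)

# Bundle the data and supply initial values for p
bugs.data <- list(x = x, n = n, n.samples = n.samples)
inits <- function() list(p = runif(1))

# Run the MCMC analysis (settings are illustrative)
fit <- bugs(data = bugs.data, inits = inits,
            parameters.to.save = "p",
            model.file = "binomial_model.txt",
            n.chains = 3, n.iter = 10000, n.burnin = 1000)
print(fit)

# The corresponding ML fit via glm; plogis() back-transforms the
# intercept from the logit scale to the probability scale
ml.fit <- glm(cbind(x, n - x) ~ 1, family = binomial)
plogis(coef(ml.fit))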

A word on simulating data

Why not just use real data for these examples?   That would be fine, and we could do the Bayes vs. ML comparison as above. But with real data we don't know the actual process that generated them. By contrast, in the above example we know that the process is actually binomial, and furthermore we know the exact value of the parameter of that distribution. This gives us a very strong benchmark against which to compare the results of our analysis, one that would not exist with real data.  We can also do things like generate data under one model (say, a more complex one) but estimate the parameters under a simpler model.  If the simpler model gives us results that are still close to the true values, we may be able to say that the model is robust to that particular assumption violation. Simulation is thus a very powerful tool for evaluating how statistical models perform under known conditions.
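
As a small sketch of that idea: generate the data with sample-specific probabilities, estimate under the constant-p model, and see how far the single estimate falls from the average of the true values. The probability range used here is arbitrary.

# Generate under a more complex model: p varies among samples
set.seed(456)
n.samples <- 10; n <- rep(100, n.samples)
p.vary <- runif(n.samples, 0.4, 0.8)     # hypothetical sample-specific probabilities
x.vary <- rbinom(n.samples, size = n, prob = p.vary)

# Estimate under the simpler constant-p model
fit.simple <- glm(cbind(x.vary, n - x.vary) ~ 1, family = binomial)
plogis(coef(fit.simple))                 # compare with mean(p.vary)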

Next: Random effects binomial model and Bayesian model selection