Estimating a binomial probability

The function below, designed for use with R, illustrates the brute-force calculation and statistical performance of a consonant c-box for the binomial rate problem which estimates the binomial rate (i.e., probability) p for a binomial distribution from random sample data. The data are counts of successes from some number of sets of tests, each consisting of N Bernoulli trials. See also the more general problem of estimating both the rate p and the trial size N for a binomial distribution.

The code below implements the simulation in R. You can copy it and paste it directly into the R console. The displayed plausibility function points with its maximum to the value of the binomial rate implied by the data. You can alter the sample size, the true binomial rate and the number of trials underlying each binomial count by changing the numbers at the bottom of the code.

Note that increasing the replications (40 in the code below) used in the calculation makes the estimated plausibility function more smoothly unimodal, but it does not strongly affect the targeting of the estimate to the true answer. That is influenced only by the sample size (here, 8), and secondarily by the number of trials (here, 10), because they are what determines the information content of the data.

#####################################################

# brute-force c-box for the binomial rate problem

#####################################################

# Euclidean metric in mean,sigma-space

d <- function(m,s, mm,ss) return(sqrt((m-mm)^2 + (s-ss)^2))

cbox.binomialp <- function(

x, # data in the form of counts

n = max(x), # (fixed) number of trials that resulted in the counts given in x

many = 400, # number of bootstrap replications

pp = 1 / (1 + exp(rev((-50:50)/8))) # possible p-values to explore

) {

s = length(x) # number of samples

sm = mean(x) # sample mean

ss = sd(x) # sample standard deviation

ii = 1:length(pp)

pls = rep(0,length(pp))

for (i in ii) {

p = pp[i]

# theoretical moments

tm = n * p

ts = sqrt(n * p * (1-p))

# difference between theoretical moments and observed moments

dobserved = d(tm,ts, sm,ss)

dsampling = NULL

for (k in 1:many) {

y = rbinom(s,n,p)

dsampling = c(dsampling, d(tm,ts, mean(y),sd(y)))

}

pls[[i]] = length(dsampling[dobserved <= dsampling])

}

# normalize the plausibilities

pls = pls / many

if (max(pls)<1) cat('Plausibilities do not achieve unity\n')

list(pp=pp, pls=pls / max(pls))

}

# example

n = 8 # sample size

p = 0.6213 # true binomial rate

N = 10 # number of trials

x = rbinom(n, N, p) # x = c(7, 5, 9, 8, 7, 5, 8, 4)

replications = 40

answer = cbox.binomialp(x,N,replications)

plot(answer$pp,answer$pls,ylim=c(0,1),xlab='Probability', ylab='Plausibility',type='b')

abline(v=p)

x

Plausibilities do not achieve unity

[1] 7 5 9 8 7 5 8 4