This page links to on-line resources for confidence boxes and Quick Bayes, a kind of robust Bayes analysis that uses these structures in place of uninformative priors. It also includes code in the R programming language that implements possible solutions to several of the challenge problems, along with pictures of the generated solutions. A file of all of these implementations is linked at the bottom of this page. The R code assumes that the file pba.r, also linked at the bottom of this page, has been loaded via R's source function. The syntax of these codes would be slightly different if written for Matlab or Risk Calc.
The c-boxes that characterise inferential uncertainty about a probability value p (which is also known as the binomial rate) are shown below as binary data are collected. The c-boxes are shown in blue.
These graphs are the result of a Monte Carlo simulation based on a true value for p of 0.4 which is depicted by the dotted line in each graph. The simulation involves the accumulation of 24 binary observations (zeros and ones), and each subsequent graph shows the c-box that would be obtained as each new observation arrives.
The first graph in the upper, left-hand corner depicts the 'dunno' interval corresponding to total ignorance about p before any data are collected. This is the c-box for p in the uninformative case before any observations are made.
When the first binary observation arrives, it is a one. This is represented by the green line on the right side of the second graph at one. The vacuous dunno interval then contracts to the triangular c-box. The second binary observation is a zero, and causes further contraction of the resulting c-box.
After 24 observations, the c-box approaches the beta distribution that a traditional Bayesian analysis based on a Jeffreys prior would yield for the same data. Asymptotically, it approaches the step function at the value p = 0.4.
Click on the display below to make it slightly larger.
Here is the code that created the graphs above:
source('pba.r')
p = 0.4
k = n = 0
par(mfrow=c(5,5))
for (i in 1:25) {
b = kn(k, n) # compute the c-box, this function is in pba.r
plot(b, col='blue', xlim=c(0,1), lwd=2, xaxt='n', yaxt='n', ann=FALSE)
if (1<i) abline(v=o,col='green',lwd=2)
abline(v=p,col='lightgray',lty='dotted')
o = rbinom(1,1,p)
k = k + o
n = n + 1
}
You can run this code in R yourself with different values of p. Note that your simulation will have different binary data.
For the purposes of comparison, we show alternative solutions that might be obtained using the method of matching moments and the maximum likelihood method as well a couple of c-box approaches for this problem.
In all the graphs below, c-boxes are shown with red and black curves. The red curve represents the left bound of the c-box, and the black curve represents the right bound of the c-box. In several cases, even precise probability distributions are also shown with the two-color scheme; in this case any space between the red and black curves represents the effect of discretisation error.
The data are superimposed as empirical distributions (step functions) on the concentration and body mass graphs.
The threshold of concern, 5 ÎĽg/day/kg, is shown as a green vertical line, against which the H output should be compared.
The following is code used for all the various approaches and also for subproblems 1.1-1.5.
source('pba.r')
c = c(12.1, 6.45, 73, 24.6, 15.2, 44.3, 19.0)
m = c(58.22, 57.25, 53.23, 43.91, 42.69, 46.23, 43.08, 46.06, 44.6, 47.55, 53.91, 63.05, 43.62, 48.26, 62.83, 41.3, 39.05, 38.66, 55.3, 45.19, 68.7)
imean = 2.5
istd =0.5
showCMIH <- function(h) {
par(mfrow=c(2,2))
plot(C,xlim=c(0,150))
edf(c)
title('Concentration (ÎĽg per liter)')
plot(M)
edf(m)
title('Body mass (kg)')
plot(I)
title('Intake (liters per day)')
plot(H,xlim=c(0,12),cumulative=FALSE)
title(paste('H =',sayint(h),'ÎĽg/day/kg'))
abline(v=5, col='green')
if (class(h)=='interval') for (i in c(h@lo,h@hi)) lines(c(-1,0.7),rep(i,2),col='blue') else lines(c(-1,0.7),rep(h,2),col='blue')
}
Method of matching moments, assuming independence
C = MMlognormal(c) # C = lognormal(mean(c), std(c))
M = MMnormal(m) # M = normal(mean(m), std(m))
I = normal(imean, istd)
H = C * I / M
showCMIH(1 - prob(H,5)) # probability that 5 < H
The code above produces the graphs below.
Maximum likelihood, assuming independence
C = MLlognormal(c) # mu=sum(log(x))/n;C=lognormal(meanlog=mu,stdlog=sum((log(x)-mu)^2)/n)
M = MLnormal(m) # M=normal(mean(m),std(m))
I = normal(imean, istd)
H = C * I / M
showCMIH(1-prob(H,5)) # probability that 5 < H
Probability and confidence boxes, assuming independence
C = CBlognormal(c)
M = CBnormal(m)
I = normal(imean, istd)
H = C * I / M
showCMIH(1 - prob(H,5)) # probability that 5 < H
Probability and confidence boxes, assuming nothing about the Intake-Mass dependence
C = CBlognormal(c)
M = CBnormal(m)
I = normal(imean, istd)
H = C * (I %/% M)
showCMIH(1 - prob(H,5)) # probability that 5 < H
Probability and confidence boxes, assuming Intake and Mass have perfect dependence
C = CBlognormal(c)
M = CBnormal(m)
I = normal(imean, istd)
H = C * (I %///% M) # same as perfectconv(I, M, '/')
showCMIH(1 - prob(H,5)) # probability that 5 < H
Probability and confidence boxes, assuming Intake and Mass have positive dependence
The code below is not supported in pba.r.
C = CBlognormal(c)
M = CBnormal(m)
I = normal(imean, istd)
H = C * positiveconv(I, M, '/')
showCMIH(1 - prob(H,5)) # probability that 5 < H
C = env(0, CBlognormal(c))
M = CBnormal(m)
I = normal(imean, istd)
H = C * (I %///% M) # same as perfectconv(I, M, '/')
showCMIH(1 - prob(H,5)) # probability that 5 < H
mean.list<-function(x){s=interval(0,0); for (i in 1:length(x))s=s+x[[i]]; return(s/length(x))}
cc = ifelse(20<c,c,interval(0,20))
C = env(0, CBlognormal(c))
M = CBnormal(m)
I = normal(imean, istd)
H = C * (I %///% M) # same as perfectconv(I, M, '/')
showCMIH(1 - prob(H,5)) # probability that 5 < H
# setdefault(copula, 'normal')
# H = C * convolve(I, M, 0.4, '/') # the general convolve function is not available in pba.r
# showCMIH(1 - prob(H,5)) # probability that 5 < H
See the Risk Calc page for a solution.
The following is code used for all the subproblems of Problem 2.
pk = 1
pn = 153
n = c(12, 0, 21, 14, 6)
showPNQ <- function(h) {
par(mfrow=c(2,2))
plot(p,xlim=c(0,1))
abline(v=pk/pn)
title('Probability of bad thing')
plot(N)
edf(n)
title('Number of trials')
plot(Q,xlim=c(0,1))
title(paste('Pr(Q<0.05) =',sayint(h)))
abline(v=0.05, col='green')
if (class(h)=='interval') for (i in c(h@lo,h@hi)) lines(c(-1,0.07),rep(i,2),col='blue') else lines(c(-1,0.07),rep(h,2),col='blue')
}
CBpoisson <- function(x) {n <- length(x); k = sum(x); return(env(negativebinomial(size=k, prob=1-1/(n+1)),negativebinomial(size=k+1, prob=1-1/(n+1))))}
p = CBbinomial.p(pk, pn)
N = CBpoisson(n)
Q = 1 - (1 - p) ^ N
showPNQ(prob(Q, 0.05))
p = env(CBbinomial.p(pk, pn),0)
N = CBpoisson(n)
Q = 1 - (1 - p) ^ N
showPNQ(prob(Q, 0.05))
sum.list <- function(x) {s=interval(0,0); for (i in 1:length(x))s=s+x[[i]]; return(s)}
CBpoisson <- function(x) {n <- length(x); k = sum.list(x); return(env(negativebinomial(size=k, prob=1-1/(n+1)),negativebinomial(size=k+1, prob=1-1/(n+1))))}
p = env(CBbinomial.p(pk, pn), 0)
n = c(
rep(interval(0,4), 6),
rep(interval(5,9), 4),
rep(interval(10,14), 8),
rep(interval(15,19), 3),
interval(20,24),
interval(25,29),
n)
N = CBpoisson(n)
Q = 1 - (1 - p) ^ N
n = mix.equal.interval(n)
showPNQ(prob(Q, 0.05))
The code below is not supported in pba.r.
Q = 1 - perfectconv.pbox(1-p, N, '^') # p decreases with N, so 1-p increases with N
showPNQ(prob(Q, 0.05))
See the Risk Calc page for a solution.