Min-ge Xie

MIn-ge Xie at Rutgers is very interested in c-boxes. Indeed, their definition of 'confidence distribution' is worded so that it actually includes both p-box-shaped and DSS-shaped structures. I think this is great.

However, he alerted me to an example that belies the claim that we can compute with c-boxes. The snippet of R code below implements his example (which he said was in his paper). If you comment out the red line, the simulation will suggest that the calculation to find the ratio of two normal means is fine, but if you leave the red line in, the simulation shows clearly that the resulting distribution doesn't have coverage for the true ratio.

mu1 = 5

mu2 = 10.01; xrange = c(-1,2)

#mu2 = 0.01; xrange = c(-500,1000)

sigma1 = 1

sigma2 = 1

many = 10000

# c-box and confidence distribution for the normal mean

cd = function(x) {n = length(x); mean(x) + sd(x) * rt(many,n-1) / sqrt(n)}

par(mfrow=c(4,5))

for (i in 1:20) {

x = rnorm(5, mu1, sigma1) # sample data

y = rnorm(6, mu2, sigma2) # sample data

plot(NULL, xlim=xrange, ylim=c(0,1))

edf(cd(x)/cd(y)) # the computed c-box for the ratio shown in green

lines(rep(mu1/mu2,2),c(0,1),col='red',lwd=3) # show the true ratio in red

}

Have I misunderstood the claim being made by the proof in your paper?

As an urgent practical matter, I think we need to articulate limits on what calculations are allowed that can project confidence intervals through calculations. Have we lied in the previous papers? Or can we say something simple, like..."so long as you don't divide or multiple by c-boxes that both straddle zero, you will be okay"?. That would be a very tolerable limitation that would still allow a lot of nice things.

############################################################################

# c-box and confidence distribution for the normal mean

cd = function(z) {n = length(z); mean(z) + sd(z) * rt(many,n-1) / sqrt(n)}

edf = function(x) lines(sort(x), 1:length(x)/length(x),type = 's')

f = function(a,b) a/b

n1 = 5

n2 = 6

mu1 = 5

mu2 = 10; xrange = c(-1,2)

mu2 = 0.01; xrange = c(-500,1000)

sigma1 = 1

sigma2 = 1

many = 1e4

some = 4000

par(mfrow=c(2,1))

u = array(0, some)

for (i in 1:some) {

x = rnorm(n1, mu1, sigma1) # sample data

y = rnorm(n2, mu2, sigma2) # sample data

d = f(cd(x), cd(y))

u[i] = sum(d < f(mu1,mu2)) / many

}

plot(c(0,1),c(0,1),type='l', col='red')

edf(u)

############################################################################

## Gosset was correct

#mu = 1

#sg = 1

#u = array( 0 , c(some) )

#for( i in 1:some ){

# x = rnorm( 2 , mu , sg )

# u[i] = sum( cd(x) < 1 )/many

#plot( c(0,1) , c(0,1) , type='l' , col=4 )

#edf(u)

Page updated

Google Sites

Report abuse