Post date: Jun 14, 2015 5:54:24 PM
The current architecture of our free c-box software does not account for the dependence between different parameters estimated from a single data set. For instance, the mean and standard deviation of a normal distribution are estimated by
parameter.normal.mu <- function(x){n=length(x); return(mean(x)+sd(x)*student(n-1)/sqrt(n))}
parameter.normal.sigma <- function(x){n=length(x); return(sqrt(var(x)*(n-1)/chisquared(n-1)))}
whereas an alternative implementation would recognize their dependence, for example with the linked expressions
σ = sqrt(var(x)*(n-1)/chisquared(n-1))
μ = normal(mean(x), σ/sqrt(n))
in which the standard deviation σ computed in the first line is used in the formula of the second line to compute the mean μ. In the three figures below, the top graph shows the dependence between the parameters effectively realized by the free software, and the middle graph shows the dependence they would naturally have. The bottom graph superimposes both patterns.
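Note that the dependence introduced by the linked expressions is not a linear correlation. Because μ is centered on mean(x) whatever value of σ is drawn, cor(μ, σ) should stay near zero in both versions; what changes is that the spread of μ grows with σ. The following is a minimal sketch of one way to check this, not part of the c-box software. It assumes base R's rt, rchisq and rnorm in place of the software's student and chisquared wrappers, and the seed, sample data and variable names are illustrative choices.

```r
set.seed(1)                      # illustrative seed, for reproducibility only
many <- 10000                    # number of Monte Carlo replicates (assumed)
x <- rnorm(8, 10, 1)             # example data set
n <- length(x)

# Independent draws, as produced by two separate parameter functions
mu.ind    <- mean(x) + sd(x) * rt(many, n - 1) / sqrt(n)
sigma.ind <- sqrt(var(x) * (n - 1) / rchisq(many, n - 1))

# Linked draws: each mu is generated from the sigma drawn alongside it
sigma.lnk <- sqrt(var(x) * (n - 1) / rchisq(many, n - 1))
mu.lnk    <- rnorm(many, mean(x), sigma.lnk / sqrt(n))

# Correlation between sigma and the distance of mu from the sample mean
dep.ind <- cor(abs(mu.ind - mean(x)), sigma.ind)
dep.lnk <- cor(abs(mu.lnk - mean(x)), sigma.lnk)
print(c(independent = dep.ind, linked = dep.lnk))
```

Under these assumptions the independent value should be close to zero and the linked value clearly positive, which corresponds to the funnel shape one would expect in the scatterplot of the linked pairs.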
There is a similar but more dramatic story for the case of the uniform distribution. In the free software, the midpoint and width of a uniform distribution are estimated by
parameter.uniform.width <- function(x) {return((max(x)-min(x))/rbeta(many,length(x)-1,2))}
parameter.uniform.midpoint <- function(x) {wide=max(x)-min(x); w=wide/rbeta(many,length(x)-1,2); return((max(x)-w/2)+(w-wide)*runif(many))}
A better implementation would recognize their dependence, such as in
w = (max(x)-min(x)) / rbeta(many, n-1,2)
m = (max(x)-w/2) + (w - (max(x)-min(x))) * runif(many)
where the w computed in the first line is used to compute m in the second line. In the three figures below, the top graph shows the dependence between the parameters effectively realized by the free software, the middle graph shows the dependence they would naturally have, and the bottom graph superimposes the two patterns for comparison.
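One consequence of ignoring the dependence in the uniform case is that a width paired with an independently drawn midpoint can describe a uniform distribution whose support does not contain all of the observed data, whereas with the linked expressions the interval from m - w/2 to m + w/2 always covers the range from min(x) to max(x). The sketch below checks this numerically. It is an illustration under assumptions rather than code from the software: the helper covers, the tolerance eps, the seed and the variable names are all hypothetical.

```r
set.seed(1)                      # illustrative seed, for reproducibility only
many <- 5000                     # number of Monte Carlo replicates (assumed)
x <- runif(8, 0, 1)              # example data set
n <- length(x)
wide <- max(x) - min(x)          # observed range of the data

# Independent draws: width and midpoint come from separate beta samples
width    <- wide / rbeta(many, n - 1, 2)
w0       <- wide / rbeta(many, n - 1, 2)
midpoint <- (max(x) - w0 / 2) + (w0 - wide) * runif(many)

# Linked draws: the same w feeds the midpoint
w <- wide / rbeta(many, n - 1, 2)
m <- (max(x) - w / 2) + (w - wide) * runif(many)

# Hypothetical helper: TRUE where [mid - wid/2, mid + wid/2] contains all data
# (eps absorbs floating-point rounding at the boundaries)
covers <- function(mid, wid, eps = 1e-9) {
  (mid - wid / 2 <= min(x) + eps) & (mid + wid / 2 >= max(x) - eps)
}

cover.ind <- mean(covers(midpoint, width))
cover.lnk <- mean(covers(m, w))
print(c(independent = cover.ind, linked = cover.lnk))
```

The linked proportion should be 1, while the independent proportion should fall noticeably short of it, which is one way to read the more dramatic difference between the scatterplots.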
In both the normal and uniform cases, the marginal c-boxes will look the same whether these dependencies are included or not, but the nextvalue distributions will in principle be affected.
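To explore how a nextvalue distribution might be affected in the normal case, one can simulate a next observation from each (μ, σ) pair and compare quantiles. The sketch below is an illustration under assumptions, not the software's own nextvalue routine: it uses base R samplers, an illustrative seed and data set, hypothetical variable names, and draws the next value as rnorm(many, mu, sigma). For the linked pairs the result should agree with the Student-t prediction distribution mean(x) + sd(x)*sqrt(1+1/n)*t(n-1), which is included as a reference; the independent pairs need not agree with it.

```r
set.seed(1)                      # illustrative seed, for reproducibility only
many <- 100000                   # number of Monte Carlo replicates (assumed)
x <- rnorm(8, 10, 1)             # example data set
n <- length(x)

# Independent parameter draws, then a next value from each pair
mu.ind    <- mean(x) + sd(x) * rt(many, n - 1) / sqrt(n)
sigma.ind <- sqrt(var(x) * (n - 1) / rchisq(many, n - 1))
nv.ind    <- rnorm(many, mu.ind, sigma.ind)

# Linked parameter draws, then a next value from each pair
sigma.lnk <- sqrt(var(x) * (n - 1) / rchisq(many, n - 1))
mu.lnk    <- rnorm(many, mean(x), sigma.lnk / sqrt(n))
nv.lnk    <- rnorm(many, mu.lnk, sigma.lnk)

# Reference quantiles from the Student-t prediction distribution
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
q.ref <- mean(x) + sd(x) * sqrt(1 + 1 / n) * qt(probs, n - 1)

print(rbind(independent = quantile(nv.ind, probs),
            linked      = quantile(nv.lnk, probs),
            reference   = q.ref))
```

Any difference between the independent and linked rows may be modest for a given data set, so a large number of replicates helps to separate it from Monte Carlo noise.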
Here is the code used to make the figures.
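# Uniform case
x = runif(8,0,1)
many = 500
# free software: width and midpoint are based on separate, independent beta draws
width = (max(x)-min(x))/rbeta(many,length(x)-1,2)
w = (max(x)-min(x))/rbeta(many,length(x)-1,2)
midpoint = (max(x)-w/2)+(w-(max(x)-min(x)))*runif(many)
# linked version: the same w gives both the width and the midpoint
w = (max(x)-min(x)) / rbeta(many, length(x)-1,2)
m = (max(x)-w/2) + (w - (max(x)-min(x))) * runif(many)
par(mfrow=c(3,1))
plot(width,midpoint)      # top: dependence realized by the free software
plot(w,m,col='red')       # middle: natural dependence
plot(width,midpoint)      # bottom: both patterns superimposed
points(w,m,col='red')
cor(width,midpoint)
cor(w,m)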
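# Normal case
many = 1000
x = rnorm(8,10,1)
n = length(x)
# free software: mu and sigma are drawn independently
mu = mean(x)+sd(x)*rt(many,n-1)/sqrt(n)
sigma = sqrt(var(x)*(n-1)/rchisq(many,n-1))
# linked version: each m is drawn using the s drawn alongside it
s = sqrt(var(x)*(n-1)/rchisq(many,n-1))
m = rnorm(many,mean(x), s/sqrt(n))
par(mfrow=c(3,1))
plot(mu,sigma)            # top: dependence realized by the free software
plot(m,s,col='red')       # middle: natural dependence
plot(mu,sigma)            # bottom: both patterns superimposed
points(m,s,col='red')
cor(mu,sigma)
cor(m,s)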