Initial setting:
X is drawn from a normal distribution with mean 0 and a chosen variance sigma^2. The error term follows a standard normal distribution. The coefficient vector is beta = (1, 1.5, -1.5, 1.5, 0.5), where beta0 = 1 is the constant term and beta1 through beta4 are the coefficients on X, X^2, X^3, and X^4.
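Written out, the data-generating process is

Y = 1 + 1.5 X - 1.5 X^2 + 1.5 X^3 + 0.5 X^4 + e,   with X ~ N(0, sigma^2) and e ~ N(0, 1).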
Now we set n = 1000 as the baseline setting and compare the performance of the theoretical confidence interval and the bootstrap percentile confidence interval for the predicted value of Y. The formula of the theoretical confidence interval is given below.
For simplicity, we approximate the 95% confidence interval as the estimate plus or minus 2 standard errors (2SE).
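Concretely (this matches the computation in theoretical.CI_fun in the code below), the interval at a point x0 is

f_hat(x0) +/- 2 * SE_hat(x0),   SE_hat(x0) = sqrt( l' C_hat l ),   l = (1, x0, x0^2, x0^3, x0^4)',

where C_hat is the heteroskedasticity-consistent estimate of the covariance matrix of the fitted coefficients.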
We estimate the coverage probability of the theoretical 95% confidence interval and of the bootstrap percentile confidence interval, using B = 100 bootstrap replicates, running S = 100 simulations, and averaging the coverage probability over the simulations. We consider two error-term distributions (normal and uniform) and two sample sizes (n = 1000 and n = 100).
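As a minimal sketch of this procedure (est_coverage is an illustrative name; it assumes the objects data.gen, beta, sigma, x0, and y_true defined in the script at the end, with ci_fun being e.g. theoretical.CI_fun):

# Sketch only: estimate the pointwise coverage of a CI method over S simulations.
# Assumes data.gen(), beta, sigma, x0 and y_true (the true f(x0)) exist as in
# the script below; ci_fun returns a length(x0) x 2 matrix of lower/upper bounds.
est_coverage <- function(ci_fun, S = 100) {
  hits <- replicate(S, {
    sim <- data.gen(beta, sigma)                # fresh simulated data set
    ci  <- ci_fun(sim)                          # CI bounds at each x0
    (y_true >= ci[, 1]) & (y_true <= ci[, 2])   # covered at each x0?
  })
  rowMeans(hits)                                # coverage estimate per x0
}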
Theory: we vary the distribution of the error term and the sample size because, as we know, the (approximate) normality of the estimated betas relies on either a normal error term or a large sample size. We therefore either keep n = 1000 and change the error term to a uniform distribution, or keep a normal error term and reduce the sample size to n = 100.
(Figure: the x-axis shows the grid of x0 values; the y-axis shows the average coverage probability.)
We can see that when x0 is smaller than -0.5, the 2SE CI performs better, meaning its coverage is closer to 95%; this holds even after we change the error-term setting, where the theoretical CI still outperforms the bootstrap CI.
This may be caused by the shape of our polynomial function. Below is one of our simulation results: the red points are the bootstrap estimates f_hat(x0), and the blue lines are the theoretical confidence interval. In the range -1 to -0.5 the theoretical CI is relatively narrow, which means the function has little variation there, so the 2SE formula works well.
However, when x0 is bigger than -0.5 the bootstrap CI performs better, especially when we change to a non-normal error term (see the dashed lines). This may be caused by the relatively high variation of the function there: the variance of the beta coefficients is high, which leads to a much wider 2SE CI, and in that case the bootstrap CI is more stable. Even though n = 1000 in this case, the normal approximation to the distribution of the estimated regression coefficients is apparently not accurate enough, so the 2SE formula does not work well.
If we instead keep the error term normally distributed and change the sample size n, we get similar results, as shown below:
(Figure: the x-axis shows the grid of x0 values; the y-axis shows the average coverage probability.)
(Figure: the x-axis shows the grid of x0 values; the y-axis shows the deviation of the average coverage probability from 95%, i.e. 0.95 - mean coverage probability.) Plotting 95% minus the coverage probability conveys the same results; it is simply another way of illustrating them.
This figure shows the difference in confidence-interval length between the theoretical and bootstrap confidence intervals at different values of X, both at the 95% level. The positive y-values show that the theoretical CI is longer than the bootstrap CI in every case. Since, at the same confidence level, we prefer the shorter interval, the bootstrap CI performs better than the theoretical CI, especially in the non-normal error-term samples.
This figure shows the same difference in confidence-interval length between the theoretical and bootstrap confidence intervals at different values of X. Again, the positive y-values show that the theoretical CI is longer than the bootstrap CI in every case, so the bootstrap CI performs better than the theoretical CI, especially at the smaller sample size.
The comparison above is between groups; now we inspect what happens within groups:
This figure is still about interval length; only the y-axis changes, as labeled.
We find that in the smaller-n case both CIs become longer, but the theoretical CI lengthens more than the bootstrap CI, which again shows that at a smaller sample size the bootstrap confidence interval is more precise.
We also find that in the non-normal error case both CIs become longer, but the theoretical CI lengthens more than the bootstrap CI, which again shows that with a non-normal error term the bootstrap confidence interval is more precise.
Our experiment shows that when the target function does not vary too strongly, the 2SE confidence interval still works well even though some assumptions about the distribution of the regression coefficients are violated. However, when the function of interest has high variation, as a high-order polynomial does on part of its range, the bootstrap CI gives a more robust interval than the theoretical 2SE CI; this robustness comes from the nature of the bootstrap itself.
In addition, the normality-based CI depends heavily on assumptions about the distribution of the betas and on the sample size in linear regression, whereas the bootstrap provides somewhat more stable confidence intervals.
Finally, we can summarize that when we are unsure about the distribution of the estimated coefficients, when that distribution is complex, or when the estimated function has high variance, we prefer the more stable bootstrap CI.
#install.packages("boot")
library(boot)
rm(list=ls())
##############################################
### 1.a
n <- 1000
beta <- matrix(c(1,1.5,-1.5,1.5,0.5),c(5,1))
sigma <- 1.2
x0 <- seq(-1,1,0.1)   # prediction grid (needed below; it was used before being defined)
data.gen <- function(beta,sigma){
  # generate one data set of size n (n is taken from the global environment)
  e <- rnorm(n,0,1)                    # standard normal errors
  X <- rnorm(n,0,sigma)                # X ~ N(0, sigma^2)
  constant <- rep(1,n)
  X <- cbind(constant,X,X^2,X^3,X^4)   # design matrix with polynomial terms
  Y <- X%*%beta+e
  data.frame(Y,X)
}
sim <- data.gen(beta,1.2)
library(sandwich)   # for vcovHC(): heteroskedasticity-consistent covariance
boot.fn <- function(data,index){
  # refit the regression on the resampled rows (use data, not the global sim)
  coef(lm(Y~.,data=data,subset = index))
}
boot.result <- boot(sim,boot.fn,R=100)
boot.result <- boot.result$t
# drop column 2: the redundant constant column duplicates the intercept,
# so lm() returns NA for its coefficient
boot.result <- boot.result[,-2]
boot.result <- as.matrix(boot.result)
CI_data.boot <- matrix(NA,length(x0),100)
# rows for each x0; columns for the 100 bootstrap replicates
for (i in 1:length(x0)) {
  l <- matrix(c(1,x0[i],x0[i]^2,x0[i]^3,x0[i]^4),c(5,1))
  CI_data.boot[i,] <- t(boot.result%*%l)   # predicted f(x0) for each replicate
}
## sort the bootstrap predictions at each x0 (not strictly needed for quantile())
for (i in 1:length(x0)) {
  CI_data.boot[i,] <- sort(CI_data.boot[i,])
}
##################################################
## now find the naive (percentile) CI from the bootstrap data
CI_bootstrap <- matrix(NA,length(x0),2)
for (i in 1:length(x0)) {
  q <- unname(quantile(CI_data.boot[i,],c(0.025,0.975)))
  CI_bootstrap[i,1] <- q[1]
  CI_bootstrap[i,2] <- q[2]
}
CI_bootstrap
# NOTE: CI is the theoretical interval produced by theoretical.CI_fun below;
# compute it on sim first, or these lines will fail.
plot(x0,CI_bootstrap[,1],type="l",col="red")
lines(x0,CI_bootstrap[,2],col="red")
lines(x0,CI[,1],col="blue")
lines(x0,CI[,2],col="blue")
plot(rep(x0[1],100),CI_data.boot[1,],col="red",
     ylim = c(-3,3),xlim = c(-1,1),
     main = "Bootstrap result & 2SE CI",pch=19)
for (i in 2:length(x0)) {
  points(rep(x0[i],100),CI_data.boot[i,],col="red")
}
lines(x0,CI[,1],col="blue")
lines(x0,CI[,2],col="blue")
set.seed(527)
n <- 1000
beta <- matrix(c(1,1.5,-1.5,1.5,0.5),c(5,1))
sigma <- 1.2
x0 <- seq(-1,1,0.1)
S <- 100 # simulation times
## for simplicity, we create some functions
# compute theoretical CI with 2SE
theoretical.CI_fun <- function(sim){
  f_hat <- lm(Y~.-1,data=sim)   # -1: the constant column already serves as intercept
  C_hat <- vcovHC(f_hat)        # HC covariance estimate of the coefficients
  CI <- matrix(NA,length(x0),2)
  Y_hat <- c()
  coe <- as.matrix(coef(f_hat))   # 5 x 1
  for (i in 1:length(x0)) {
    l <- matrix(c(1,x0[i],x0[i]^2,x0[i]^3,x0[i]^4),c(5,1))
    Y_hat[i] <- t(coe)%*%l
    var_f <- t(l)%*%C_hat%*%l   # Var(f_hat(x0)) = l' C_hat l
    se <- sqrt(var_f)
    CI[i,] <- c(Y_hat[i]-2*se,Y_hat[i]+2*se)   # 2SE interval
  }
  return(CI)
}
# compute bootstrap data
bootstrap.data_fun <- function(sim){
  boot.result <- boot(sim,boot.fn,R=100)
  boot.result <- boot.result$t
  boot.result <- boot.result[,-2]   # drop the NA coefficient of the constant column
  boot.result <- as.matrix(boot.result)
  CI_data.boot <- matrix(NA,length(x0),100)
  for (i in 1:length(x0)) {
    l <- matrix(c(1,x0[i],x0[i]^2,x0[i]^3,x0[i]^4),c(5,1))
    CI_data.boot[i,] <- t(boot.result%*%l)
  }
  return(CI_data.boot)
}
## compute bootstrap CI at 95%
CI_bootstrap_fun <- function(CI_data.boot){
  CI_bootstrap <- matrix(NA,length(x0),2)
  for (i in 1:length(x0)) {
    q <- unname(quantile(CI_data.boot[i,],c(0.025,0.975)))
    CI_bootstrap[i,1] <- q[1]
    CI_bootstrap[i,2] <- q[2]
  }
  return(CI_bootstrap)
}
### the final summaries are stored here
final_result <- matrix(NA,length(x0),8)
colnames(final_result) <- c("theo.CI.low","theo.CI.up",
"boot.CI.low","boot.CI.up","coverage.Prob.SE",
"coverage.Prob.boot","length.theo.CI","length.boot.CI")
# frc*: per-simulation values of the corresponding final_result columns
frc1 <- matrix(NA,length(x0),S)
frc2 <- matrix(NA,length(x0),S)
frc3 <- matrix(NA,length(x0),S)
frc4 <- matrix(NA,length(x0),S)
frc7 <- matrix(NA,length(x0),S)
frc8 <- matrix(NA,length(x0),S)
for (i in 1:S) {
  sim <- data.gen(beta,1.2)
  CI <- theoretical.CI_fun(sim)
  frc1[,i] <- CI[,1]
  frc2[,i] <- CI[,2]
  CI_data.boot <- bootstrap.data_fun(sim)
  CI_bootstrap <- CI_bootstrap_fun(CI_data.boot)
  frc3[,i] <- CI_bootstrap[,1]
  frc4[,i] <- CI_bootstrap[,2]
  frc7[,i] <- CI[,2]-CI[,1]                       # theoretical CI length
  frc8[,i] <- CI_bootstrap[,2]-CI_bootstrap[,1]   # bootstrap CI length
}
# average each quantity over the S simulations (one value per x0)
final_result[,1] <- rowMeans(frc1)
final_result[,2] <- rowMeans(frc2)
final_result[,3] <- rowMeans(frc3)
final_result[,4] <- rowMeans(frc4)
final_result[,7] <- rowMeans(frc7)
final_result[,8] <- rowMeans(frc8)
y_true <- c()   # true f(x0) on the grid
for (i in 1:length(x0)) {
  l <- matrix(c(1,x0[i],x0[i]^2,x0[i]^3,x0[i]^4),c(5,1))
  y_true[i] <- t(beta)%*%l
}
c.p <- c()
c.p.boot <- c()
judge <- matrix(NA,length(x0),S)    # does the theoretical CI cover y_true?
judge2 <- matrix(NA,length(x0),S)   # does the bootstrap CI cover y_true?
for (j in 1:length(x0)) {
  judge[j,] <- (y_true[j]>= frc1[j,]&
                y_true[j]<= frc2[j,])
  judge2[j,] <- (y_true[j]>= frc3[j,]&
                 y_true[j]<= frc4[j,])
}
for (i in 1:length(x0)) {
  c.p[i] <- mean(judge[i,])         # empirical coverage at x0[i]
  c.p.boot[i] <- mean(judge2[i,])
}
final_result[,5] <- c.p
final_result[,6] <- c.p.boot
final_result
# the final result shows that the 2SE interval over-covers the nominal 95%
# (replacing 2 by 1.96 improves this slightly, but does not change the picture)
# now change the DGP to settings where the bootstrap should do better
##
### what is a good CI?
# if a good CI is one whose coverage is closer to the nominal 1-alpha:
#   try a smaller n, or a non-normal error term
# if, at the same level, a shorter CI is better, both criteria should hold
# a heteroskedastic error would be a further variation
#n <- 1000
n <- 100
beta <- matrix(c(1,1.5,-1.5,1.5,0.5),c(5,1))
sigma <- 1.2
x0 <- seq(-1,1,0.1)
S <- 100 # simulation times
set.seed(527)
data.gen.v2 <- function(beta,sigma){
  # for the uniform-error case, swap the comment marks on the two lines below
  #e <- runif(n,-sqrt(5),sqrt(5))   # uniform error, mean 0, variance 5/3
  e <- rnorm(n,0,1)
  X <- rnorm(n,0,sigma)
  constant <- rep(1,n)
  X <- cbind(constant,X,X^2,X^3,X^4)
  Y <- X%*%beta+e
  data.frame(Y,X)
}
final_result1 <- matrix(NA,length(x0),8)
colnames(final_result1) <- c("theo.CI.low","theo.CI.up",
"boot.CI.low","boot.CI.up","coverage.Prob.SE",
"coverage.Prob.boot","length.theo.CI","length.boot.CI")
frc1 <- matrix(NA,length(x0),S)
frc2 <- matrix(NA,length(x0),S)
frc3 <- matrix(NA,length(x0),S)
frc4 <- matrix(NA,length(x0),S)
frc7 <- matrix(NA,length(x0),S)
frc8 <- matrix(NA,length(x0),S)
## this loop may produce warning messages; they can be ignored here
for (i in 1:S) {
  sim <- data.gen.v2(beta,1.2)
  CI <- theoretical.CI_fun(sim)
  frc1[,i] <- CI[,1]
  frc2[,i] <- CI[,2]
  CI_data.boot <- bootstrap.data_fun(sim)
  CI_bootstrap <- CI_bootstrap_fun(CI_data.boot)
  frc3[,i] <- CI_bootstrap[,1]
  frc4[,i] <- CI_bootstrap[,2]
  frc7[,i] <- CI[,2]-CI[,1]
  frc8[,i] <- CI_bootstrap[,2]-CI_bootstrap[,1]
}
final_result1[,1] <- rowMeans(frc1)
final_result1[,2] <- rowMeans(frc2)
final_result1[,3] <- rowMeans(frc3)
final_result1[,4] <- rowMeans(frc4)
final_result1[,7] <- rowMeans(frc7)
final_result1[,8] <- rowMeans(frc8)
y_true <- c()
for (i in 1:length(x0)) {
  l <- matrix(c(1,x0[i],x0[i]^2,x0[i]^3,x0[i]^4),c(5,1))
  y_true[i] <- t(beta)%*%l
}
c.p <- c()
c.p.boot <- c()
judge <- matrix(NA,length(x0),S)
judge2 <- matrix(NA,length(x0),S)
for (j in 1:length(x0)) {
  judge[j,] <- (y_true[j]>= frc1[j,]&
                y_true[j]<= frc2[j,])
  judge2[j,] <- (y_true[j]>= frc3[j,]&
                 y_true[j]<= frc4[j,])
}
for (i in 1:length(x0)) {
  c.p[i] <- mean(judge[i,])
  c.p.boot[i] <- mean(judge2[i,])
}
final_result1[,5] <- c.p
final_result1[,6] <- c.p.boot
################################################
# this is for small sample case
final_result2 <- matrix(NA,length(x0),8)
colnames(final_result2) <- c("theo.CI.low","theo.CI.up",
"boot.CI.low","boot.CI.up","coverage.Prob.SE",
"coverage.Prob.boot","length.theo.CI","length.boot.CI")
frc1 <- matrix(NA,length(x0),S)
frc2 <- matrix(NA,length(x0),S)
frc3 <- matrix(NA,length(x0),S)
frc4 <- matrix(NA,length(x0),S)
frc7 <- matrix(NA,length(x0),S)
frc8 <- matrix(NA,length(x0),S)
## this loop may produce warning messages; they can be ignored here
for (i in 1:S) {
  sim <- data.gen.v2(beta,1.2)
  CI <- theoretical.CI_fun(sim)
  frc1[,i] <- CI[,1]
  frc2[,i] <- CI[,2]
  CI_data.boot <- bootstrap.data_fun(sim)
  CI_bootstrap <- CI_bootstrap_fun(CI_data.boot)
  frc3[,i] <- CI_bootstrap[,1]
  frc4[,i] <- CI_bootstrap[,2]
  frc7[,i] <- CI[,2]-CI[,1]
  frc8[,i] <- CI_bootstrap[,2]-CI_bootstrap[,1]
}
final_result2[,1] <- rowMeans(frc1)
final_result2[,2] <- rowMeans(frc2)
final_result2[,3] <- rowMeans(frc3)
final_result2[,4] <- rowMeans(frc4)
final_result2[,7] <- rowMeans(frc7)
final_result2[,8] <- rowMeans(frc8)
y_true <- c()
for (i in 1:length(x0)) {
  l <- matrix(c(1,x0[i],x0[i]^2,x0[i]^3,x0[i]^4),c(5,1))
  y_true[i] <- t(beta)%*%l
}
c.p <- c()
c.p.boot <- c()
judge <- matrix(NA,length(x0),S)
judge2 <- matrix(NA,length(x0),S)
for (j in 1:length(x0)) {
  judge[j,] <- (y_true[j]>= frc1[j,]&
                y_true[j]<= frc2[j,])
  judge2[j,] <- (y_true[j]>= frc3[j,]&
                 y_true[j]<= frc4[j,])
}
for (i in 1:length(x0)) {
  c.p[i] <- mean(judge[i,])
  c.p.boot[i] <- mean(judge2[i,])
}
final_result2[,5] <- c.p
final_result2[,6] <- c.p.boot
#View(final_result1)
# the error term changed case
plot(x0,final_result[,5],type="l",lwd=2,
     ylim=c(0.9,1),main = "Coverage Probability",
     ylab = "Coverage Probability")
lines(x0,final_result1[,5],lty=2,lwd=2)
lines(x0,final_result[,6],lty=1,col='red',lwd=2)
lines(x0,final_result1[,6],lty=2,col='red',lwd=2)
legend("bottomright",lty = c(1,2,1,2),lwd = c(2,2,2,2),
col=c("black","black",'red','red'),
legend = c("Normal.2SE","Uniform.2SE",
"Normal.boot","Uniform.boot"))
lines(x0,rep(0.95,length(x0)),col='blue',lwd=2)
# the small sample case
plot(x0,final_result[,5],type="l",lwd=2,
     ylim=c(0.9,1),main = "Coverage Probability",
     ylab = "Coverage Probability")
lines(x0,final_result2[,5],lty=2,lwd=2)
lines(x0,final_result[,6],lty=1,col='red',lwd=2)
lines(x0,final_result2[,6],lty=2,col='red',lwd=2)
legend("bottomright",lty = c(1,2,1,2),lwd = c(2,2,2,2),
col=c("black","black",'red','red'),
legend = c("1000.2SE","100.2SE",
"1000.boot","100.boot"))
lines(x0,rep(0.95,length(x0)),col='blue',lwd=2)
# plot the difference of coverage prob.
plot(x0,rep(0.95,length(x0))-final_result[,5],type="l",lwd=2,
     ylim=c(-0.05,0.05),ylab = "95% - Coverage.Prob.",
     main = "Deviation of Coverage Prob. from 95%")
lines(x0,rep(0.95,length(x0))-final_result2[,5],lty=2,lwd=2)
lines(x0,rep(0.95,length(x0))-final_result[,6],lty=1,col='red',lwd=2)
lines(x0,rep(0.95,length(x0))-final_result2[,6],lty=2,col='red',lwd=2)
legend("topright",lty = c(1,2,1,2),lwd = c(2,2,2,2),
col=c("black","black",'red','red'),
legend = c("1000.2SE","100.2SE",
"1000.boot","100.boot"))
lines(x0,rep(0,length(x0)),col='blue',lwd=2)
# plot the deviation of interval length
# within groups
plot(x0,final_result[,7]-final_result1[,7],type="l",
     ylim=c(-0.056,-0.045),
     main = "Advantage of Bootstrap CI over 2SE CI",
     ylab = "(Normal case) - (Uniform case)",lwd=2)
lines(x0,final_result[,8]-final_result1[,8],lty=2,lwd=2)
legend("topright",lty = 1:2,
lwd = c(2,2),legend = c("Theoretical",
"Bootstrap"))
plot(x0,final_result[,7]-final_result2[,7],type="l",
     ylim=c(-0.52,-0.35),
     main = "Advantage of Bootstrap CI over 2SE CI",
     ylab = "(n=1000) - (n=100)",lwd=2)
lines(x0,final_result[,8]-final_result2[,8],lty=2,lwd=2)
legend("topright",lwd = c(2,2),lty = 1:2,legend = c("Theoretical",
"Bootstrap"))
# between groups
plot(x0,final_result1[,7]-final_result1[,8],type="l",
     ylim = c(0.008,0.016),lwd=2,
     ylab = "2SE CI - Bootstrap CI",
     main = "Advantage of Bootstrap CI over 2SE CI")
# ylim for error setting
lines(x0,final_result[,7]-final_result[,8],lty=2,lwd=2)
legend("bottomleft",lty = 1:2,lwd = c(2,2),
legend = c("Uniform Error","Normal Error"))
plot(x0,final_result2[,7]-final_result2[,8],type="l",lwd=2,
     ylim = c(0.01,0.07),ylab = "2SE CI - Bootstrap CI",
     main = "Advantage of Bootstrap CI over 2SE CI")
# ylim for small n setting
lines(x0,final_result[,7]-final_result[,8],lty=2,lwd=2)
legend("topleft",lwd = c(2,2),
lty = 1:2,legend = c("n = 100","n = 1000"))