Chapter 09

Read Section 9.1. There is wisdom here.

I will cover sections 9.2 and 9.3. The rest of Chapter 9 will be part of the coverage in the Data Mining / Bi Data course to be offered in the future.

PRESS

Press_Example.RData is the workspace (containing the data frame p) I use to illustrate the idea behind PRESS.

Press.R is the script I run.

Press.pdf is the handout (before any annotations made in class) distributed on PRESS.

Criteria

For this I use the bridge.txt data set. An R workspace containing bridge is below as bridge.RData (this also contains the data frame logbridge and the log variables).

The script I used is in KNN9.R.

Functions

The functions can be copy and pasted into your R session. There are ways to have functions load automatically when you start R.

I have a function delres that will compute the deleted residuals

l <- lm(Y~etc.)

Call the function as follows (the lm object is the input):

delres(l)

Or if you do this

d <- delres(l)

these residuals will be placed in the column / variable d.

Here is the function

delres <- function(l) {

e <- resid(l)

h <- hatvalues(l)

d <- e/(1 - h)

return(d)

}

My function PRESS can be used to compute the PRedicted Error Sum of Squares. Call the function as follows (the lm object is the input):

PRESS(l)

Here is the function.

PRESS<- function(l) {

e <- resid(l)

h <- hatvalues(l)

d <- e/(1 - h)

p <- sum(d^2)

return(p)

}

I have a function crit.

lm1 <- lm(Y~ etc.)

lm2 <- lm(Y~ etc.)

Only for Cp does it matter what lm2 is. If you want the correct Cp you need lm2 to include all potential predictors. To use the function:

crit(lm1, lm2)

Here's the function (it is also embedded in KNN9.R).

crit <- function(l,lf){

mse_full <- summary(lf)$sigma^2

nm <- formula(l)

n <- length(resid(l))

mse <- summary(l)$sigma^2

sse <- mse*l$df.residual

cp <- sse/mse_full - n + 2*length(l$coefficients)

r2 <- summary(l)$r.squared

adjr2 <- summary(l)$adj.r.squared

edi <- resid(l)/(1-hatvalues(l))

press <- sum(edi^2)

AIC <- extractAIC(l)[2]

SBC <- extractAIC(l,k=log(n))[2]

info <- c(sse,mse,r2,adjr2,press,cp,AIC,SBC)

res <- data.frame(info)

print(nm)

row.names(res) <- c("SSE","MSE","R2","R2-adj","PRESS","Cp","AIC","SBC")

return(res)

}

Page updated

Report abuse