Read Section 9.1. There is wisdom here.
I will cover sections 9.2 and 9.3. The rest of Chapter 9 will be part of the coverage in the Data Mining / Bi Data course to be offered in the future.
PRESS
Press_Example.RData is the workspace (containing the data frame p) I use to illustrate the idea behind PRESS.
Press.R is the script I run.
Press.pdf is the handout (before any annotations made in class) distributed on PRESS.
Criteria
For this I use the bridge.txt data set. An R workspace containing bridge is below as bridge.RData (this also contains the data frame logbridge and the log variables).
The script I used is in KNN9.R.
Functions
The functions can be copy and pasted into your R session. There are ways to have functions load automatically when you start R.
I have a function delres that will compute the deleted residuals
l <- lm(Y~etc.)
Call the function as follows (the lm object is the input):
delres(l)
Or if you do this
d <- delres(l)
these residuals will be placed in the column / variable d.
Here is the function
delres <- function(l) {
e <- resid(l)
h <- hatvalues(l)
d <- e/(1 - h)
return(d)
}
My function PRESS can be used to compute the PRedicted Error Sum of Squares. Call the function as follows (the lm object is the input):
PRESS(l)
Here is the function.
PRESS<- function(l) {
e <- resid(l)
h <- hatvalues(l)
d <- e/(1 - h)
p <- sum(d^2)
return(p)
}
I have a function crit.
lm1 <- lm(Y~ etc.)
lm2 <- lm(Y~ etc.)
Only for Cp does it matter what lm2 is. If you want the correct Cp you need lm2 to include all potential predictors. To use the function:
crit(lm1, lm2)
Here's the function (it is also embedded in KNN9.R).
crit <- function(l,lf){
mse_full <- summary(lf)$sigma^2
nm <- formula(l)
n <- length(resid(l))
mse <- summary(l)$sigma^2
sse <- mse*l$df.residual
cp <- sse/mse_full - n + 2*length(l$coefficients)
r2 <- summary(l)$r.squared
adjr2 <- summary(l)$adj.r.squared
edi <- resid(l)/(1-hatvalues(l))
press <- sum(edi^2)
AIC <- extractAIC(l)[2]
SBC <- extractAIC(l,k=log(n))[2]
info <- c(sse,mse,r2,adjr2,press,cp,AIC,SBC)
res <- data.frame(info)
print(nm)
row.names(res) <- c("SSE","MSE","R2","R2-adj","PRESS","Cp","AIC","SBC")
return(res)
}