R Part 4b
The total length of the videos in this section is approximately 9 minutes, but you will also spend time running code while completing this section.
You can also view all the videos in this section at the YouTube playlist linked here.
Please download the code file:
Linear Regression
Question 1: What will you get if you run the following code?
plot(xvalues, yvalues)
abline(lm$coef)
A plot of your x and y values with a regression line drawn on top
A plot of expected x and y values with a line connecting all of your points drawn on top
Answer: The first option. plot(xvalues, yvalues) makes a scatterplot of those values, and abline(lm$coef) draws the regression line on top of it, provided that lm here names a fitted model object (the result of a call to lm()) rather than the lm() function itself.
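To try this yourself, here is a minimal sketch with simulated data. The numbers are made up for illustration, and the fitted model is stored under the assumed name fit (rather than lm) so that it cannot be confused with the lm() function.

```r
# Simulated data; xvalues and yvalues stand in for the tutorial's variables
set.seed(1)
xvalues <- 1:20
yvalues <- 3 + 2 * xvalues + rnorm(20)

# Fit the regression and keep the result under its own name
fit <- lm(yvalues ~ xvalues)

plot(xvalues, yvalues)  # scatterplot of the raw data
abline(fit$coef)        # adds the fitted line from (intercept, slope)
```

abline() reads the two coefficients as intercept and slope, which is why this shortcut only makes sense for a model with a single predictor.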
Interactions and predictors
Question 2: Are the following statements equivalent?
lm(Age~Height+Weight+Height:Weight)
lm(Age~Height*Weight)
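Answer: Yes, they are equivalent. Height*Weight expands to Height + Weight + Height:Weight, that is, both main effects plus their interaction.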
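You can check the equivalence by fitting both forms and comparing the coefficients. The sketch below uses R's built-in mtcars data as a stand-in, since the Age, Height and Weight data are not included here; the object names long_form and short_form are arbitrary.

```r
# Same model written two ways, using the built-in mtcars data
long_form  <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
short_form <- lm(mpg ~ wt * hp, data = mtcars)

# Compare the estimated coefficients of the two fits
all.equal(coef(long_form), coef(short_form))
```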
Summary of ways to run a model (also appeared in R 4a)
Suppose that the data is called d, the outcome variable is Y, and the predictor variables are named X and Z.
The following lines show different ways to run a regression tree or linear regression.
ctree(d$Y~d$X) # works fine
lm(d$Y~d$X)
ctree(Y~X, data=d) # better, and necessary if using the predict function afterwards
lm(Y~X, data=d)
ctree(Y~X+Z, data=d) # multiple predictors
lm(Y~X+Z, data=d)
lm(Y~X+Z+X:Z, data=d) # interaction term (you don't ask for interactions when running a tree, because the idea of interactions is built into the tree algorithm)
lm(Y~X*Z, data=d) # short-cut for including main effects and also interaction
ctree(Y~., data=d) # short-cut for including all columns in d as predictors. Note that you can create a "d" that is a subset of the original data set before using this code, if helpful.
lm(Y~., data=d)
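To see why the data=d form matters for predict(), here is a small sketch using lm() only (ctree() needs an add-on package, so it is left out). The data frame d is simulated, and new_rows is a made-up set of new observations.

```r
# Simulated data with outcome Y and predictors X and Z
set.seed(2)
d <- data.frame(X = runif(30), Z = runif(30))
d$Y <- 1 + 2 * d$X - d$Z + rnorm(30, sd = 0.1)

# Formula plus data=d: predict() can match columns in new data by name
fit_formula <- lm(Y ~ X + Z, data = d)
new_rows <- data.frame(X = c(0.1, 0.5), Z = c(0.2, 0.8))
pred <- predict(fit_formula, newdata = new_rows)
length(pred)  # one prediction per row of new_rows

# With d$Y ~ d$X the model is tied to the original columns of d, so
# predict() typically warns and ignores the values in new_rows
fit_dollar <- lm(d$Y ~ d$X)
length(suppressWarnings(predict(fit_dollar, newdata = new_rows)))
```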
Question 3 (also appeared in R 4a): If you want to use 10 predictors, should you type out their names with plus signs in between?
Answer: No. It is usually easier to create a subset of the data that contains only these predictors and the outcome variable:
d.subset<-d[ , 5:15] # 11 columns: the outcome variable plus the 10 predictors
and then run the regression this way:
output<-lm(Y~., data=d.subset)
And you're done.
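Here is the same idea as a runnable sketch on the built-in mtcars data, with fewer columns to keep it short. The column positions and names are specific to mtcars and are used only for illustration.

```r
# Keep the outcome (mpg, column 1) and three predictors (columns 2 to 4)
d.subset <- mtcars[ , 1:4]

# The dot means "every other column in d.subset is a predictor"
output <- lm(mpg ~ ., data = d.subset)
names(coef(output))

# Selecting columns by name is an alternative if positions might change
d.byname <- mtcars[ , c("mpg", "cyl", "disp", "hp")]
```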
During this tutorial you learned:
To run a regression model with the lm() function
About components of a regression model output
How to check regression assumptions by plotting the residuals against the fitted values
How to define a model with multiple predictors and/or interactions
Operators in review:
…$coef, …$resid, …$fitted.values
Functions in review:
lm(), confint(), abline(…$coef)
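As a quick review, the sketch below puts these pieces together on the built-in mtcars data. The model and the name fit are chosen only for illustration.

```r
# One-predictor regression on built-in data
fit <- lm(mpg ~ wt, data = mtcars)

fit$coef                 # intercept and slope
head(fit$resid)          # first few residuals
head(fit$fitted.values)  # first few fitted values
confint(fit)             # 95% confidence intervals by default

# Residuals against fitted values, to check the regression assumptions
plot(fit$fitted.values, fit$resid)
abline(h = 0, lty = 2)   # horizontal reference line at zero
```

In the residual plot, you are looking for points scattered evenly around the zero line with no obvious curve or funnel shape.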