R Part 4b

The total length of the videos in this section is approximately 9 minutes, but you will also spend time running code while completing this section.

You can also view all the videos in this section at the YouTube playlist linked here.

Please download the code file:

Linear Regression

RPart4b.1.Regression.mp4

Question 1: What will you get if you run the following code?

plot(xvalues, yvalues)

abline(lm$coef)

A plot of your x and y values with a regression line drawn on top
A plot of expected x and y values with a line connecting all of your points drawn on top

Answer:

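The first option. plot(xvalues, yvalues) makes a scatterplot of those values, and abline(lm$coef) draws the regression line on top of it. Note that this only works if lm here is the name of a saved model object (the output of a call to the lm() function), because the coefficients are read from that object.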
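As a minimal sketch of what this question describes, the code below simulates some data, fits a regression, and draws the line on the scatterplot. The names xvalues and yvalues come from the question; the simulated numbers and the object name fit are made up for illustration.

```r
# Simulated data, for illustration only
set.seed(42)
xvalues <- runif(30, min = 0, max = 10)
yvalues <- 1 + 0.5 * xvalues + rnorm(30)

# Fit the regression and save the output under a name of our choosing
fit <- lm(yvalues ~ xvalues)

# Scatterplot of the raw data
plot(xvalues, yvalues)

# fit$coef holds the intercept and the slope; abline() uses them to draw the line
abline(fit$coef)
```

Saving the model under a name other than lm (here, fit) avoids confusing the saved output with the lm() function itself.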

Interactions and predictors

RPart4b.2.InteractionsAndPredictors.mp4

Question 2: Are the following statements equivalent?

lm(Age~Height+Weight+Height:Weight)
lm(Age~Height*Weight)

Answer:

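Yes, they are equivalent. Height*Weight is shorthand for the two main effects plus their interaction, that is, Height+Weight+Height:Weight.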
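One way to convince yourself is to fit both versions and compare the coefficients. This is a sketch on simulated data; the data frame d and the numbers in it are invented for illustration, and only the variable names come from the question.

```r
# Simulated data, for illustration only
set.seed(7)
d <- data.frame(Height = rnorm(60, mean = 170, sd = 10),
                Weight = rnorm(60, mean = 70, sd = 12))
d$Age <- 20 + 0.1 * d$Height + 0.05 * d$Weight + rnorm(60)

# Main effects and interaction written out in full
long_form <- lm(Age ~ Height + Weight + Height:Weight, data = d)

# The same model using the * short-cut
short_form <- lm(Age ~ Height * Weight, data = d)

# Compare the two sets of coefficients (names and values)
all.equal(coef(long_form), coef(short_form))
```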

Summary of ways to run a model (also appeared in R 4a)

Suppose that the data is called d, the outcome variable is Y, and the predictor variables are named X and Z.

The following lines show different ways to run a regression tree (ctree) or a linear regression (lm).

ctree(d$Y~d$X) # works fine

lm(d$Y~d$X)

ctree(Y~X, data=d) # better, and necessary if using the predict function afterwards

lm(Y~X, data=d)

ctree(Y~X+Z, data=d) # multiple predictors

lm(Y~X+Z, data=d)

lm(Y~X+Z+X:Z, data=d) # interaction term (you don't ask for interactions when running a tree, because the idea of interactions is built into the tree algorithm)

lm(Y~X*Z, data=d) # short-cut for including main effects and also interaction

ctree(Y~., data=d) # short-cut for including all columns in d as predictors. Note that you can create a "d" that is a subset of the original data set before using this code, if helpful.

lm(Y~., data=d)
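To see why the data= form matters for predict, here is a sketch using lm only (ctree needs an add-on package, so it is left out). The data are simulated and the object names fit_dollar, fit_data and new_points are made up for illustration.

```r
# Simulated data, for illustration only
set.seed(1)
d <- data.frame(X = rnorm(50))
d$Y <- 2 + 3 * d$X + rnorm(50)

# Same model, specified the two ways shown above
fit_dollar <- lm(d$Y ~ d$X)
fit_data   <- lm(Y ~ X, data = d)

# Two new X values we would like predictions for
new_points <- data.frame(X = c(0, 1))

# With data=, predict() matches the column X in new_points
pred_data <- predict(fit_data, newdata = new_points)

# With d$X in the formula, predict() cannot match X in new_points;
# it warns and falls back to the original 50 rows
pred_dollar <- suppressWarnings(predict(fit_dollar, newdata = new_points))

length(pred_data)
length(pred_dollar)
```

Both fits have the same coefficients; the difference only shows up when you try to predict for new data.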

Question 3 (also appeared in R 4a): If you want to use 10 predictors, should you type out their names with plus signs in between?

Answer:

Nope. It is usually easier to write a line of code that creates a subset of the data containing only these predictors and the outcome variable (here, assuming those 11 columns sit in positions 5 through 15 of d):

d.subset<-d[ , 5:15]

and then run the regression this way:

output<-lm(Y~., data=d.subset)

And you're done.
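Column positions depend on how your own data set is laid out. As a sketch of the same idea on a small made-up data set, you can also pick the columns by name; every column name below is hypothetical.

```r
# Made-up data set with an id column we do not want as a predictor
set.seed(2)
d <- data.frame(id = 1:40,
                Y  = rnorm(40),
                A  = rnorm(40),
                B  = rnorm(40),
                C  = rnorm(40))

# Keep only the outcome and the predictors we want
d.subset <- d[ , c("Y", "A", "B")]

# Y~. now uses every remaining column (A and B) as a predictor
output <- lm(Y ~ ., data = d.subset)

names(output$coef)
```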

During this tutorial you learned:

To run a regression model with the lm() function
About components of a regression model output
How to check regression assumptions by plotting the residuals against the fitted values
How to define a model with multiple predictors and/or interactions
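As a reminder of the residual check mentioned above, here is a minimal sketch on simulated data (the data frame d and its values are invented for illustration). If the assumptions are reasonable, the points should scatter evenly around zero with no obvious pattern.

```r
# Simulated data, for illustration only
set.seed(3)
d <- data.frame(X = runif(50, min = 0, max = 10))
d$Y <- 4 + 1.5 * d$X + rnorm(50)

# Fit the model and save the output
output <- lm(Y ~ X, data = d)

# Residuals on the vertical axis, fitted values on the horizontal axis
plot(output$fitted.values, output$resid,
     xlab = "Fitted values", ylab = "Residuals")

# Dashed horizontal reference line at zero
abline(h = 0, lty = 2)
```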

Operators in review:

…$coef, …$resid, …$fitted.values

Functions in review:

lm(), confint(), abline(...$coef)
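The short sketch below pulls these pieces together on simulated data. The data frame d and the object name output are assumptions made for illustration; the functions and components are the ones listed above.

```r
# Simulated data, for illustration only
set.seed(5)
d <- data.frame(X = rnorm(40))
d$Y <- 1 + 2 * d$X + rnorm(40)

# Fit the model and save the output
output <- lm(Y ~ X, data = d)

# Components of the saved output
output$coef                  # intercept and slope
head(output$resid)           # first few residuals
head(output$fitted.values)   # first few fitted values

# Confidence intervals for the coefficients (95% unless level is changed)
ci <- confint(output)
ci
confint(output, level = 0.90)
```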