Script for the examples below
Linear models include many familiar tests and analyses as special cases. The main ones are linear regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA).
In a bit more detail:
Linear regression
Linear regression involves a model of the form
Y^ = b0 + b1X1 + ... + bkXk
where Y^ is the prediction for a continuous response Y, and X1, ..., Xk are predictors or independent variables (set by observation or design) that are assumed to be fixed/known. The standard method of fitting a regression model is least squares estimation, which simply involves finding values of the model parameters b0, ..., bk that minimize the sum of squared differences between the observed (Y) and predicted (Y^) values. In the special case where the residual errors (Yi - Y^i) are normally distributed, the least squares estimates are also the maximum likelihood estimates, with the attendant validity of confidence intervals, hypothesis tests, AIC comparisons, etc. This case is sometimes referred to as normal regression, to distinguish it from cases we will consider later involving other error distributions.
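As a quick illustration (a minimal sketch with simulated data; the names x, y, and n are made up for this example and are not part of the chick analysis below), the least squares estimates returned by lm() for a single predictor match the familiar closed-form solutions b1 = cov(X,Y)/var(X) and b0 = mean(Y) - b1*mean(X):
> # Simulated data for illustration only
> set.seed(1)
> n <- 50
> x <- runif(n, 0, 10)
> y <- 2 + 0.5 * x + rnorm(n, sd = 1)
> b1 <- cov(x, y) / var(x)       # closed-form least squares slope
> b0 <- mean(y) - b1 * mean(x)   # closed-form least squares intercept
> c(b0, b1)
> coef(lm(y ~ x))                # same estimates from lm()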
Analysis of variance (ANOVA)
Under ANOVA the "predictor variable" involves discrete groupings (factor levels). However, we still have a linear model, and computations are performed by methods very akin to those used in regression (again, minimizing the error sum of squares). As with regression, if we invoke additional normality assumptions, distribution-based tests such as the F-test provide inference about hypotheses of interest, as well as confidence intervals for estimated treatment effects, contrasts, etc.
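To make the least squares connection concrete, here is a minimal sketch (simulated data; y, group, fit0, and fit1 are made-up names) showing that the one-way ANOVA F statistic is just a comparison of residual sums of squares between the intercept-only (grand mean) model and the model with separate group means:
> # Simulated one-way layout for illustration only
> set.seed(2)
> group <- factor(rep(c("A", "B", "C"), each = 20))
> y <- rnorm(60, mean = rep(c(10, 12, 11), each = 20), sd = 2)
> fit0 <- lm(y ~ 1)       # grand-mean (null) model
> fit1 <- lm(y ~ group)   # separate group means
> rss0 <- sum(resid(fit0)^2)
> rss1 <- sum(resid(fit1)^2)
> Fstat <- ((rss0 - rss1) / 2) / (rss1 / df.residual(fit1))
> Fstat                   # same F as in anova(fit1)
> anova(fit1)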
Analysis of covariance (ANCOVA)
Analysis of covariance is a sort of blend of linear regression and ANOVA, in which we are interested in testing and estimating group effects (as in ANOVA) while taking into account the linear effect of one or more covariates. The close connection between regression, ANOVA, and ANCOVA is emphasized by the fact that R uses the same function, lm(), for all three. This is aptly illustrated by the chick data example.
> data(ChickWeight)
> chicks <- ChickWeight
First, the ANOVA of weight by diet treatment is constructed with lm() and summarized, along with an ANOVA table:
> # ANOVA: weight as a function of diet (a factor)
> model1 <- lm(weight ~ Diet, data = chicks)
> summary(model1)   # coefficient estimates, standard errors, t tests
> aov(model1)       # sums of squares and degrees of freedom by term
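If you want the ANOVA table with the F test and p value in a single step, anova() applied to the lm fit (or summary() applied to the aov fit) does the job; this is standard R usage, not part of the accompanying script:
> anova(model1)          # ANOVA table with the F test for Diet
> summary(aov(model1))   # same table via aov()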
The regression of weight on time is likewise built with lm(). It too has an ANOVA table (1 df for the slope of the regression):
> ## REGRESSION: weight as a linear function of Time
> model2 <- lm(weight ~ Time, data = chicks)
> summary(model2)   # slope and intercept estimates
> aov(model2)       # sums of squares; 1 df for the Time slope
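A quick base-R plot of the raw data with the fitted line (a small addition, not part of the accompanying script) helps visualize this regression:
> plot(weight ~ Time, data = chicks)   # scatterplot of weight against Time
> abline(model2)                       # add the fitted least squares line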
ANCOVA puts these together. Note that there are 2 possible models: additive Time and Diet effects (the same slope of weight vs. Time across all diets) or interactive effects (slopes differ among diets).
> # ANCOVA
> model3 <- lm(weight ~ Time + Diet, data = chicks)   # additive: common slope across diets
> summary(model3)
> aov(model3)
> model4 <- lm(weight ~ Time * Diet, data = chicks)   # interaction: diet-specific slopes
> summary(model4)
> aov(model4)
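Because the additive model is nested within the interaction model, the two can be compared directly with an F test (again resting on the normality assumptions); this is standard lm()/anova() usage rather than part of the script above:
> anova(model3, model4)   # F test for the Time:Diet interaction (diet-specific slopes)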
The accompanying script file adds two features that will often be useful.
Note: lm() and related procedures in R automatically and without warning produce p values, confidence intervals, and AIC statistics. Again, these are based on MLE and normality assumptions. The least squares estimates are unbiased without these assumptions, but understand that if you report p values and CIs or use AIC, you are implicitly assuming normality, unless you make a different distributional assumption, which we're not allowed to do in the lm() function but can in the glm() function, coming up next.
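For example (a small sketch using the models fitted above), the following standard calls produce confidence intervals and AIC values, and all of them carry the normality assumption just described:
> confint(model4)                       # normal-theory confidence intervals for the coefficients
> AIC(model1, model2, model3, model4)   # AIC comparison across the four models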
Next: Assignment