Script for examples
Least squares versus maximum likelihood
To recap quickly: estimates of effects, predictions, contrasts, and other quantities from linear models can be produced via least squares approaches, without making any distributional assumptions. These estimates will have desirable properties such as minimum variance and unbiasedness, but strictly speaking confidence intervals, p-values, AIC, and other inferential quantities depend on making distributional assumptions.
The default distributional assumption, which is used to generate output in lm() and similar R functions, is that the residual errors follow a normal distribution. However, this particular assumption is not necessary for maximum likelihood inference: any alternative parametric model that allows construction of a likelihood function will do. By definition, estimates under such a model will have ML properties and will allow construction of confidence intervals, statistical tests, computation of AIC, and the other features of ML inference.
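As a concrete illustration, once a distribution is assumed, R reports these likelihood-based quantities directly from the fitted model. The following is a minimal sketch with simulated data (the seed, variables, and object names here are my own, for illustration only, and are not from the attached script):
> set.seed(123)
> x<-1:30
> y<-1+2*x+rnorm(30)
> fit<-lm(y~x)
> logLik(fit)    # maximized log-likelihood under the normal assumption
> AIC(fit)       # AIC is computed from that likelihood
> confint(fit)   # confidence intervals likewise rest on the distributional assumption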
This distinction is important, because many of the data that arise in wildlife and fisheries statistics cannot reasonably be expected to have normal-like properties, which as a reminder include:
- continuous values
- a distribution that is symmetric about the mean
- support over the entire real line, so that negative values are (at least theoretically) possible
Some data types lack one or more of these features, but upon transformation become "normal-esque". For example, body masses typically have skewed distributions that obviously must take positive values; log transformation results in a more symmetric distribution that includes (at least theoretically) negative values. However, many data types cannot reasonably be expected to follow normality even after transformation, and a more suitable statistical model should be sought, one that more naturally reflects the nature of the data. Some examples:
- counts of animals or other objects, which are non-negative integers (see Poisson regression below)
- numbers of successes out of a fixed number of trials, such as survival or detection outcomes, which are bounded by the number of trials (see binomial regression below)
Error distributions and link functions in glm
The glm() function in R provides additional flexibility by generalizing the linear model to allow non-normal errors and suitable scale transformations. The resulting very large family of models, now termed generalized linear models, requires specification of 3 components:
- an error distribution for the response (e.g., normal, Poisson, binomial)
- a linear predictor Y = b0 + b1X1 + ... + bkXk formed from the covariates
- a link function connecting the mean of the response to the linear predictor
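In glm(), the error distribution and the link function are specified together through a family object, for example:
> poisson(link="log")      # Poisson errors with a log link
> binomial(link="logit")   # binomial errors with a logit link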
Normal linear models are a special case:
error ~ Normal(0, sigma^2)
Y = Y_obs (identity link)
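To see this equivalence concretely, here is a minimal sketch with simulated data (variable names and seed are hypothetical): a Gaussian glm() with identity link reproduces the least squares estimates from lm().
> set.seed(1)
> x<-1:20
> y<-2+0.5*x+rnorm(20)
> coef(lm(y~x))                                    # least squares estimates
> coef(glm(y~x,family=gaussian(link="identity")))  # identical estimates via ML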
Poisson regression
Poisson linear models are formed by modeling the log of the Poisson mean (lambda) as a linear function of the covariates:
log(lambda) = Y = b0 + b1X1 + ... + bkXk
with the observed counts then arising as Poisson(lambda).
Poisson regression involves prediction of count-type data from (generally) continuous predictors. As a simple example, we will use a time series of 40 years of insect counts (get data here) to test for a time trend. First read in the data and plot a histogram of the counts, which will confirm that the data have a characteristic skew.
> insects<-read.csv("insect_data.csv")
> hist(insects$count,xlab="Insect count")
Next, I fit a generalized linear model based on Poisson errors and a log link.
> mod1<-glm(count~year,family=poisson(link="log"),data=insects)
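The test for a time trend is based on the year coefficient; although the original script output is not reproduced here, the standard way to examine it is:
> summary(mod1)    # z-test for the year (trend) coefficient on the log scale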
The model can be used to generate predicted values under the linear model via the predict function, which are then plotted against the data (after back-transforming by the inverse of the log link, the exp function):
> predict.log<-predict(mod1,se.fit=TRUE)
> with(insects,plot(year,count))
> with(insects,matlines(year,exp(predict.log$fit)))
> with(insects,matlines(year,exp(predict.log$fit-predict.log$se.fit*1.96),lty=2))
> with(insects,matlines(year,exp(predict.log$fit+predict.log$se.fit*1.96),lty=2))
Note that the predict function could have been used to directly generate predictions on the count scale, thus apparently avoiding the need for back-transformation. However, the normal approximation underlying the confidence intervals is better supported on the log (link) scale, so I took the approach of computing the intervals first from the predicted fit and standard error on the log scale, and then back-transforming.
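For comparison, the direct count-scale predictions mentioned above would be obtained as follows (a sketch; the object name is my own):
> predict.count<-predict(mod1,type="response")   # predictions already back-transformed to the count scale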
Binomial regression
Binomial linear models are formed in a similar fashion. Here, the linear model is usually stated in terms of the binomial parameter p,
Y = logit(p) = log(p/(1-p)), or equivalently
p = 1/(1+exp(-Y))
where
Y = b0 + b1X1 + ... + bkXk
The error distribution is then Bin(n, p).
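In R, the logit and inverse-logit transformations used above are available directly as qlogis() and plogis(); a quick sketch:
> p<-0.8
> qlogis(p)           # logit(p) = log(p/(1-p))
> plogis(qlogis(p))   # the inverse logit recovers 0.8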
So, this model essentially predicts that the success parameter depends on a logit-scale linear combination of covariate values, with the realized data (number of successes) arising, as usual, as binomial outcomes conditional on the value of the success parameter (p) and the number of trials. A special case occurs when there is only a single trial (n = 1) at each level or combination of the predictors, in which case the response is 0 or 1. This is sometimes called binary regression, and is the usual situation when we have individual covariates in survival or other capture-recapture models. I applied this approach to the binary version of the chick weight data in the attached script file, running the 5 models analogous to the regression/ANOVA/ANCOVA models run previously. I likewise produced plots of predicted success rate versus observed values under the simple regression model (mod2 in the set), and plots on the logit scale by diet, again showing a clear interaction (non-parallel lines) between Time and Diet.
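Since the script itself is attached rather than reproduced here, the following sketch shows only the general form of such fits; the data frame and variable names (chicks, surv, Time, Diet) are hypothetical stand-ins, not the actual contents of the script:
> mod2<-glm(surv~Time,family=binomial(link="logit"),data=chicks)        # simple binary regression on Time
> mod5<-glm(surv~Time*Diet,family=binomial(link="logit"),data=chicks)   # Time x Diet interaction model
> plot(chicks$Time,predict(mod2,type="response"))                       # predicted success rate versus Time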
Next: Nonlinear least squares