Script for examples
Least squares versus maximum likelihood
To recap quickly: estimates of effects, predictions, contrasts, and other quantities from linear models can be produced via least squares approaches, without making any distributional assumptions. These estimates will have desirable properties such as minimum variance and unbiasedness, but strictly speaking confidence intervals, p-values, AIC, and other inferential quantities depend on making distributional assumptions.
The default distributional assumption, which is used to generate output in lm() and similar R functions, is that the residual errors follow a normal distribution. However, this particular assumption is not necessary for maximum likelihood inference: any alternative parametric model that allows construction of a likelihood function will do. By definition, estimates under such a model will have ML properties and will allow construction of confidence intervals, statistical tests, computation of AIC, and the other features of ML inference.
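As a concrete illustration, once a distribution is assumed, R reports these likelihood-based quantities directly from the fitted model. The following is a minimal sketch with simulated data (the seed, variables, and object names here are my own, for illustration only, and are not from the attached script):
> set.seed(123)
> x<-1:30
> y<-1+2*x+rnorm(30)
> fit<-lm(y~x)
> logLik(fit)    # maximized log-likelihood under the normal assumption
> AIC(fit)       # AIC is computed from that likelihood
> confint(fit)   # confidence intervals likewise rest on the distributional assumption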
This distinction is important, because many of the data that arise in wildlife and fisheries statistics cannot reasonably be expected to have normal-like properties, which as a reminder include:
- continuous values
- a distribution that is symmetric about the mean
- support over the entire real line, so that negative values are (at least theoretically) possible
Some data types lack one or more of these features, but upon transformation become "normal-esque". For example, body masses typically have skewed distributions that obviously must take positive values; log transformation results in a more symmetric distribution that includes (at least theoretically) negative values. However, many data types cannot reasonably be expected to follow normality even after transformation, and a more suitable statistical model should be sought, one that more naturally reflects the nature of the data. Some examples:
- counts of animals or other objects, which are non-negative integers (see Poisson regression below)
- numbers of successes out of a fixed number of trials, such as survival or detection outcomes, which are bounded by the number of trials (see binomial regression below)
Error distributions and link functions in glm
The glm() function in R provides additional flexibility by generalizing the linear model to allow non-normal errors and suitable scale transformations. The resulting very large family of models, now termed generalized linear models, requires specification of 3 components:
- an error distribution for the response (e.g., normal, Poisson, binomial)
- a linear predictor Y = b0 + b1X1 + ... + bkXk formed from the covariates
- a link function connecting the mean of the response to the linear predictor
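In glm(), the error distribution and the link function are specified together through a family object, for example:
> poisson(link="log")      # Poisson errors with a log link
> binomial(link="logit")   # binomial errors with a logit link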
Normal linear models are a special case:
error ~ Normal(0, sigma^2)
Y = Y_obs (identity link)
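To see this equivalence concretely, here is a minimal sketch with simulated data (variable names and seed are hypothetical): a Gaussian glm() with identity link reproduces the least squares estimates from lm().
> set.seed(1)
> x<-1:20
> y<-2+0.5*x+rnorm(20)
> coef(lm(y~x))                                    # least squares estimates
> coef(glm(y~x,family=gaussian(link="identity")))  # identical estimates via ML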
Poisson regression
Poisson linear models are formed by modeling the log of the Poisson mean (lambda) as a linear function of the covariates:
log(lambda) = Y = b0 + b1X1 + ... + bkXk
with the observed counts then arising as Poisson(lambda).
Poisson regression involves prediction of count-type data from (generally) continuous predictors. As a simple example, we will use a time series of 40 years of insect counts (get data here) to test for a time trend. First read in the data and plot a histogram of the counts, which will confirm that the data have a characteristic skew.
> insects<-read.csv("insect_data.csv")
> hist(insects$count,xlab="Insect count")
Next, I fit a generalized linear model based on Poisson errors and a log link.
> mod1<-glm(count~year,family=poisson(link="log"),data=insects)
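The test for a time trend is based on the year coefficient; although the original script output is not reproduced here, the standard way to examine it is:
> summary(mod1)    # z-test for the year (trend) coefficient on the log scale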
The model can be used to generate predicted values under the linear model via the predict function, which are then plotted against the data (after back-transforming by the inverse of the log link, the exp function):
> predict.log<-predict(mod1,se.fit=TRUE)
> with(insects,plot(year,count))
> with(insects,matlines(year,exp(predict.log$fit)))
> with(insects,matlines(year,exp(predict.log$fit-predict.log$se.fit*1.96),lty=2))
> with(insects,matlines(year,exp(predict.log$fit+predict.log$se.fit*1.96),lty=2))
Note that the predict function could have been used to directly generate predictions on the count scale, thus apparently avoiding the need for back-transformation. However, the normal approximation underlying the confidence intervals is better supported on the log (link) scale, so I took the approach of computing the intervals first from the predicted fit and standard error on the log scale, and then back-transforming.
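For comparison, the direct count-scale predictions mentioned above would be obtained as follows (a sketch; the object name is my own):
> predict.count<-predict(mod1,type="response")   # predictions already back-transformed to the count scale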
Binomial regression
Binomial linear models are formed in a similar fashion. Here, the linear model is usually stated in terms of the binomial parameter p,
Y = logit(p) = log(p/(1-p)), or equivalently
p = 1/(1+exp(-Y))
where
Y = b0 + b1X1 + ... + bkXk
The error distribution is then Bin(n, p).
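In R, the logit and inverse-logit transformations used above are available directly as qlogis() and plogis(); a quick sketch:
> p<-0.8
> qlogis(p)           # logit(p) = log(p/(1-p))
> plogis(qlogis(p))   # the inverse logit recovers 0.8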
So, this model essentially predicts that the success parameter depends on a logit-scale linear combination of covariate values, with the realized data (number of successes) arising, as usual, as binomial outcomes conditional on the value of the success parameter (p) and the number of trials. A special case occurs when there is only a single trial (n = 1) at each level or combination of the predictors, in which case the response is 0 or 1. This is sometimes called binary regression, and is the usual situation when we have individual covariates in survival or other capture-recapture models. I applied this approach to the binary version of the chick weight data in the attached script file, running the 5 models analogous to the regression/ANOVA/ANCOVA models run previously. I likewise produced plots of predicted success rate versus observed values under the simple regression model (mod2 in the set), and plots on the logit scale by diet, again showing a clear interaction (non-parallel lines) between Time and Diet.
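Since the script itself is attached rather than reproduced here, the following sketch shows only the general form of such fits; the data frame and variable names (chicks, surv, Time, Diet) are hypothetical stand-ins, not the actual contents of the script:
> mod2<-glm(surv~Time,family=binomial(link="logit"),data=chicks)        # simple binary regression on Time
> mod5<-glm(surv~Time*Diet,family=binomial(link="logit"),data=chicks)   # Time x Diet interaction model
> plot(chicks$Time,predict(mod2,type="response"))                       # predicted success rate versus Time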
Next: Nonlinear least squares