BUILDING AND EVALUATING MULTIPLE ALTERNATIVE MODELS IN R
General ideas
· Different models representing plausible alternative hypotheses
· Several models may have some degree of support from data
· Different models make different predictions
· How do we
o Decide which model(s) to use?
o Formulate predictions when we are uncertain about which model is true?
o Incorporate model uncertainty into our measures of model confidence?
To this point, when we fit a model to data, we have pretended that the model we are fitting is the “right” model. Then, our measures of model confidence (standard errors and confidence intervals of estimates and predictions) represent our “uncertainty” about parameter estimates, due to sampling error. But none of these measures take into account the fact that we are uncertain about the model itself.
As we have just seen, though, it is certainly possible for more than one model to be a plausible way of explaining the data. In addition, we can obtain least-squares or maximum likelihood estimates of the coefficients of each model, and these estimates are perfectly good, assuming in each case that the model in question is true. In some cases the fit of the model (R², RSS, and other measures) is quite close for different models. But each model may be predicting something different.
AIC and Multi-model inference
As suggested earlier in class, one approach would be to try to find the "best" model and use its predictions alone, together with their standard errors, confidence intervals, etc. The approach would involve testing models to see whether they fit the data, and then comparing more complex models to less complex ones by means of null hypothesis tests. However, as I've said, this approach only makes sense when the null hypothesis is a real biological hypothesis; when there is an experiment designed to test this null hypothesis; and when one model can be derived as a special case of the other. In ecology these three conditions seldom all apply, and often none of them do.
We instead advocate the approach developed by Akaike (1973) and described in detail in the excellent book by Burnham and Anderson (2002). The approach uses information theory to develop a statistic known as Akaike’s Information Criterion (AIC).
AIC measures the trade-off between how well a model fits the data (via the likelihood) and how complex the model is (via the number of parameters). It has strong theoretical support as a criterion for model selection, especially when the goal is prediction beyond the data used to fit the model.
AIC is computed from the log-likelihood and the number of model parameters by the formula

AIC = -2 log(L) + 2k

where log(L) is the maximized log-likelihood and k is the number of parameters estimated in the model (e.g., k = 2 for a simple linear regression, with intercept β0 and slope β1). Usually we will want to adjust AIC for sample size (especially for small samples) by the correction

AICc = AIC + 2k(k + 1) / (n - k - 1);
the adjustment will be especially important as k gets large relative to n (as in the polynomial example). Returning to that example, we can compute AICc for each model. The actual value of AIC is not important; what matters is a model's AIC relative to that of the model with the smallest AIC. We thus form the difference between each model's AIC (or AICc) and the minimum

ΔAIC_i = AIC_i - min(AIC)

where min(AIC) is the minimum AIC value over the set of models under consideration.
Finally, ΔAICi can be used to calculate an “information weight” for the model, representing how much the model should contribute to inference given the data.
w_i = exp(-0.5 ΔAIC_i) / Σ_j exp(-0.5 ΔAIC_j)

where the denominator sums the term exp(-0.5 ΔAIC_j) over the j = 1, …, m models under consideration.
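A minimal sketch of these calculations in R, assuming a named list of fitted lm() objects called mods and a sample size n (these object names are hypothetical, not necessarily those in the attached script):

# AICc, ΔAICc, and Akaike weights for a named list of fitted lm() models
aic.table <- function(mods, n) {
  k      <- sapply(mods, function(m) attr(logLik(m), "df"))  # number of estimated parameters (includes sigma^2)
  loglik <- sapply(mods, function(m) as.numeric(logLik(m)))  # maximized log-likelihood
  aic    <- -2 * loglik + 2 * k
  aicc   <- aic + 2 * k * (k + 1) / (n - k - 1)              # small-sample correction
  delta  <- aicc - min(aicc)                                 # difference from the best (smallest AICc) model
  wt     <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))       # Akaike weights
  data.frame(model = names(mods), k = k, logLik = loglik,
             AIC = aic, AICc = aicc, dAICc = delta, weight = wt)
}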
We will illustrate these calculations by way of several models that predict solar radiation as a function of elevation and/or southerly exposure. We might consider 9 models:
· A null model (no factors, so intercept only)
· Solar radiation predicted by elevation
· Solar radiation predicted by southern exposure
· Solar radiation predicted by southern exposure + elevation
· Solar radiation predicted by southern exposure + elevation + southern exposure * elevation
· Solar radiation predicted by elevation in a quadratic model
· Solar radiation predicted by southern exposure in a quadratic model
· The 2 above models combined
· The above model with a southern exposure * elevation interaction
The attached R script runs these 9 models and extracts summary information (a sketch of the main steps follows the notes below). A couple of things to note first:
· Because we are assuming Normal regression and an identity link function (no transformation), we can use lm( ) to get likelihood values, AIC, etc. We could just as easily have used glm( ); the commands to extract the information from the model objects would have been identical or nearly so.
· Some statistics, like AIC and R², can be extracted directly from the lm() fit, but I have nevertheless computed them from more basic ingredients to show you where these values come from.
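As a rough sketch of what the script does, assuming the data are in a data frame named solar with a response rad and predictors elev and aspect_x (these names are assumptions for illustration; check the actual script for the real ones):

# fit the nine candidate models with lm()
mods <- list(
  null       = lm(rad ~ 1,                       data = solar),  # intercept only
  elev       = lm(rad ~ elev,                    data = solar),
  asp        = lm(rad ~ aspect_x,                data = solar),
  elev.asp   = lm(rad ~ elev + aspect_x,         data = solar),
  elev.x.asp = lm(rad ~ elev * aspect_x,         data = solar),  # additive terms plus interaction
  elev2      = lm(rad ~ elev + I(elev^2),        data = solar),  # quadratic in elevation
  asp2       = lm(rad ~ aspect_x + I(aspect_x^2), data = solar), # quadratic in exposure
  both2      = lm(rad ~ elev + I(elev^2) + aspect_x + I(aspect_x^2),                 data = solar),
  both2.int  = lm(rad ~ elev + I(elev^2) + aspect_x + I(aspect_x^2) + elev:aspect_x, data = solar)
)

# summary table of AICc, ΔAICc, and weights (using the aic.table() function sketched above)
aic.tab <- aic.table(mods, n = nrow(solar))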
The results are summarized in a table, which I have written to a csv for nicer formatting.
We can sort the data by ascending AIC (descending weight) to get a clearer picture:
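One way to do that, again using the hypothetical aic.tab object from the sketch above:

aic.tab <- aic.tab[order(aic.tab$AICc), ]               # sort by ascending AICc (descending weight)
write.csv(aic.tab, "aic_table.csv", row.names = FALSE)  # write to csv for nicer formatting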
A couple of things stand out:
· 5 models (the null model, the models with only exposure or only elevation, and the quadratic models with only exposure or only elevation) have essentially no weight (i.e., no data support) and probably do not need to be considered further
· Conversely, both elevation and exposure appear in each of the top 4 models
· One of these models is “the leader”, 2 are in the middle, and one is taking up the rear. But no one model is blowing the other 3 away
If our interest were simply in identifying the top model (or group of models) and the associated factors (elevation and exposure in this case), we'd be done. But we are generally going to be interested in using our models for prediction. That is the topic of the next section.
Using AIC model averaging for prediction
Multi-model inference and model averaging
In the above, the emphasis was on comparing among models, with the idea of selecting a single model (or perhaps a set of models) on which to base inference. However, it is often the case that a parameter or prediction occurs in several models. If our primary focus is estimation or prediction, then we are going to care more about appropriate estimation of that parameter (or prediction) than about what model or models it is based on. Suppose that θ is a parameter or a prediction that we are interested in estimating, and that estimates of θ are available under several candidate models. Burnham and Anderson (2002) describe how we can combine the estimates over the alternative models using Akaike weights:

θ̄ = Σ_i w_i θ̂_i

where θ̂_i is the estimate or prediction under each model and w_i are the Akaike weights (adjusted as appropriate for sample size). Inference on θ also requires an estimate of estimator precision. Under each candidate model we have an estimated standard error (based on either the least-squares deviance or maximum likelihood). However, this estimate by definition assumes that the estimating model is the true model. To account for uncertainty in the parameter estimate due to model uncertainty, Burnham and Anderson (2002) advocate computation of an "unconditional standard error"

se(θ̄) = Σ_i w_i √[ var(θ̂_i | model i) + (θ̂_i - θ̄)² ].
In general, use of the unconditional standard error will produce larger estimated SEs and correspondingly wider confidence intervals, which is appropriate given that inference no longer rests on belief in a single model.
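A minimal sketch of these two computations in R, given vectors of per-model estimates, per-model standard errors, and Akaike weights (the function and argument names here are hypothetical):

# model-averaged estimate and unconditional SE (Burnham & Anderson 2002)
model.avg <- function(est, se, wt) {
  avg    <- sum(wt * est)                         # weighted average of the per-model estimates
  se.unc <- sum(wt * sqrt(se^2 + (est - avg)^2))  # conditional SE plus a penalty for spread among models
  c(estimate = avg, uncond.se = se.unc)
}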
We will apply this approach to predictions from the alternative models of solar radiation. The attached code generates predictions under each model at elev = 1000 and aspect_x = 1 (maximum southern exposure) by means of a "new" data frame passed to the predict() function in R. Obviously, for some models one or the other of these factors does not affect the prediction, and neither does for the "null" model, which simply predicts the average response without regard to either factor. In order for this code to work, you have to have already created the model objects and computed AIC, etc. These objects should still be in memory, but if for some reason they are not, you need to run the earlier code to recompute them and then proceed.
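A sketch of that step, again using the hypothetical mods list, aic.tab table, and model.avg() function from above:

# predictions (with conditional SEs) from each model at elev = 1000, aspect_x = 1
newdat  <- data.frame(elev = 1000, aspect_x = 1)
preds   <- lapply(mods, predict, newdata = newdat, se.fit = TRUE)
pred    <- sapply(preds, function(p) p$fit)     # point prediction from each model
se.pred <- sapply(preds, function(p) p$se.fit)  # per-model (conditional) standard error

# combine using the Akaike weights, matched to the order of the models
wt <- aic.tab$weight[match(names(mods), aic.tab$model)]
model.avg(pred, se.pred, wt)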
The 9 models provide these predictions with standard errors, now combined with the AIC results.
The model-averaged prediction and its unconditional standard error are computed using this information (see the script):
> model_avg_pred
[1] 1.354477
> uncond_se
[1] 0.009963737
Note that this is higher than the weighted average of the per-model standard errors:
> sum(se.pred*wt)
[1] 0.009954421
In general the unconditional SE will be higher. This is the value that should be used in confidence intervals and other inferential procedures, since it more faithfully reflects the fact that we are uncertain about the model structure.
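For example, an approximate normal-based 95% confidence interval for the model-averaged prediction could be formed from the values above:

model_avg_pred + c(-1, 1) * 1.96 * uncond_se   # approximate 95% CI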
Exercise: Model-averaged estimates of first bud-break (p1) using phenology data
Construct several alternative models to predict p1, the Julian date when bud-swelling was first observed. Consider combinations of the following:
· Elevation
· Southern exposure
· Wetness index (topid)- see metadata file for explanation
· Construct 10 alternative models, including a null model, with combinations of these factors present or absent and, where factors appear together, 2-way interactions. You will have plenty of combinations without quadratic terms, so just use linear terms.
· Compute R² and other measures of fit, AICc, and model weight for each model
· Compute model-averaged predictions of p1 given the mean values in the data for elev, aspect_x, and topid