7.4. Nonlinear least squares

Up to now, all the models we have fit to data have been "linear in the parameters", meaning that the response (left-hand side) can be written as a sum of terms, each consisting of a coefficient (b₀, b₁, etc.) multiplied by a predictor (independent variable) X₁, ..., X_k, with b₀ as the intercept.

Linear models actually provide a great deal of flexibility, in that many relationships that don't look linear can be made so by transformations or re-parameterizations. Linear models also often have statistical advantages; for instance, the least squares estimates are also the maximum likelihood estimates if we can assume that the errors are independent and normally distributed with constant variance.

Obviously, a relationship like

Y = b₀ + b₁X₁ + b₂X₂

is linear, but we can also use linear models to represent interactions and polynomial terms, e.g.,

Y = b₀ + b₁X₁ + b₂X₂ + b₃X₁X₂ + b₄X₂²

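is also linear in the parameters, because each coefficient still multiplies a (possibly transformed) predictor and the terms are simply added. Similarly, we can sometimes use transformations to linearize what looks like a nonlinear model, e.g.,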

Y = exp(a)*X^b

can be linearized by taking the logarithm of both sides:

log(Y) = a + b*log(X).
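The log transformation above can be checked numerically. The following is a minimal Python sketch, not part of the course script: the values of a and b, the X values, and the helper name fit_line are all hypothetical, and the data are noise-free so that an ordinary straight-line fit on the log scale should recover a and b almost exactly.

```python
import math

# Hypothetical parameter values and predictor values, for illustration only.
a_true, b_true = 0.5, 1.8
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [math.exp(a_true) * x ** b_true for x in xs]  # noise-free power-law data


def fit_line(u, v):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    slope = sxy / sxx
    return v_bar - slope * u_bar, slope


# Fit the straight line log(Y) = a + b*log(X).
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]
a_hat, b_hat = fit_line(log_x, log_y)
print(a_hat, b_hat)
```

With real, noisy data the estimates would only approximate a and b, and the fit would minimize squared errors on the log scale rather than on the original scale of Y.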

However, some problems cannot be linearized in this way. An example is the logistic growth function, which stipulates that growth follows a sigmoid curve with a fixed upper limit (e.g., population carrying capacity, maximum tree height). To estimate the parameters of this function from data, we must use a procedure known as nonlinear least squares (NLS). NLS works similarly to ordinary least squares in that the solutions are obtained by minimizing the sum of squared differences (residuals) between model predictions and data; the difference is that the model is now nonlinear in the parameters, so the minimum generally has to be found iteratively rather than from a closed-form formula.

We'll take as our example a built-in database for loblolly pine (Pinus taeda) growth, in which we focus on the relationship between height and age. The form of the logistic model we use expresses this relationship as

height ~ L/(1 + exp(-k*(age - age0)))

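where the response is height, the predictor is age, and the parameters are L (the upper asymptote, or maximum height), k (a rate parameter that controls how steeply height increases; the maximum height increment per unit age is L*k/4) and age0 (the age at which growth begins to slow, also known as the inflection point, where height equals L/2).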
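One way to get a feel for the three parameters is to evaluate the curve at the inflection age. The short Python sketch below is illustrative only; the function name logistic_height is hypothetical, and the parameter values are simply the rounded estimates quoted in this section for the loblolly fit. It checks that the height at age0 is L/2 and that the slope there, approximated by a central difference, is close to L*k/4.

```python
import math


def logistic_height(age, L, k, age0):
    """Logistic growth curve: L / (1 + exp(-k * (age - age0)))."""
    return L / (1.0 + math.exp(-k * (age - age0)))


# Rounded estimates quoted in the text; used here only as illustrative values.
L, k, age0 = 57.0, 0.23, 11.7

# Height at the inflection age should be half of the asymptote L.
half_height = logistic_height(age0, L, k, age0)

# Central-difference approximation to the slope at the inflection age.
h = 1e-5
slope_at_age0 = (
    logistic_height(age0 + h, L, k, age0) - logistic_height(age0 - h, L, k, age0)
) / (2.0 * h)

print(half_height, L / 2.0)
print(slope_at_age0, L * k / 4.0)
```

The slope at age0 is the steepest anywhere on the curve, so the fastest growth implied by a fit depends on both L and k together.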

The attached script file loads the data set, selects a subset of the data (a single seed source), and fits the above nonlinear model to height and age. Note that starting values for L, k and age0 must be specified. Reasonable guesses generally work, but in some cases the procedure is very sensitive to starting values, and values too far from the solution can cause the fit to fail to converge. I guessed maximum height at about 65 (feet), k at 0.1, and inflection age at 20. The least squares estimates are not too far off: 57, 0.23, and 11.7. The plot of the predictions against the data shows a decent fit.
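To make the role of starting values concrete, here is a minimal Python sketch of an iterative nonlinear least squares fit. It is not the attached script and does not use the loblolly data: the heights are synthetic, generated without noise from the logistic model using the rounded estimates quoted above, and the ages are illustrative. The iteration is a damped Gauss-Newton (Levenberg-Marquardt style) scheme chosen for robustness in a short example, which is an assumption rather than a description of what the script does. All function names (logistic, sse, solve3, fit_logistic) are hypothetical.

```python
import math


def logistic(age, L, k, age0):
    """Logistic curve, arranged so that exp() never overflows."""
    z = k * (age - age0)
    if z >= 0:
        return L / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return L * ez / (1.0 + ez)


def sse(p, ages, heights):
    """Sum of squared residuals for parameters p = [L, k, age0]."""
    total = 0.0
    for a, h in zip(ages, heights):
        r = h - logistic(a, p[0], p[1], p[2])
        total += r * r
    return total


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination; None if singular."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-300:
            return None
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for j in range(c, 4):
                M[r][j] -= f * M[c][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x


def fit_logistic(ages, heights, start, max_iter=200):
    """Damped Gauss-Newton: accept a step only if it lowers the SSE."""
    p = list(start)
    current = sse(p, ages, heights)
    lam = 1e-3  # damping factor; larger values give shorter, safer steps
    for _ in range(max_iter):
        L, k, age0 = p
        JTJ = [[0.0] * 3 for _ in range(3)]
        JTr = [0.0] * 3
        for a, h in zip(ages, heights):
            s = logistic(a, 1.0, k, age0)
            ds = s * (1.0 - s)
            # Derivatives of the prediction with respect to L, k and age0.
            row = [s, L * ds * (a - age0), -L * ds * k]
            r = h - L * s
            for i in range(3):
                JTr[i] += row[i] * r
                for j in range(3):
                    JTJ[i][j] += row[i] * row[j]
        improved = False
        while lam < 1e10:
            damped = [[JTJ[i][j] + (lam * JTJ[i][i] if i == j else 0.0)
                       for j in range(3)] for i in range(3)]
            step = solve3(damped, JTr)
            if step is not None:
                trial = [pi + si for pi, si in zip(p, step)]
                trial_sse = sse(trial, ages, heights)
                if trial_sse < current:
                    p, current = trial, trial_sse
                    lam = max(lam / 10.0, 1e-12)
                    improved = True
                    break
            lam *= 10.0  # step rejected: damp more and try again
        if not improved:
            break
    return p, current


# Synthetic, noise-free heights from the model (NOT the real loblolly data).
ages = [3.0, 5.0, 10.0, 15.0, 20.0, 25.0]
heights = [logistic(a, 57.0, 0.23, 11.7) for a in ages]

estimates, final_sse = fit_logistic(ages, heights, start=(65.0, 0.1, 20.0))
print(estimates, final_sse)
```

Because the synthetic data contain no noise, the fit should return values very close to the generating parameters; with real data the residual sum of squares would stay above zero and poor starting values could stall the search.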

By the way, in population ecology, the logistic equation is usually expressed in its differential form

dN/dt = rN(1-N/K)

where N is population size, r is the maximum (intrinsic) per capita growth rate and K is carrying capacity. The solution to this differential equation is the nonlinear equation

N_t = K N₀ exp(rt) / [K + N₀(exp(rt) - 1)]

The parameters of this model would be initial population size (N₀), r and K, and the model would be fit to the relationship between abundance (N_t) and time (t).
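As a quick consistency check between the differential form and its solution, the Python sketch below integrates dN/dt = rN(1-N/K) numerically with a standard fourth-order Runge-Kutta scheme and compares the result with the closed-form expression. The parameter values (N₀ = 5, r = 0.5, K = 100), the time horizon, and the function names are hypothetical choices made for illustration.

```python
import math


def logistic_rate(n, r, K):
    """Right-hand side of the logistic equation: dN/dt = r*N*(1 - N/K)."""
    return r * n * (1.0 - n / K)


def closed_form(t, n0, r, K):
    """Closed-form solution N_t = K*N0*exp(rt) / [K + N0*(exp(rt) - 1)]."""
    ert = math.exp(r * t)
    return K * n0 * ert / (K + n0 * (ert - 1.0))


def integrate_rk4(n0, r, K, t_end, steps):
    """Integrate the logistic equation from 0 to t_end with classical RK4."""
    h = t_end / steps
    n = n0
    for _ in range(steps):
        k1 = logistic_rate(n, r, K)
        k2 = logistic_rate(n + 0.5 * h * k1, r, K)
        k3 = logistic_rate(n + 0.5 * h * k2, r, K)
        k4 = logistic_rate(n + h * k3, r, K)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n


# Hypothetical parameter values, for illustration only.
n0, r, K = 5.0, 0.5, 100.0
numeric = integrate_rk4(n0, r, K, t_end=10.0, steps=1000)
exact = closed_form(10.0, n0, r, K)
print(numeric, exact)
```

The two values should agree to many decimal places, which is what makes it reasonable to fit the closed-form curve directly to abundance and time data.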
