Definitions of Linear

There are two ways of interpreting what is meant by "linear" in linear regression. One is more common; the other is more correct in terms of the statistical theory behind the method.

Interpretation 1 (most common):

"Linear" means "linear in the predictor variable": the explanatory variable appears in the model with an exponent equal to 1.

E(Y) = B₀ + B₁*x¹ where E(Y) is the expected value of Y

Interpretation 2 (most statistically relevant):

"Linear" in this sense means that the expected value of the response is a linear function of the parameters, or that it is "linear in the betas".

Intrinsically linear regression models include:

polynomial models, and
models that are nonlinear in the parameters, but that can be transformed into a model that is linear in the parameters.

A model that is linear in the parameters and is also linear in the predictor variable is called a first-order model.

References:

Design and Analysis of Experiments (Montgomery) section 3.2, page 64
Applied Linear Regression Models (Kutner) section 1.3, page 9
Mathematical Statistics with Applications (Wackerly)
(See StatSoft for the distinction between linear and non-linear models, along with examples.)
(See NIST definition of "linear" along with examples.)

Examples of models that are "linear in the betas", either as written or after a transformation, include:

E(Y) = e^(B₀ + B₁*x), which is not linear in the betas as written but can be transformed to ln(E(Y)) = B₀ + B₁*x

E(Y) = B₀ + B₁*x₁ + B₂*x₂ where x₁ and x₂ can take many forms, such as x² or log(x)

E(Y) = B₀ + B₁*x + B₂*x² is a linear statistical model where E(Y) is a 2nd order polynomial function of the independent variable "x", with x₁ = x and x₂ = x².

E(Y) = B₀ + B₁*log(x) is also a linear model.
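
The practical consequence of "linear in the betas" is that ordinary least squares applies once each term is treated as its own column. The following is a minimal sketch in plain Python, not a reference implementation: the helper names `solve` and `ols`, the sample x values, and the coefficient values are all assumptions made for illustration, and the data are generated without noise so the fit simply recovers the chosen coefficients.

```python
import math

def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical predictor values

# Quadratic in x, but linear in the betas: the columns are 1, x, and x^2.
y_quad = [1.0 + 2.0 * x - 0.5 * x**2 for x in xs]
beta_quad = ols([[1.0, x, x**2] for x in xs], y_quad)

# Exponential model: fit a straight line to ln(y) instead of y.
y_exp = [math.exp(0.3 + 0.8 * x) for x in xs]
beta_exp = ols([[1.0, x] for x in xs], [math.log(y) for y in y_exp])

print(beta_quad)  # estimates of B0, B1, B2 for the quadratic model
print(beta_exp)   # estimates of B0, B1 for the log-transformed model
```

The same `ols` routine handles both cases because, once the columns are built, the betas enter only as multipliers of known quantities.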

An example of "not linear in the betas":

E(Y) = B₀ + B₁*e^(B₂*x)
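
One way to check "linear in the betas" numerically is to test whether the model respects linear combinations of parameter vectors: evaluating it at s*b + t*c should give s times the value at b plus t times the value at c. The sketch below applies that test to the quadratic model and to the exponential model above. The function names, the parameter vectors, and the scalars `s` and `t` are assumptions chosen for illustration. A failure at any point shows the model is not linear in the betas; a pass at a handful of points is only suggestive.

```python
import math

def quadratic(x, b):
    # E(Y) = B0 + B1*x + B2*x^2
    return b[0] + b[1] * x + b[2] * x**2

def exponential(x, b):
    # E(Y) = B0 + B1*e^(B2*x)
    return b[0] + b[1] * math.exp(b[2] * x)

def is_linear_in_betas(model, xs, b, c, s=2.0, t=-3.0, tol=1e-9):
    """Check model(x, s*b + t*c) == s*model(x, b) + t*model(x, c) at each x."""
    combo = [s * bi + t * ci for bi, ci in zip(b, c)]
    return all(
        abs(model(x, combo) - (s * model(x, b) + t * model(x, c))) <= tol
        for x in xs
    )

xs = [0.0, 0.5, 1.0, 2.0]   # hypothetical predictor values
b = [1.0, 2.0, 0.5]         # two arbitrary parameter vectors
c = [-1.0, 0.3, 0.2]

print(is_linear_in_betas(quadratic, xs, b, c))
print(is_linear_in_betas(exponential, xs, b, c))
```

With these inputs the quadratic model passes the check and the exponential model fails it, because B₂ sits inside the exponent rather than multiplying a known quantity.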

An important difference of nonlinear models (Kutner chapter 13 p. 513) ...

"... is that the number of regression parameters is not necessarily directly related to the number of the explanatory (X) variables in the model. In linear regression models, if there are "p-1" explanatory (X) variables in the model, then there are "p" regression coefficients in the model. For the exponential regression model above, there is one explanatory variable but there are three regression coefficients."

Another very important aspect of transforming a nonlinear model is what happens to the error terms (the e_i terms): the transform is only appropriate if the error terms of the transformed model still satisfy the assumption of being independent and identically distributed (iid).
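
To see how the error term interacts with a transform, it can help to compare two ways the error might enter the exponential model E(Y) = e^(B₀ + B₁*x). This is a sketch under stated assumptions: the symbol ε_i stands for the e_i terms, and the multiplicative and additive forms are illustrative choices, not forms taken from the references.

```latex
% Multiplicative error: the log transform gives a linear model with additive error \ln\varepsilon_i
Y_i = e^{\beta_0 + \beta_1 x_i}\,\varepsilon_i
\quad\Longrightarrow\quad
\ln Y_i = \beta_0 + \beta_1 x_i + \ln \varepsilon_i

% Additive error: the log of a sum does not split, so no linear model results
Y_i = e^{\beta_0 + \beta_1 x_i} + \varepsilon_i
\quad\Longrightarrow\quad
\ln Y_i = \ln\!\left(e^{\beta_0 + \beta_1 x_i} + \varepsilon_i\right)
```

In the first case the iid assumption would need to hold for ln ε_i on the transformed scale; in the second case the log transform does not produce a model that is linear in the betas.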

(See StatSoft for other examples of intrinsically nonlinear models. Also see the NIST reference above.)
