Heteroscedasticity

The main idea...

Heteroscedasticity, meaning "differing dispersion", occurs when the variability of a random variable depends on the magnitude of the variable (i.e. the size of its values), conditional on some other variable (Figure 1). This violates the equal-variance assumption on residuals made by most linear hypothesis-testing methods and renders many significance tests and confidence interval estimates invalid. It also reduces the "efficiency" of, for example, ordinary least squares (OLS) approaches to regression: under heteroscedasticity, OLS no longer gives the minimum-variance estimates among linear unbiased estimators, so its fitted parameters are less precise than those from methods that account for the unequal variance, such as weighted least squares.

Figure 1: Illustrations of a) homoscedastic and b) heteroscedastic data. In panel b, the variability of Y conditional on X depends on the magnitude of X and Y. If heteroscedasticity occurs in a collection of variables, analyses that assume variability (variance, standard deviation, etc.) is uncorrelated with the magnitude of a given variable will be invalidated. This is particularly harmful for many significance-testing methods.
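
To make the pattern concrete, the following sketch simulates both cases in Python with NumPy; the linear trend and the noise scales are arbitrary illustrative choices, not values taken from the figure:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1, 10, 200)

    # Homoscedastic: residual spread is constant across x (panel a)
    y_homo = 2.0 * x + rng.normal(scale=1.0, size=x.size)

    # Heteroscedastic: residual spread grows with the magnitude of x (panel b)
    y_hetero = 2.0 * x + rng.normal(scale=0.5 * x, size=x.size)

Plotting y_homo and y_hetero against x reproduces the qualitative difference between the two panels.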

Testing for heteroscedasticity
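
One widely used diagnostic is the Breusch-Pagan test, which regresses the squared OLS residuals on the regressors and checks whether they explain any of the residual variation. A minimal sketch using statsmodels, assuming x and y are one-dimensional arrays such as those simulated above:

    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    X = sm.add_constant(x)        # design matrix with an intercept
    fit = sm.OLS(y, X).fit()

    # A small p-value suggests the residual variance depends on the regressors
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)
    print(lm_pvalue)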

Correcting for heteroscedasticity
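
A common correction when the form of the variance is roughly known is weighted least squares (WLS), which downweights the noisier observations. A minimal sketch, assuming as in the simulation above that the residual standard deviation grows proportionally with x (so the variance is proportional to the square of x):

    import statsmodels.api as sm

    # Weight each observation by the inverse of its assumed variance;
    # 1/x**2 is an assumption matching noise whose spread scales with x
    wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit()
    print(wls_fit.params)

Transforming the response (for example, taking logarithms) is another frequently used remedy when the spread grows multiplicatively with the mean.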

Ignoring heteroscedasticity

Ignoring heteroscedasticity may result in less precise (yet still unbiased) parameter estimates under OLS approaches. Establishing how imprecise the parameter estimates are may be challenging, however: the standard covariance matrix estimators produced by OLS (in contrast to heteroscedasticity-consistent covariance matrix, HCCM, estimators) become biased, and that bias propagates to any method that references them. Error estimates derived from the residuals of a regression on heteroscedastic data will almost certainly be incorrect, and any hypothesis tests that reference them will be invalid.
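
The HCCM estimators mentioned above can be requested directly when fitting; statsmodels exposes them through the cov_type argument. A minimal sketch using the HC3 variant (choosing HC3 over HC0 through HC2 is an illustrative assumption):

    import statsmodels.api as sm

    # The coefficient estimates are identical to plain OLS; only the
    # covariance matrix, and hence the standard errors, are corrected
    robust_fit = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC3")
    print(robust_fit.bse)   # heteroscedasticity-consistent standard errors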

References

Implementations