Data Distributions

What is Normality?

Normality refers to the shape of the data distribution for an individual variable and how closely it corresponds to the normal distribution.
Departures from normality can have serious effects in small samples (fewer than 50 cases), but the impact effectively diminishes once the sample size exceeds 200.
In parametric tests, if the variation from the normal distribution is sufficiently large, all resulting statistical tests are invalid, because the F and t statistics assume normality (Hair et al., 2010).

How to Test for Normality?

Histogram: Compare the observed data values with a distribution approximating normal distribution.
Normal Probability Plot: Compare the cumulative distribution of actual data values with the cumulative distribution of a normal distribution.
Skewness and Kurtosis Statistics.
Shapiro-Wilk test (sample size < 2000).
Kolmogorov-Smirnov test (sample size > 2000).
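
The checks listed above can be sketched in Python with SciPy. This is a minimal illustration, not a prescribed procedure: the function name `normality_test`, the 0.05 significance level, and the handling of a sample of exactly 2000 cases (sent to Kolmogorov-Smirnov here) are assumptions. The Kolmogorov-Smirnov test is run against a normal distribution whose mean and standard deviation are estimated from the sample, which makes its p-value approximate. The correlation from `scipy.stats.probplot` is included as a numeric summary of the normal probability plot.

```python
import numpy as np
from scipy import stats


def normality_test(x, alpha=0.05, cutoff=2000):
    """Pick Shapiro-Wilk or Kolmogorov-Smirnov by sample size (rule of thumb)."""
    x = np.asarray(x, dtype=float)
    if x.size < cutoff:
        name = "Shapiro-Wilk"
        stat, p = stats.shapiro(x)
    else:
        name = "Kolmogorov-Smirnov"
        # Compare against a normal with the sample's own mean and SD.
        stat, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    # Correlation between ordered data and theoretical normal quantiles:
    # values close to 1 mean the normal probability plot is nearly straight.
    _, (_, _, r) = stats.probplot(x, dist="norm")
    return {
        "test": name,
        "statistic": stat,
        "p_value": p,
        "reject_normality": p < alpha,
        "probplot_r": r,
    }


rng = np.random.default_rng(0)
normal_small = rng.normal(loc=50, scale=10, size=300)
skewed_small = rng.exponential(scale=10, size=300)
skewed_large = rng.exponential(scale=10, size=3000)

for sample in (normal_small, skewed_small, skewed_large):
    print(normality_test(sample))
```

A histogram (for example with `matplotlib.pyplot.hist`) would complement these numbers with the visual comparison described above.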

Skewness

The degree of symmetry in the variable distribution.

Threshold: -2 ≤ skewness ≤ 2 (Curran et al., 1996; West et al., 1995; Ghiselli et al., 1981).

Negatively Skewed

Skewness < 0

Normal: No Skew

Perfectly symmetrical distribution

Skewness = 0

Positively Skewed

Skewness > 0
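
As a rough illustration of the skewness rules above, the following Python sketch computes sample skewness with `scipy.stats.skew` and compares it with the ±2 threshold. The function name `skewness_check` and the choice of the bias-corrected estimator (`bias=False`) are assumptions; statistical packages differ slightly in the estimator they report.

```python
import numpy as np
from scipy import stats


def skewness_check(x, limit=2.0):
    """Return (skewness, direction, within_threshold) for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    g1 = stats.skew(x, bias=False)  # bias-corrected sample skewness
    if g1 < 0:
        direction = "negative"  # longer tail on the left
    elif g1 > 0:
        direction = "positive"  # longer tail on the right
    else:
        direction = "none"      # perfectly symmetrical
    return g1, direction, abs(g1) <= limit


rng = np.random.default_rng(0)
print(skewness_check(rng.normal(size=500)))       # roughly symmetrical
print(skewness_check(rng.exponential(size=500)))  # long right tail
```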

Kurtosis

The degree of peakedness or flatness in the variable distribution.

Threshold: -7 ≤ Kurtosis ≤ 7 (Curran et al., 1996; West et al., 1995).

Platykurtic distribution

Low degree of peakedness

Kurtosis < 0

Mesokurtic Distribution

Normal distribution

Kurtosis = 0

Leptokurtic Distribution

High degree of peakedness

Kurtosis > 0
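
The kurtosis categories above take 0 as the value for a normal distribution, which corresponds to excess (Fisher) kurtosis. A minimal Python sketch, assuming the hypothetical function name `kurtosis_check` and the bias-corrected estimator from `scipy.stats.kurtosis`:

```python
import numpy as np
from scipy import stats


def kurtosis_check(x, limit=7.0):
    """Return (excess_kurtosis, shape_label, within_threshold)."""
    x = np.asarray(x, dtype=float)
    # fisher=True subtracts 3 so that a normal distribution scores 0.
    g2 = stats.kurtosis(x, fisher=True, bias=False)
    if g2 < 0:
        label = "platykurtic"  # flatter than normal
    elif g2 > 0:
        label = "leptokurtic"  # more peaked, heavier tails
    else:
        label = "mesokurtic"   # exactly 0; rare in real samples
    return g2, label, abs(g2) <= limit


rng = np.random.default_rng(0)
print(kurtosis_check(rng.uniform(size=5000)))  # flat distribution
print(kurtosis_check(rng.laplace(size=5000)))  # peaked distribution
```

Because a sample almost never yields exactly 0, the label is best read together with the threshold check rather than on its own.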

Multivariate Normality

The multivariate normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. Hair et al. (2016) stated, "When multivariate data are analyzed, the multivariate normal model is the most commonly used model."

The expected Mardia’s skewness is 0 for a multivariate normal distribution, and higher values indicate a more severe departure from normality. According to Bentler (2005) and Byrne (2010), the critical ratio of multivariate kurtosis should be less than 5.0 to indicate a multivariate normal distribution.
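
Mardia's coefficients can be computed directly with NumPy, as in the sketch below. It uses the standard textbook definitions: skewness is the average of the cubed Mahalanobis cross-products over all pairs of cases, and kurtosis is the average of the squared Mahalanobis distances, with an expected value of p(p + 2) for p variables. The function name `mardia` is hypothetical, and the z-statistic shown is the usual asymptotic form; it is an assumption that this matches the critical ratio a particular SEM package reports, since packages may apply small-sample adjustments.

```python
import numpy as np
from scipy import stats


def mardia(X):
    """Mardia's multivariate skewness and kurtosis for an (n, p) data matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n                 # ML covariance (divide by n)
    D = centered @ np.linalg.inv(S) @ centered.T  # n x n Mahalanobis products
    skewness = (D ** 3).sum() / n ** 2
    kurtosis = (np.diag(D) ** 2).mean()
    # Asymptotic tests: chi-square for skewness, standard normal for kurtosis.
    skew_stat = n * skewness / 6
    skew_df = p * (p + 1) * (p + 2) / 6
    kurt_z = (kurtosis - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return {
        "skewness": skewness,
        "skewness_p": stats.chi2.sf(skew_stat, skew_df),
        "kurtosis": kurtosis,
        "kurtosis_z": kurt_z,
    }


rng = np.random.default_rng(0)
print(mardia(rng.normal(size=(500, 3))))       # close to multivariate normal
print(mardia(rng.exponential(size=(500, 3))))  # clearly non-normal
```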

How to Remedy Extremely Non-Normal Data?

Check and remove outlier cases.
Remove non-normal items from the model.
Use bootstrapping (i.e., resampling from the existing data set with replacement).
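
To illustrate the bootstrapping remedy, the sketch below resamples a skewed sample with replacement and builds a percentile confidence interval for the mean. The function name `bootstrap_ci`, the 2000 resamples, and the percentile method are assumptions chosen for illustration; other bootstrap intervals (such as bias-corrected ones) are also in common use.

```python
import numpy as np


def bootstrap_ci(x, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    # Each row holds n indices drawn with replacement from the original cases.
    idx = rng.integers(0, n, size=(n_boot, n))
    estimates = np.array([stat(x[row]) for row in idx])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=80)  # skewed, non-normal data
print("sample mean:", sample.mean())
print("95% bootstrap CI:", bootstrap_ci(sample))
```

The interval is derived from the empirical distribution of the resampled estimates, so it does not rely on the normality assumption behind the usual t-based interval.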