
Model Assumptions for Specific Methods


 
This page lists the assumptions of a few specific statistical methods, along with diagnostic tests that can be used to check each assumption and alternative methods to consider when an assumption is violated.

This list is not intended to be comprehensive.  Similar information is offered on other pages on this site that discuss specific statistical methods.






Independent Samples t-Test



Assumption: Equal variances
Diagnostics: Levene’s test
Alternative method: Nonparametric method
  • Test for difference in medians
  • Mann-Whitney U-test

Assumption: Balanced sample sizes
Diagnostics: Descriptive statistics showing the sample sizes allocated to each group or stratum in the final data set
Alternative method: (none listed)

Assumption: Sample data are randomly drawn from normally distributed populations
Diagnostics:
  • Normal probability plot (graphical, subjective)
  • Shapiro-Wilk test (quantitative, objective) or equivalent
Alternative method: Nonparametric method (see above)
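As a rough illustration of the equal-variance diagnostic, the sketch below computes a Levene-type W statistic using only the Python standard library. The function name `levene_w` and the two sample groups are hypothetical. The default uses deviations from the group mean (Levene's original form); passing `center=median` gives the Brown-Forsythe variant. The statistic would then be compared against an F distribution with k − 1 and N − k degrees of freedom, a step this sketch leaves out.

```python
from statistics import mean, median

def levene_w(*groups, center=mean):
    """Levene-type W statistic for equality of variances.

    center=mean is Levene's original form; center=median is the
    Brown-Forsythe variant.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group's center.
    z = [[abs(y - center(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data: similar centers, visibly different spread.
group_a = [9.8, 10.1, 10.0, 9.9, 10.2]
group_b = [8.0, 12.5, 9.1, 11.7, 10.4]

print("sample sizes:", len(group_a), len(group_b))  # balance check
print("Levene W:", levene_w(group_a, group_b))
print("Brown-Forsythe W:", levene_w(group_a, group_b, center=median))
```

A large W relative to the F critical value suggests unequal variances, in which case the nonparametric alternative listed above may be worth considering.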



Regression



Assumption: Residuals are iid N(0, σ²) (“iid” is explained below)
Diagnostics: See references on residual diagnostics (includes a table of recommended extended residual diagnostics, along with graphical methods)
Alternative method: (none listed)

Assumption: Independent. Observations (and thus residuals) are independent; one observation does not influence another.
Diagnostics: Check for autocorrelation using ACF and PACF
NOTE: This assumption is often not checked. For many of the types of experiments and regression studies performed at our company, this may be a relatively safe assumption and diagnostics may be omitted. However, if a statistician asks for a test of independence, it shall be performed and the results included in the report.
Alternative method: Time Series methods

Assumption: Identically Distributed
Diagnostics:
Mean of residuals should be constant at zero conditionally over all values of explanatory variable(s)
  • Scatterplot of residuals vs. each explanatory variable
Variance of residuals should be constant conditionally over all values of explanatory variable(s)
  • Levene’s test (or equivalent) (assumes fixed X_i)
  • Plot of Variances vs. Means (assumes fixed X_i)
Alternative method:
If the mean of the residuals is not constant at zero, and if it exhibits a pattern, then the model may benefit from additional terms.  For example, if the residual means exhibit a quadratic curvature, then adding an X² term may be explored.
If the variance of the model residuals is not constant and if it displays a pattern such as increasing or decreasing variance, then a variance stabilizing transform may be useful.  Weighted least squares might also be considered.

Assumption: N(0, σ²). Building on “identically distributed”, and when calculated conditionally over all values of an explanatory variable:
  • the residuals are normally distributed
  • with mean equal to zero
  • with constant variance
Diagnostics: See diagnostics for “iid” above.
Alternative method: See above.
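The point about residual means showing a pattern can be illustrated numerically. The sketch below fits a straight line to hypothetical data that actually follow a quadratic curve, then averages the residuals within low, middle, and high ranges of X. The helper names `fit_line` and `binned_residual_means` and the data are assumptions made for illustration, and binning is only a crude numerical stand-in for the scatterplot of residuals vs. the explanatory variable.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def binned_residual_means(x, residuals, n_bins=3):
    """Mean residual within consecutive bins of x, ordered low to high."""
    pairs = sorted(zip(x, residuals))
    size = len(pairs) // n_bins
    bins = [pairs[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(pairs[(n_bins - 1) * size:])  # last bin takes any remainder
    return [mean(r for _, r in b) for b in bins]

# Hypothetical data with true quadratic curvature.
x = [float(i) for i in range(1, 13)]
y = [2.0 + 0.5 * xi + 0.3 * xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
low, mid, high = binned_residual_means(x, residuals)
print(low, mid, high)
```

Residual means that are positive at both ends and negative in the middle (or the reverse) are the kind of curvature that suggests exploring an added squared term.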


 


ANOVA




Assumption: See “Regression”
Diagnostics: Normality – for reference on robustness, see Hicks page 84.
Alternative method:
  • Kruskal-Wallis test (for one-way ANOVA)
  • Friedman test

Assumption: Treatments are balanced
Diagnostics:
  • For DOE, ensured by the study design
  • For observational studies, checked using descriptive statistics (sample size allocation)
  • Robustness – see notes under equal variances.
Alternative method: For studies that are not balanced, Least Squares methods are less effective and other methods, such as Maximum Likelihood, may need to be considered.

Assumption: Variances are equal
Diagnostics:
  • Levene’s test
  • Plot of Variances vs. Means
  • Robustness – see Hicks page 86.  “ANOVA results are particularly sensitive to unequal variances when sample sizes differ substantially.”
Alternative method:
  • Kruskal-Wallis test
  • Friedman test
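To show what the Kruskal-Wallis alternative computes, here is a minimal standard-library sketch of the H statistic. The function names and the example groups are hypothetical, and the sketch applies no correction for ties (tied values simply share an average rank). H is usually compared against a chi-square distribution with k − 1 degrees of freedom, an approximation that can be poor for very small samples.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, without a tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical, fully separated groups.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Because the statistic depends only on ranks, it does not rely on the normality assumption, which is why it appears as an alternative for one-way ANOVA.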


 


Design of Experiments (DOE)




Assumption: The process is stable
Diagnostics:
  • SPC charting prior to or during the data collection
  • Building in DOE replicates and centerpoints
Alternative method: None. There are no methods that can adequately compensate for the lack of a stable process when using DOE methods. DOE generally will not work well with a process that has erratic, unpredictable behavior.

Assumption: The experiment is randomized
Diagnostics: Ensured by the design and execution of the study
Alternative method: (none listed)

Assumption: The experiment is orthogonal
Diagnostics: The study is designed so that the vectors representing the main effects are independent of each other. This is affected by the design of the study. It is also affected by the balance of the study.
Alternative method: (none listed)

Assumption: Residuals are iid N(0, σ²)
Diagnostics: See notes under “Regression”
Alternative method: (none listed)
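The orthogonality assumption, and its dependence on balance, can be checked directly on a coded design matrix. The sketch below builds a two-level full factorial in −1/+1 coding and computes the dot product of every pair of main-effect columns; a zero dot product means the two columns are orthogonal. The function names are hypothetical, and the three-factor design is only an example.

```python
from itertools import combinations, product

def full_factorial(n_factors):
    """Coded (-1/+1) design matrix for a two-level full factorial."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]

def column_dot_products(design):
    """Dot product for every pair of factor columns, keyed by column index."""
    columns = list(zip(*design))
    return {
        (a, b): sum(u * v for u, v in zip(columns[a], columns[b]))
        for a, b in combinations(range(len(columns)), 2)
    }

design = full_factorial(3)
print(column_dot_products(design))

# Losing a single run unbalances the design and breaks orthogonality.
unbalanced = design[:-1]
print(column_dot_products(unbalanced))
```

In the full design every pairwise dot product is zero. Once a run is dropped, the products become nonzero, which illustrates how a loss of balance affects orthogonality.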

 






Process Capability - Cpk




Assumption: Process is in a state of statistical control
Diagnostics: SPC charting required
  • X-bar and R (or equivalent)
NOTE: Cpk is usually NOT a valid measure unless the process has been shown to be in a state of statistical control.
Alternative method: (none listed)

Assumption: Independent. The measurements in the various subgroups are independent of one another.
Diagnostics: Check for autocorrelation using ACF and PACF
NOTE: This assumption is often not checked. For many of the types of experiments and regression studies performed at our company, this may be a relatively safe assumption and diagnostics may be omitted. However, if a statistician asks for a test of independence, it shall be performed and the results included in the report.
Alternative method: Time Series methods (includes EWMA, but EWMA may not always be the best choice, depending on the nature of the autocorrelation as indicated by the ACF and PACF)
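For the autocorrelation check, a minimal sketch of the sample ACF is shown below, using only the standard library. The function names and the drifting example series are hypothetical. Lags are flagged against the common large-sample bound of ±1.96/√n, which is an approximation. The PACF is not computed here.

```python
from math import sqrt
from statistics import mean

def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a sequence."""
    n = len(series)
    center = mean(series)
    dev = [v - center for v in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / c0
        for lag in range(1, max_lag + 1)
    ]

def flagged_lags(series, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% bound 1.96/sqrt(n)."""
    bound = 1.96 / sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1)
            if abs(r) > bound]

# Hypothetical measurements with a steady drift, so neighbors are similar.
drifting = [10.0 + 0.1 * t for t in range(30)]
print(acf(drifting, 3))
print(flagged_lags(drifting, 5))
```

Flagged lags suggest the measurements are not independent, which is the situation in which the time series alternatives would be considered.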





Process Performance - Ppk



Assumption: Process data are normally distributed
Diagnostics:
  • Normal probability plot (graphical)
  • Shapiro-Wilk test (quantitative) or equivalent
Alternative method:
  • Non-normal Ppk
  • Nonparametric tolerance interval
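As an illustration of the nonparametric tolerance interval, the sketch below uses the standard distribution-free result for the interval from the sample minimum to the sample maximum: for n independent observations from a continuous distribution, the confidence that the interval covers at least a proportion p of the population is 1 − n·p^(n−1) + (n−1)·p^n. The function names are hypothetical, and intervals built from other order statistics are not covered.

```python
def minmax_coverage_confidence(n, proportion):
    """Confidence that [sample min, sample max] covers at least
    `proportion` of a continuous population, given n observations."""
    p = proportion
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n

def smallest_n(proportion, confidence):
    """Smallest sample size at which the min/max interval reaches
    the requested confidence."""
    n = 2
    while minmax_coverage_confidence(n, proportion) < confidence:
        n += 1
    return n

print(minmax_coverage_confidence(30, 0.95))
print(smallest_n(0.95, 0.95))
```

The sample sizes needed are fairly large because the method makes no use of a distributional shape, which is the price of dropping the normality assumption.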





