
### Model Assumptions for Specific Methods

For a few specific statistical methods, this page gives the model assumptions and the diagnostic tests that can be used to check them.  Alternative methods are also suggested for cases where a model assumption is violated.

This list is not intended to be comprehensive.  Similar information is offered on other pages on this site that discuss specific statistical methods.

## `Independent Samples t-Test`

| Assumptions | Diagnostics | Alternative Method |
| --- | --- | --- |
| Equal variances | Levene’s test | Nonparametric method: test for difference in medians (Mann-Whitney U-test) |
| Balanced sample sizes | Descriptive statistics showing the sample sizes allocated to each group or stratum in the final data set | |
| Sample data are randomly drawn from normally distributed populations | Normal probability plot (graphical, subjective); Shapiro-Wilk test (quantitative, objective) or equivalent | Nonparametric method. See above. |
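
As an illustration of the equal-variance diagnostic, here is a minimal sketch of Levene's W statistic. It assumes the mean-centered form of the test (statistical packages often default to a median-centered variant), and the function name `levene_w` and the sample data are hypothetical. Under the null hypothesis of equal variances, W is compared against an F distribution with (k − 1, N − k) degrees of freedom. This sketch stops at the statistic and does not compute a p-value.

```python
from statistics import fmean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    # Z_ij = |Y_ij - mean(Y_i)|: absolute deviation of each value from its group mean
    centers = [fmean(g) for g in groups]
    z = [[abs(y - c) for y in g] for g, c in zip(groups, centers)]

    z_group_means = [fmean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (zm - z_grand_mean) ** 2
                  for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_group_means) for v in zi)
    if within == 0:
        raise ValueError("deviations have no within-group spread")

    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: the second group is twice as spread out as the first
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
print(levene_w(group_a, group_b))
```

A larger W suggests stronger evidence against equal variances. Two groups with identical spread but different locations give W = 0.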

## `Regression`

| Assumptions | Diagnostics | Alternative Method |
| --- | --- | --- |
| Residuals are iid N(0, σ²) (“iid” is explained below) | See references on residual diagnostics. Includes table of recommended extended residual diagnostics, along with graphical methods. | |
| **Independent**: Observations (and thus residuals) are independent; one observation does not influence another | Check for autocorrelation using ACF and PACF. NOTE: This assumption is often not checked. For many of the types of experiments and regression studies performed at our company, this may be a relatively safe assumption and diagnostics may be omitted. However, if a statistician asks for a test of independence, it shall be performed with results included in the report. | Time Series methods |
| **Identically Distributed** (mean): Mean of residuals should be constant at zero conditionally over all values of explanatory variable(s) | Scatterplot of residuals vs. each explanatory variable | If the mean of the residuals is not constant at zero, and if it exhibits a pattern, then the model may benefit from additional terms. For example, if the residual means exhibit a quadratic curvature, then adding an X² term may be explored. |
| **Identically Distributed** (variance): Variance of residuals should be constant conditionally over all values of explanatory variable(s) | Levene’s test (or equivalent) (assumes fixed Xᵢ); Plot of Variances vs. Means (assumes fixed Xᵢ); Check for patterns in the residuals, such as a funneling effect over Xᵢ. See Variance Stabilizing Transform and Regression Model Heteroskedasticity. | If the variance of the model residuals is not constant and if it displays a pattern such as increasing or decreasing variance, then a variance stabilizing transform may be useful. Weighted least squares might also be considered. |
| **N(0, σ²)**: Building on “identically distributed”, and when calculated conditionally over all values of an explanatory variable: the residuals are normally distributed with mean equal to zero and constant variance | See diagnostics for “iid” above. | See above. |
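
The independence check can be sketched as a sample autocorrelation function (ACF) applied to the residuals in observation order. This is a minimal illustration, not a full time-series diagnostic. The function names and the residual values are hypothetical, and the ±1.96/√n band is the usual large-sample approximation for a white-noise series.

```python
import math


def sample_acf(values, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag of a sequence."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def flag_autocorrelation(values, max_lag=5):
    """List (lag, r) pairs whose |r| exceeds the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(values))
    acf = sample_acf(values, max_lag)
    return [(lag, r) for lag, r in enumerate(acf, start=1) if abs(r) > bound]


# Hypothetical residuals that alternate in sign (strong negative lag-1 correlation)
residuals = [1, -1] * 10
print(flag_autocorrelation(residuals, max_lag=3))
```

Lags flagged by this check suggest that the independence assumption deserves a closer look, for example with a PACF and the time series methods listed in the table.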

## `ANOVA`

| Assumptions | Diagnostics | Alternative Method |
| --- | --- | --- |
| See “Regression” | Normality: for reference on robustness, see Hicks page 84. | Kruskal-Wallis test (for one-way ANOVA); Friedman test |
| Treatments are balanced | For DOE, ensured by the study design. For observational studies, checked using descriptive statistics (sample size allocation). Robustness: see notes under equal variances. | For studies that are not balanced, Least Squares methods are less effective and other methods, such as Maximum Likelihood, may need to be considered. |
| Variances are equal | Levene’s test; Plot of Variances vs. Means. Robustness: see Hicks page 86. “ANOVA results are particularly sensitive to unequal variances when sample sizes differ substantially.” | Kruskal-Wallis test; Friedman test |
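
To show what the Kruskal-Wallis alternative computes, the sketch below calculates the H statistic from pooled ranks. It is a simplified illustration. Tied values receive average ranks, but the usual tie correction to H is omitted, and the function names and data are hypothetical. For reasonably large groups, H is compared against a chi-square distribution with k − 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)

    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical treatment groups with clearly separated values
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Because the statistic depends only on ranks, it does not rely on the normality assumption that the ANOVA F-test makes.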

## `Design of Experiments (DOE)`

| Assumptions | Diagnostics | Alternative Method |
| --- | --- | --- |
| The process is stable | SPC charting prior to or during the data collection; building in DOE replicates and centerpoints | None. There are no methods that can adequately compensate for the lack of a stable process when using DOE methods. DOE generally will not work well with a process that has erratic, unpredictable behavior. |
| The experiment is randomized | Ensured by the design and execution of the study | |
| The experiment is orthogonal: the study is designed so that the vectors representing the main effects are independent from each other | This is affected by the design of the study. It is also affected by the balance of the study. | |
| Residuals are iid N(0, σ²) | See notes under “Regression” | |
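
The orthogonality assumption can be illustrated with a two-level design coded as −1/+1. In this minimal sketch, the main-effect columns are treated as orthogonal when every pair has a zero dot product. The helper names are hypothetical, and the check covers only main effects in a coded two-level design.

```python
from itertools import combinations, product


def full_factorial(n_factors):
    """All runs of a two-level full factorial design, coded as -1 / +1."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def is_orthogonal(design):
    """True if every pair of factor columns has a zero dot product."""
    columns = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(col_1, col_2)) == 0
        for col_1, col_2 in combinations(columns, 2)
    )


design = full_factorial(3)           # 8 runs, 3 factors
print(is_orthogonal(design))         # balanced full factorial

unbalanced = design[:-1]             # hypothetical case: one run was lost
print(is_orthogonal(unbalanced))     # balance is broken
```

Dropping a single run from the balanced design is enough to make the column dot products nonzero, which illustrates why the balance of the study affects orthogonality.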

## `Process Capability - Cpk`

| Assumptions | Diagnostics | Alternative Method |
| --- | --- | --- |
| Process is in a state of statistical control | SPC charting required: X-bar and R (or equivalent). NOTE: Cpk is usually NOT a valid measure if the process is not shown to be in a state of statistical control. | |
| **Independent**: The measurements in the various subgroups are independent from one another | Check for autocorrelation using ACF and PACF. NOTE: This assumption is often not checked. For many of the types of experiments and regression studies performed at our company, this may be a relatively safe assumption and diagnostics may be omitted. However, if a statistician asks for a test of independence, it shall be performed with results included in the report. | Time Series methods (includes EWMA, but EWMA may not always be the best choice, depending on the nature of the autocorrelation as indicated by the ACF and PACF) |
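
For context, the sketch below shows one common way Cpk is computed from X-bar and R chart data. It assumes equal-size subgroups and estimates the within-subgroup standard deviation as R-bar/d₂. The specification limits, data, and function name are hypothetical, and the result is only meaningful if the assumptions in the table above hold.

```python
from statistics import fmean

# Standard control-chart constants d2 for subgroup sizes 2 through 5
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def cpk_from_subgroups(subgroups, lsl, usl):
    """Cpk using the within-subgroup sigma estimate R-bar / d2."""
    size = len(subgroups[0])
    if size not in D2 or any(len(s) != size for s in subgroups):
        raise ValueError("subgroups must share one size between 2 and 5")

    grand_mean = fmean([fmean(s) for s in subgroups])
    r_bar = fmean([max(s) - min(s) for s in subgroups])
    if r_bar == 0:
        raise ValueError("all subgroup ranges are zero; sigma cannot be estimated")

    sigma_within = r_bar / D2[size]
    # Distance from the mean to the nearer specification limit, in units of 3 sigma
    return min(usl - grand_mean, grand_mean - lsl) / (3 * sigma_within)


# Hypothetical subgroups of size 2 and hypothetical specification limits
subgroups = [[9, 11], [10, 12], [8, 10], [10, 10]]
print(cpk_from_subgroups(subgroups, lsl=4, usl=16))
```

Because the sigma estimate uses only within-subgroup ranges, shifts between subgroups are not reflected in it. This is one reason the process must first be shown to be in statistical control.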

## `Process Performance - Ppk`

| Assumptions | Diagnostics | Alternative Method |
| --- | --- | --- |
| Process data are normally distributed | Normal probability plot (graphical); Shapiro-Wilk test (quantitative) or equivalent | Non-normal Ppk; nonparametric tolerance interval |
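
The normal probability plot can be sketched without a plotting library by pairing the sorted data with theoretical standard normal quantiles. This illustration assumes the plotting positions (i − 0.5)/n, which is one of several conventions in use. The function names and data are hypothetical. The correlation of the plotted points gives a rough numerical summary of how straight the plot is. It is not a substitute for the Shapiro-Wilk test.

```python
import math
from statistics import NormalDist, fmean


def normal_plot_points(data):
    """Pair each sorted observation with its theoretical normal quantile."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(data)))


def plot_correlation(points):
    """Pearson correlation between theoretical quantiles and ordered data."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical process measurements
measurements = [3, 1, 2, 5, 4]
points = normal_plot_points(measurements)
for quantile, value in points:
    print(f"{quantile:8.3f}  {value}")
print(plot_correlation(points))
```

Points that fall close to a straight line (correlation near 1) are consistent with normality. Marked curvature or outlying points in the tails suggest that the non-normal alternatives in the table may be worth considering.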
