ANOVA-NonParametric

Need to be able to tell if groups of data are the same or different??

Data not from a normal distribution??

Kruskal-Wallis is here to save the day!!!

Kruskal-Wallis is a non-parametric ANOVA-style test, meaning it does not require the data to come from a normal distribution. The null hypothesis is that all the data sets come from the same population. So if the resulting p-value is greater than your limit, you fail to reject the null hypothesis: the test found no evidence that the groups differ (aka no detectable difference). That is not proof the groups are identical, only that the test could not tell them apart.

Let's do a practice test. Let's say we go out and randomly measure 15 men, then we measure 15 guys at the office, and then we go by the local college basketball team and measure 15 of them, and get the following heights in feet. NOTE: Here is a good page about ANOVA in R

From the Notched Box Plots we can see that it appears the Office Joes and Average Joes are from the same group and the Basketball Joes are not.
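If you want to draw a plot like this yourself, here is a minimal sketch. The heights are simulated stand-ins (the column names, means and spreads are assumptions for illustration, not the measurements used in this post), so your picture will not match the one above exactly.

```r
# Hypothetical heights in feet, simulated for illustration only.
# These are NOT the measurements used in the post.
set.seed(42)
heights <- data.frame(
  AvgJoe        = rnorm(15, mean = 5.8, sd = 0.25),
  OfficeJoe     = rnorm(15, mean = 5.8, sd = 0.25),
  BasketballJoe = rnorm(15, mean = 6.5, sd = 0.25)
)

# notch = TRUE draws a notch around each median. Notches that do not
# overlap are a rough visual hint that the medians differ.
bp <- boxplot(heights, notch = TRUE,
              ylab = "Height (feet)",
              main = "Heights by group (simulated)")
```

With only 15 values per group R may warn that some notches went outside the hinges; that only affects how the notch is drawn.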

If we run Kruskal-Wallis for the Avg-Joes vs Office-Joes we get p-value = 0.2532 (25%), which is greater than the usual limits of 0.05 or 0.10 (5% and 10%), so we cannot say there is a difference between the Office Joes and the Average Joes.

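If we run Kruskal-Wallis for the Avg-Joes vs Basketball-Joes we get a p-value = 4.054e-12. That means that if the two groups really came from the same population, we would see a difference this large only about 4 times in a trillion, so we reject the null hypothesis and say they are from two different groups.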
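The two comparisons can be sketched in R as follows. The data are simulated and compare_groups is a hypothetical helper name, both assumptions for illustration, so the p-values will differ from the ones quoted here. What the sketch shows is how to pull the p-value out of the result and compare it to your limit.

```r
# Simulated heights in feet (illustrative assumption, not the post's data)
set.seed(1)
avg_joe        <- rnorm(15, mean = 5.8, sd = 0.25)
office_joe     <- rnorm(15, mean = 5.8, sd = 0.25)
basketball_joe <- rnorm(15, mean = 6.5, sd = 0.25)

alpha <- 0.05  # the limit (significance level)

# Hypothetical helper: run Kruskal-Wallis on two samples and report
# the p-value and whether the null hypothesis is rejected at alpha
compare_groups <- function(a, b, alpha) {
  result <- kruskal.test(list(a, b))
  list(p_value = result$p.value,
       reject_null = result$p.value < alpha)
}

avg_vs_office     <- compare_groups(avg_joe, office_joe, alpha)
avg_vs_basketball <- compare_groups(avg_joe, basketball_joe, alpha)

print(avg_vs_office)
print(avg_vs_basketball)
```

kruskal.test accepts a list of numeric vectors, one per group, which is why passing a whole data frame with one column per group also works.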

Now let's think a little about this limit. Normal limits are set at 0.05 or 0.10. Think of the null hypothesis as being "No Difference / the site is Clean". If you go with a limit of 0.10, there is a 10% chance you will say "Dirty" when the site was "Clean" (aka a false positive). So the higher you set your limit, the more likely you are to say there is a difference when there isn't one (call a Clean site Dirty). The lower you set your limit, the more likely you are to say there is no difference when there was one (call a Dirty site Clean).
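You can check the false positive side of this with a quick simulation. The sketch below (simulated data; the group size, mean and spread are assumptions) draws two groups from the same population many times and counts how often each limit would wrongly call them different.

```r
# Both groups come from the SAME population, so every "difference"
# the test reports here is a false positive.
set.seed(123)
n_sims <- 2000

p_values <- replicate(n_sims, {
  a <- rnorm(15, mean = 5.8, sd = 0.25)
  b <- rnorm(15, mean = 5.8, sd = 0.25)
  kruskal.test(list(a, b))$p.value
})

# Share of runs that would be called "different" at each limit
false_positive_rate <- c(
  limit_0.05 = mean(p_values < 0.05),
  limit_0.10 = mean(p_values < 0.10)
)
print(false_positive_rate)
```

The two rates should land near 0.05 and 0.10, and the rate at the 0.10 limit can never be lower than the rate at 0.05.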

Here is some code to import a spreadsheet and do the calculations.

data <- read.table("Kruskal-Wallis.csv", header=TRUE, sep=",")

# Run the test

kruskal.test(data)

# Done
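The code above assumes the CSV has one column per group with a header row. Here is a sketch of that layout using a temporary file and simulated heights (the column names and values are assumptions, not the real spreadsheet). It also shows the equivalent "long" layout with one column of values and one of group labels, which kruskal.test takes through its formula interface.

```r
# Simulated heights in feet, one column per group (illustrative only)
set.seed(7)
wide <- data.frame(
  AvgJoe        = round(rnorm(15, mean = 5.8, sd = 0.25), 2),
  OfficeJoe     = round(rnorm(15, mean = 5.8, sd = 0.25), 2),
  BasketballJoe = round(rnorm(15, mean = 6.5, sd = 0.25), 2)
)

# Write to a temporary CSV, then read it back the same way as above
csv_path <- tempfile(fileext = ".csv")
write.csv(wide, csv_path, row.names = FALSE)
data <- read.table(csv_path, header = TRUE, sep = ",")

# Wide layout: each column of the data frame is treated as one group
wide_result <- kruskal.test(data)

# Long layout: stack() gives a "values" column and an "ind" group column
long <- stack(data)
long_result <- kruskal.test(values ~ ind, data = long)

print(wide_result)
print(long_result)
```

Both calls run the same test, so they report the same statistic and p-value; use whichever layout matches your spreadsheet.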

#Here is code to run AvgJoe vs BasketballJoe

#set the dir to where your data is

setwd("c:/R")

#Load your data. The data is in a spreadsheet saved as Kruskal-Wallis-AvgJoe-vs-BasketballJoe.csv and we are going to call it "data" in R

data <- read.table("Kruskal-Wallis-AvgJoe-vs-BasketballJoe.csv", header=TRUE, sep=",")

# Run the test

kruskal.test(data)

# Done

#Here is code to run AvgJoe vs OfficeJoe

#set the dir to where your data is

setwd("c:/R")

#Load your data. The data is in a spreadsheet saved as Kruskal-Wallis-AvgJoe-vs-OfficeJoe.csv and we are going to call it "data" in R

data <- read.table("Kruskal-Wallis-AvgJoe-vs-OfficeJoe.csv", header=TRUE, sep=",")

# Run the test

kruskal.test(data)

# Done

#SIMPLE CODE

#set the dir to where your data is

setwd("c:/R")

#Load your data. The data is in a spreadsheet named KW-spreadsheet and we are going to call it "data" in R