ANOVA (brief!)

Start-up question:

A study of people who refused to answer survey questions is shown below:

Does there appear to be a difference in whether or not people will refuse to answer a question when they are grouped? Run an analysis on this data. If there is a difference, identify where the biggest values are coming from.

What is ANOVA?

Let's look at our data for flower lengths again. (The data is below in a slightly different format.)

boxplot(flowers, horizontal=TRUE) # our standard call! It does nothing useful here: it doesn't split Length up by Flower.

boxplot(flowers$Length~flowers$Flower, horizontal=TRUE) # one box of Length per kind of Flower (Length is dependent on Flower)

They look different (H. bihai certainly looks longer than the rest...), but did we just happen to pick unusual samples of each?

...we could run t.tests on every pair of groups. Two problems:

tedious
the more tests we run, the bigger the chance of getting "significant" results when there aren't any true differences. (see relevant xkcd)
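The false-positive problem can be put in numbers. This is a rough sketch only: it assumes every t.test uses a 0.05 cutoff and that the tests are independent (they are not quite, since pairs share data), and the group counts are just examples.

```r
# Chance of at least one false positive when running every pairwise t.test.
# Assumptions: alpha = 0.05 per test, tests treated as independent.
alpha   <- 0.05
groups  <- 3:6                      # example numbers of groups
n_tests <- choose(groups, 2)        # how many pairs there are to test
fwer    <- 1 - (1 - alpha)^n_tests  # chance that at least one test is a false alarm

print(data.frame(groups, n_tests, fwer = round(fwer, 3)))
```

With only 3 groups (3 pairwise tests) the chance of at least one false alarm is already about 14%, and it grows quickly as groups are added.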

So we run one overall test FIRST.

ANOVA! ANALYSIS OF VARIANCE.

anova(lm(flowers$Length~flowers$Flower))

Significance here means that at least one group mean differs from the others. It does not say which one. So we can continue, start to pick apart the groups, and see where the major differences are.
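To see what the test is comparing, here is a small sketch on made-up data. The `sim` data frame, its `Group` and `Value` columns, and the group means are all invented for illustration; this is not the flower data. It builds the F statistic by hand (variation between groups divided by variation within groups) and checks it against `anova(lm(...))`.

```r
set.seed(1)
# Hypothetical data: three groups of 10, one with a larger true mean.
sim <- data.frame(
  Group = factor(rep(c("a", "b", "c"), each = 10)),
  Value = c(rnorm(10, mean = 10), rnorm(10, mean = 10), rnorm(10, mean = 13))
)

grand_mean  <- mean(sim$Value)
group_means <- tapply(sim$Value, sim$Group, mean)
group_n     <- tapply(sim$Value, sim$Group, length)

# Between groups: how far each group mean sits from the grand mean.
ss_between <- sum(group_n * (group_means - grand_mean)^2)
# Within groups: how far each value sits from its own group mean.
ss_within  <- sum((sim$Value - ave(sim$Value, sim$Group))^2)

df_between <- nlevels(sim$Group) - 1
df_within  <- nrow(sim) - nlevels(sim$Group)
f_by_hand  <- (ss_between / df_between) / (ss_within / df_within)

# The same numbers, straight from R.
tab      <- anova(lm(Value ~ Group, data = sim))
f_from_r <- tab[["F value"]][1]
p_value  <- tab[["Pr(>F)"]][1]

print(c(by_hand = f_by_hand, from_r = f_from_r, p = p_value))
```

A large F means the group means are spread out much more than the scatter inside each group would explain, which is what a small p-value reports.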

Car Rates:

The data is in .csv form below. Source: http://www.itl.nist.gov/div898/education/anova/newcar.dat

boxplot(Car_rates$Rate~Car_rates$City)

summary(aov(Car_rates$Rate~Car_rates$City))

TukeyHSD(aov(Car_rates$Rate~Car_rates$City))

plot(TukeyHSD(aov(Car_rates$Rate~Car_rates$City)))
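Reading the TukeyHSD output takes a little practice, so here is a sketch on made-up data. The cities "A", "B", "C" and the rates are invented, not the NIST data. Each row of the result is one pair of groups, with the difference in means (`diff`), a confidence interval (`lwr`, `upr`), and an adjusted p-value (`p adj`).

```r
set.seed(2)
# Hypothetical data: three cities, 12 rates each; city C has a higher true mean.
sim <- data.frame(
  City = factor(rep(c("A", "B", "C"), each = 12)),
  Rate = c(rnorm(12, mean = 13, sd = 0.5),
           rnorm(12, mean = 13, sd = 0.5),
           rnorm(12, mean = 15, sd = 0.5))
)

fit   <- aov(Rate ~ City, data = sim)
tukey <- TukeyHSD(fit)

# One row per pair of cities: diff, lwr, upr, p adj.
pairs <- as.data.frame(tukey$City)
print(pairs)

# Keep only the pairs whose adjusted p-value is below 0.05.
print(pairs[pairs[["p adj"]] < 0.05, ])
```

In the plot of a TukeyHSD result, the pairs kept by that last line are the ones whose intervals do not cross zero.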

Odors!

The Odor column is coded with numbers:

1-lavender

2-lemon

3-none

summary(aov(odors$Time~odors$Odor))

TukeyHSD(aov(odors$Time~odors$Odor)) #what's the problem? I was just getting the hang of this!

TukeyHSD(aov(odors$Time~as.factor(odors$Odor))) # let R know the number codes are just labels (a factor), not quantities.
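Here is a sketch of what goes wrong, using made-up times (the `sim` data frame and its values are invented, and the `OdorF` column is a name chosen for this example). When the codes 1, 2, 3 are left as numbers, R fits a single slope through them instead of comparing three groups, so there are no groups for TukeyHSD to pair up.

```r
set.seed(3)
# Hypothetical data with the same coding: 1 = lavender, 2 = lemon, 3 = none.
sim <- data.frame(
  Odor = rep(1:3, each = 8),
  Time = c(rnorm(8, mean = 50, sd = 5),
           rnorm(8, mean = 50, sd = 5),
           rnorm(8, mean = 60, sd = 5))
)

# Alternative to as.factor(): make a labelled factor column once, up front.
sim$OdorF <- factor(sim$Odor, labels = c("lavender", "lemon", "none"))

fit_numeric <- aov(Time ~ Odor,  data = sim)  # codes treated as quantities
fit_factor  <- aov(Time ~ OdorF, data = sim)  # codes treated as groups

df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]  # 1: a single slope
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]   # 2: three groups minus one
print(c(numeric = df_numeric, factor = df_factor))

# Capture the error message, if any, from the numeric version.
problem <- tryCatch(TukeyHSD(fit_numeric), error = function(e) conditionMessage(e))
print(problem)

# The factor version compares all three pairs of odors.
print(TukeyHSD(fit_factor))
```

A quick check before running any ANOVA is to look at the Df column of the summary: it should be the number of groups minus one.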

Try this one on your own:

p. 647: 24.9