Our goals:
Use the sampling method to explain the central limit theorem.
single samples from the boxes will have a large sd (the same as the sd of the whole population)
averages of samples will have a lower sd, since high and low values balance each other out (the sd of the sample mean is the population sd divided by the square root of the sample size)
averages of samples will follow an approximately normal curve EVEN IF the data was not normal to begin with, and their mean equals the population mean
use the newt data to find a mean and standard deviation.
use these to get 68-95-99.7 intervals around the sample mean (1, 2, and 3 sds of the sample mean on either side)
these percentages are the chances that an interval built this way from a sample mean contains the population mean
section 5.1 summary is very small (page 308), but it is very important.
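The sampling goals above can be tried out with a small simulation. This is only a sketch: the skewed population (exponential draws), the sample size of 30, the number of repetitions, and the seed are all assumptions chosen for illustration, not values from the class.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical skewed population: 100,000 exponential draws (mean 1, sd 1).
population <- rexp(100000, rate = 1)

n <- 30       # size of each sample (an arbitrary choice)
reps <- 5000  # how many samples to draw

# Single observations: their spread matches the population sd.
singles <- sample(population, reps)

# Sample means: one average per sample of size n.
means <- replicate(reps, mean(sample(population, n)))

sd(population)            # population sd
sd(singles)               # close to the population sd
sd(means)                 # much smaller
sd(population) / sqrt(n)  # what theory predicts for sd(means)

mean(population)          # population mean
mean(means)               # close to the population mean

# The population is skewed; the sample means look roughly bell-shaped.
par(mfrow = c(1, 2))
hist(population, breaks = 50, main = "Population")
hist(means, breaks = 50, main = "Sample means")
```

Changing n shows how quickly the averages tighten up: larger samples give a narrower, more normal-looking histogram of means.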
Some newt data (from The Basic Practice of Statistics, by David Moore):
Biologists studying the healing of skin wounds measured the rate at which new cells closed a razor cut made in the skin of an anesthetized newt. Here are the data from 18 newts, measured in micrometers (millionths of a meter) per hour:
29 27 34 40 22 28 14 35 26 35 12 30 23 18 11 22 23 33
We know from previous experiments that the sd of newt healing rates is 8 micrometers per hour.
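One way to work the newt example in R is sketched below. The variable names are made up for illustration, and the intervals use the known sd of 8 given above rather than the sd computed from the sample.

```r
newts <- c(29, 27, 34, 40, 22, 28, 14, 35, 26, 35, 12, 30, 23, 18, 11, 22, 23, 33)

n <- length(newts)   # 18 newts
xbar <- mean(newts)  # sample mean healing rate
s <- sd(newts)       # sample sd, for comparison with the known value
sigma <- 8           # population sd given in the notes

se <- sigma / sqrt(n)  # sd of the sample mean

# Intervals of 1, 2, and 3 standard errors around the sample mean.
k <- 1:3
intervals <- data.frame(
  coverage = c("68%", "95%", "99.7%"),
  lower = xbar - k * se,
  upper = xbar + k * se
)
print(xbar)
print(intervals)
```

The sample mean comes out to about 25.67 micrometers per hour, and the 95% row runs from roughly 21.9 to 29.4.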
Graph break: what's wrong with these graphs? Why are they bad? (see graphs in class)
Understand where the binomial distribution comes from.
flip some coins and analyze the outcome
use these flips to represent yes/no options.
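notation: N(12, 6) is a normal distribution with mean 12 and sd 6, as compared to B(3, 0.25), a binomial distribution with 3 trials and a 0.25 chance of success on each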
dbinom(number of successes, number of trials, probability of success)
example: what are the chances of getting exactly five heads when flipping a coin 8 times?
dbinom(5, 8, 0.5)
from the book: 10 or fewer misclassified in a sample of 150, each with a 0.08 chance of being misclassified:
sum(dbinom(0:10,150,0.08))
plot(0:15, dbinom(0:15,15,0.08))
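The coin-flip idea and the dbinom examples above can be checked against each other. The sketch below is illustrative only: the 10,000 simulated rounds and the seed are arbitrary choices, and pbinom is brought in as a cross-check on the sum.

```r
set.seed(2)  # arbitrary seed so the run is repeatable

# Simulate 10,000 rounds of flipping a fair coin 8 times; count heads per round.
heads <- rbinom(10000, size = 8, prob = 0.5)
mean(heads == 5)  # simulated share of rounds with exactly 5 heads

# Exact probability of exactly 5 heads in 8 flips, two ways.
p5 <- dbinom(5, 8, 0.5)
p5_formula <- choose(8, 5) * 0.5^5 * 0.5^3  # the binomial formula by hand

# 10 or fewer successes in 150 trials with p = 0.08, two ways.
p_sum <- sum(dbinom(0:10, 150, 0.08))
p_cdf <- pbinom(10, 150, 0.08)  # cumulative version of the same sum

print(c(p5, p5_formula))
print(c(p_sum, p_cdf))
```

The exact value for five heads is 56/256 = 0.21875, and the simulated share should land close to it.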
Look at an example of a survey given over the internets (xkcd.com)
http://blog.xkcd.com/2010/05/03/color-survey-results/
Is this a good visualization?
http://www.informationisbeautiful.net/visualizations/colours-in-cultures/