Lecture 09
For today you should:
1) Read Chapter 9 of Think Stats 2e, sections 9.1 to 9.7
2) Complete Exploring relationships
Today:
1) Hypothesis testing
For next time you should:
1) Read the rest of Chapter 9 of Think Stats 2e
2) Prepare for a quiz, primarily on Chapters 8 and 9
3) Start the Estimation section of your journal (see below)
Estimation
For the next section of your journal, you should:
1) Choose a variable.
2) Choose a quantity to estimate.
3) Compute the sampling distribution of that estimate by simulation.
4) Compute the standard error and 90% CI for that estimate.
For now this is mostly an exercise, so you can keep it simple. We will return to this section later, when you have a better idea of which quantities are important to estimate.
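The four steps above can be sketched as follows. This is a minimal illustration, not the required approach: the data are made up (a hypothetical variable drawn from a normal distribution), the quantity estimated is the mean, and the simulation resamples the data with replacement, which is one of several reasonable ways to simulate repeated experiments.

```python
import random
import statistics

random.seed(17)  # fixed seed so the sketch is repeatable

# Step 1: hypothetical data standing in for the variable you chose.
sample = [random.gauss(38.6, 2.7) for _ in range(200)]


def estimate(values):
    """Step 2: the quantity to estimate; here, the mean."""
    return statistics.mean(values)


def sampling_distribution(values, iters=1000):
    """Step 3: simulate repeated experiments by resampling with replacement."""
    n = len(values)
    return [estimate(random.choices(values, k=n)) for _ in range(iters)]


estimates = sorted(sampling_distribution(sample))

# Step 4a: standard error, the standard deviation of the sampling distribution.
stderr = statistics.pstdev(estimates)

# Step 4b: 90% confidence interval, the 5th and 95th percentiles.
ci_low = estimates[int(0.05 * len(estimates))]
ci_high = estimates[int(0.95 * len(estimates))]

print(f"estimate {estimate(sample):.2f}, SE {stderr:.2f}, "
      f"90% CI ({ci_low:.2f}, {ci_high:.2f})")
```

To estimate a different quantity, such as the median or a standard deviation, only the body of estimate needs to change.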
Hypothesis testing
Many scientific results are presented in a format like this: "The difference in mean pregnancy length is 0.078 weeks (SE 0.025, CI 0.03-0.13, p=0.012)."
This format concisely answers three questions:
1) Based on the sample you actually collected, what do you think the answer is?
2) If you ran this experiment again, how much would the results vary due to random sampling?
3) Is it likely that the apparent effect is solely due to random sampling?
These questions are in decreasing order of importance:
1) The effect size is by far the most important thing. By far.
2) SE and CI are useful to give a sense of how precise the estimate is.
3) The p-value is a box to check off to see if we might be getting fooled by chance.
Getting fooled by chance is a real problem, and computing p-values can help you avoid being embarrassed, but a p-value alone is not a meaningful or useful result.
[Space here for anecdotes I don't want to put in writing.]
For more on this topic:
Sullivan and Feinn, "Using Effect Size—or Why the P Value Is Not Enough"
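However, one sentence in that article is not correct: "Statistical significance is the probability that the observed difference between two groups is due to chance." A p-value is the probability of seeing a difference at least as big as the observed one if the null hypothesis is true, which is not the same as the probability that the difference is due to chance.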
The null hypothesis
1) Hey, there seems to be a difference of 𝛿* between these groups.
2) What if there were actually no difference, how would I model that?
3) Using my model of the no-difference scenario, what would be the probability of seeing a difference as big as 𝛿* by chance?
4) If that probability is small, I conclude that the apparent effect is unlikely to be due to chance.
Step (3) is an actual probability. Step (4) is a subjective conclusion.
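As a rough sketch of steps (1) through (3), the function below estimates the probability by shuffling group labels. The data are invented for illustration, and the function name and parameters are assumptions, not part of any library. Step (4) is left to the reader, since it is a judgment call.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group1, group2, iters=1000, seed=None):
    """Estimate the chance of a difference in means at least as big as observed."""
    rng = random.Random(seed)

    # Step 1: the apparent difference between the groups.
    actual = abs(mean(group1) - mean(group2))

    # Step 2: model "no difference" by pooling the values, so that
    # group membership is assigned at random.
    pool = list(group1) + list(group2)
    n = len(group1)

    # Step 3: count how often the model produces a difference that big.
    count = 0
    for _ in range(iters):
        rng.shuffle(pool)
        diff = abs(mean(pool[:n]) - mean(pool[n:]))
        if diff >= actual:
            count += 1
    return count / iters


# Made-up data: b is shifted by one standard deviation relative to a.
data_rng = random.Random(1)
a = [data_rng.gauss(0, 1) for _ in range(50)]
b = [data_rng.gauss(1, 1) for _ in range(50)]

p_shift = permutation_p_value(a, b, seed=2)
p_same = permutation_p_value(a, a, iters=100, seed=2)
print(p_shift, p_same)
```

Comparing a group with itself gives an observed difference of zero, so every simulated difference is at least as big and the estimated probability is 1.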
How small is small?
<1%: statistically significant
1% to 10%: borderline
>10%: possibly due to chance
Note: modeling the null hypothesis involves modeling decisions, which are subjective. So there is no uniquely correct p-value for any non-trivial scenario.
Most p-values are very coarse estimates, with no significant digits of precision, only an order of magnitude.
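One way to see that the p-value depends on decisions made by the analyst: the sketch below runs the same shuffles on the same made-up data with two different test statistics, a two-sided one (absolute difference in means) and a one-sided one (signed difference). The data and function names are hypothetical, and the choice of test statistic is treated here as one example of such a decision.

```python
import random


def mean(values):
    return sum(values) / len(values)


def two_sided(group1, group2):
    """Size of the difference in means, ignoring its direction."""
    return abs(mean(group1) - mean(group2))


def one_sided(group1, group2):
    """Signed difference in means: only group1 > group2 counts as an effect."""
    return mean(group1) - mean(group2)


def p_value(group1, group2, statistic, iters=1000, seed=0):
    rng = random.Random(seed)  # same seed, so both runs see the same shuffles
    actual = statistic(group1, group2)
    pool = list(group1) + list(group2)
    n = len(group1)
    count = 0
    for _ in range(iters):
        rng.shuffle(pool)
        if statistic(pool[:n], pool[n:]) >= actual:
            count += 1
    return count / iters


# Made-up measurements; group1 has the larger mean (39.6 vs 38.7).
group1 = [39, 40, 38, 41, 39, 40, 42, 39, 38, 40]
group2 = [38, 39, 40, 37, 39, 38, 40, 39, 38, 39]

p_two = p_value(group1, group2, two_sided)
p_one = p_value(group1, group2, one_sided)
print(p_two, p_one)
```

Because the observed difference is positive and both runs use the same shuffles, the one-sided p-value can never be larger than the two-sided one. Neither is the uniquely correct answer; they answer slightly different questions.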
The HypothesisTest framework
Many hypothesis tests share the same computational framework, represented by the HypothesisTest class:
    class HypothesisTest(object):

        def __init__(self, data):
            self.data = data
            self.MakeModel()
            self.actual = self.TestStatistic(data)

        def PValue(self, iters=1000):
            self.test_stats = [
                self.TestStatistic(self.RunModel())
                for _ in range(iters)]
            count = sum(1 for x in self.test_stats
                        if x >= self.actual)
            return count / iters

        def TestStatistic(self, data):
            raise UnimplementedMethodException()

        def MakeModel(self):
            pass

        def RunModel(self):
            raise UnimplementedMethodException()
Child classes must provide TestStatistic and RunModel, and may provide MakeModel.
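To show what a child class looks like, here is a small self-contained sketch. It repeats the base class so that it runs without thinkstats2, and adds a hypothetical CoinTest that asks whether a coin is fair given counts of heads and tails. The counts used are made up for illustration.

```python
import random


class UnimplementedMethodException(Exception):
    """Raised when a child class has not overridden a required method."""


class HypothesisTest:
    """Copy of the framework, repeated here so the sketch runs on its own."""

    def __init__(self, data):
        self.data = data
        self.MakeModel()
        self.actual = self.TestStatistic(data)

    def PValue(self, iters=1000):
        self.test_stats = [self.TestStatistic(self.RunModel())
                           for _ in range(iters)]
        count = sum(1 for x in self.test_stats if x >= self.actual)
        return count / iters

    def TestStatistic(self, data):
        raise UnimplementedMethodException()

    def MakeModel(self):
        pass

    def RunModel(self):
        raise UnimplementedMethodException()


class CoinTest(HypothesisTest):
    """Hypothetical child class: data is a (heads, tails) pair of counts."""

    def TestStatistic(self, data):
        heads, tails = data
        return abs(heads - tails)

    def RunModel(self):
        # Null hypothesis: the coin is fair, so simulate n fair flips.
        heads, tails = self.data
        n = heads + tails
        sim_heads = sum(1 for _ in range(n) if random.random() < 0.5)
        return sim_heads, n - sim_heads


random.seed(3)  # fixed seed so the sketch is repeatable
test = CoinTest((140, 110))
p = test.PValue()
print(test.actual, p)
```

CoinTest does not override MakeModel because the null model needs no setup beyond the data itself.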
Example: testing a difference in means by permutation.
What is the implicit model of the null hypothesis?
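    import numpy as np
    import thinkstats2

    class DiffMeansPermute(thinkstats2.HypothesisTest):

        def TestStatistic(self, data):
            # Absolute difference in means between the two groups.
            group1, group2 = data
            test_stat = abs(group1.mean() - group2.mean())
            return test_stat

        def MakeModel(self):
            # Record the group sizes and pool the values into one array.
            group1, group2 = self.data
            self.n, self.m = len(group1), len(group2)
            self.pool = np.hstack((group1, group2))

        def RunModel(self):
            # Shuffle the pool in place, then split it into two groups
            # with the original sizes.
            np.random.shuffle(self.pool)
            data = self.pool[:self.n], self.pool[self.n:]
            return data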
Exercise
1) git pull upstream master
2) Run chap09ex.ipynb
3) Do exercise 9.2, which is included in the notebook, along with test code.