Lecture 09

For today you should:

1) Read Chapter 9 of Think Stats 2e, sections 9.1 to 9.7

2) Complete Exploring relationships

Today:

1) Hypothesis testing

For next time you should:

1) Read the rest of Chapter 9 of Think Stats 2e

2) Prepare for a quiz, primarily on Chapters 8 and 9

3) Start the Estimation section of your journal (see below)

Estimation

For the next section of your journal, you should:

1) Choose a variable.

2) Choose a quantity to estimate.

3) Compute the sampling distribution of that estimate by simulation.

4) Compute the standard error and 90% CI for that estimate.

For now this section is mostly an exercise, and you can keep it simple.  We will cycle back to it later, when you have a better idea which quantities are important to estimate.
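
Here is a minimal sketch of steps 3 and 4 using resampling; the exponential data, the sample size, and the sample mean as the estimator are placeholders for whatever variable and quantity you choose.  Resampling from the data is one option; drawing samples from a fitted analytic model is another.

import numpy as np

def sampling_distribution(data, estimator, iters=1000):
    """Approximate the sampling distribution of an estimator
    by resampling the data with replacement."""
    n = len(data)
    estimates = [estimator(np.random.choice(data, n, replace=True))
                 for _ in range(iters)]
    return np.array(estimates)

# Placeholder data: an exponential sample standing in for your variable.
data = np.random.exponential(scale=2.0, size=100)

# Sampling distribution of the sample mean (a placeholder estimator).
estimates = sampling_distribution(data, np.mean)

stderr = estimates.std()                  # standard error
ci = np.percentile(estimates, [5, 95])    # 90% confidence interval
print('SE %.3f, 90%% CI (%.3f, %.3f)' % (stderr, ci[0], ci[1]))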

Hypothesis testing

Many scientific results are presented in a format like this: "The difference in mean pregnancy length is 0.078 weeks (SE 0.025, CI 0.03-0.13, p=0.012)."

In concise form, this answers three questions:

1) Based on the sample you actually collected, what do you think the answer is?

2) If you ran this experiment again, how much would the results vary due to random sampling?

3) Is it likely that the apparent effect is solely due to random sampling? 

These questions are in decreasing order of importance:

1) The effect size is by far the most important thing.  By far.

2) SE and CI are useful to give a sense of how precise the estimate is.

3) The p-value is a box to check off to see if we might be getting fooled by chance.

Getting fooled by chance is a real problem, and computing p-values can help you avoid being embarrassed, but a p-value alone is not a meaningful or useful result.

[Space here for anecdotes I don't want to put in writing.]

For more on this topic:

Sullivan and Feinn, "Using Effect Size—or Why the P Value Is Not Enough"

However, one sentence in that paper is not correct: "Statistical significance is the probability that the observed difference between two groups is due to chance."

The null hypothesis

1) Hey, there seems to be a difference of 𝛿* between these groups.

2) What if there were actually no difference?  How would I model that?

3) Using my model of the no-difference scenario, what would be the probability of seeing a difference as big as 𝛿* by chance?

4) If that probability is small, I conclude that the apparent effect is unlikely to be due to chance.

Step (3) is an actual probability.  Step (4) is a subjective conclusion.

How small is small?

Note: modeling the null hypothesis involves modeling decisions, which are subjective.  So there is no uniquely correct p-value for any non-trivial scenario.

Most p-values are very coarse estimates: they have no significant digits of precision, only an order of magnitude.
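
To see how coarse, here is a small sketch: when a p-value near 0.01 is estimated by simulation with 1000 iterations, the count of extreme test statistics is binomially distributed, so repeated runs of the same test scatter noticeably.

import numpy as np

# If the true p-value is 0.01 and we estimate it with 1000 iterations,
# the count of iterations at or beyond the observed test statistic is
# Binomial(1000, 0.01), so the estimate has standard error
# sqrt(p * (1 - p) / iters), about 0.003: a third of the value itself.
p, iters = 0.01, 1000
estimates = np.random.binomial(iters, p, size=10) / iters
print(estimates)   # typically spans roughly 0.005 to 0.015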

The HypothesisTest framework

Many hypothesis tests are based on the same computational framework, represented here by the HypothesisTest class:

class HypothesisTest(object):

    def __init__(self, data):
        self.data = data
        self.MakeModel()
        self.actual = self.TestStatistic(data)

    def PValue(self, iters=1000):
        # Simulate the test statistic under the null hypothesis and
        # count how often it equals or exceeds the observed value.
        self.test_stats = [self.TestStatistic(self.RunModel())
                           for _ in range(iters)]
        count = sum(1 for x in self.test_stats
                    if x >= self.actual)
        return count / iters

    def TestStatistic(self, data):
        raise UnimplementedMethodException()

    def MakeModel(self):
        pass

    def RunModel(self):
        raise UnimplementedMethodException()

Child classes must provide TestStatistic and RunModel, and may provide MakeModel.

Example: testing a difference in means by permutation.

What is the implicit model of the null hypothesis?

class DiffMeansPermute(thinkstats2.HypothesisTest):

    def TestStatistic(self, data):
        # Test statistic: absolute difference in group means.
        group1, group2 = data
        test_stat = abs(group1.mean() - group2.mean())
        return test_stat

    def MakeModel(self):
        # Pool the two groups, which models the null hypothesis
        # that group labels don't matter.
        group1, group2 = self.data
        self.n, self.m = len(group1), len(group2)
        self.pool = np.hstack((group1, group2))

    def RunModel(self):
        # Shuffle the pooled values and split them into two groups
        # with the original sizes.
        np.random.shuffle(self.pool)
        data = self.pool[:self.n], self.pool[self.n:]
        return data
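
Here is a sketch of how the test might be run, assuming the class definitions above are in scope; the normal samples are placeholders for real data such as the NSFG pregnancy lengths.

import numpy as np
import thinkstats2   # provides the HypothesisTest base class

# Placeholder groups; in the book these would be pregnancy lengths
# for first babies and others.
group1 = np.random.normal(38.6, 2.0, size=1000)
group2 = np.random.normal(38.5, 2.0, size=1000)

ht = DiffMeansPermute((group1, group2))
pvalue = ht.PValue(iters=1000)
print('p-value:', pvalue)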

Exercise

1) git pull upstream master

2) Run chap09ex.ipynb

3) Do exercise 9.2, which is included in the notebook, along with test code.