Lecture 18

For today you should:

1) Start your preliminary report.

Today:

1) Quiz 7 Debrief

2) A little more Bayes

For next time you should:

1) Work on your preliminary report.

2) Prepare for a quiz.

Quiz 7

1) Danger: the p-value is NOT the probability that the apparent effect is due to chance.  See below.

2) Covariance and correlation both quantify the tendency of the variables to vary together.  The difference is that correlation is standardized, so it has no units, and can be compared across pairs of variables.

Covariance is used as an intermediate value in some analyses, but it is seldom reported as a summary statistic.

3) Key pandas operations

row filtering:

male = df[df.sex == 'M']

column selection:

male.pre3

male['pre3']
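
Here is a minimal, self-contained sketch of the same operations; the DataFrame is made up for illustration, but it uses the same column names as above:

import pandas as pd

# made-up data with the same column names used in the quiz
df = pd.DataFrame({'sex': ['M', 'F', 'M'], 'pre3': [1.2, 3.4, 5.6]})

# row filtering: a boolean mask keeps only the rows where sex is 'M'
male = df[df.sex == 'M']

# column selection: both forms return the same Series
print(male.pre3)
print(male['pre3'])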

4) A peak in the ACF (autocorrelation function) indicates periodic behavior in the time series, not necessarily a peak in the time series.

And the units of lag are important.
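
As a quick sketch, here is a made-up daily series with a weekly cycle: the series itself has no single peak, but its autocorrelation peaks near a lag of 7, and the lag is in days because the observations are daily.

import numpy as np
import pandas as pd

# made-up daily series: weekly cycle plus noise
days = np.arange(365)
series = pd.Series(np.sin(2 * np.pi * days / 7) + np.random.normal(0, 0.1, len(days)))

# autocorrelation at lags 1 through 14; the largest value is near lag=7, one full period
acf = [series.autocorr(lag) for lag in range(1, 15)]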

Bayesian interpretation of hypothesis testing

In my previous article I was surprised to find myself defending classical statistical inference, including null-hypothesis significance testing (NHST).  I wrote:

“If the p-value is small, you can conclude that the fourth possibility is unlikely, and infer that the other three possibilities are more likely.”

The “fourth possibility” I referred to was:

“The apparent effect might be due to chance; that is, the difference might appear in a random sample, but not in the general population.”

Several commenters chastised me; for example:

“All a p-value tells you is the probability of your data (or data more extreme) given the null is true. They say nothing about whether your data is ‘due to chance.’”

My correspondent is partly right.  The p-value is the probability of the apparent effect under the null hypothesis, which is not the same thing as the probability we should assign to the null hypothesis after seeing the data.  But they are related to each other by Bayes’s theorem, which is why I said that you can use the first to make an inference about the second.
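
For reference, the relationship is Bayes’s theorem, with H_0 the null hypothesis and D the observed data; the p-value plays the role of the likelihood term (strictly, the probability of data at least as extreme as D):

P(H_0 \mid D) = \frac{P(D \mid H_0) \, P(H_0)}{P(D)}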

Let me explain by showing a few examples.  I’ll start with what I think is a common scenario:  suppose you are reading a scientific paper that reports a new relationship between two variables.  There are (at least) three explanations you might consider for this apparent effect:

A:  The effect might be actual; that is, it might exist in the unobserved population as well as the observed sample.

B:  The effect might be bogus, caused by errors like sampling bias and measurement error, or by fraud.

C:  The effect might be due to chance, appearing randomly in the sample but not in the population.

If we think of these as competing hypotheses to explain the apparent effect, we can use Bayes’s theorem to update our belief in each hypothesis.

Scenario #1

As a first scenario, suppose that the apparent effect is plausible, and the p-value is low.  The following table shows what the Bayesian update might look like:

Since the apparent effect is plausible, I give it a prior probability of 70%.  I assign a prior of 20% to the hypothesis that the result is due to error or fraud, and 10% to the hypothesis that it’s due to chance.  If you don’t agree with the numbers I chose, we’ll look at some alternatives soon.  Or feel free to plug in your own.

Now we compute the likelihood of the apparent effect under each hypothesis.  If the effect is real, it is quite likely to appear in the sample, so I figure the likelihood is between 50% and 100%.   And in the presence of error or fraud, I assume the apparent effect is also quite likely.

If the effect is due to chance, we can compute the likelihood directly.  The likelihood of the data under the null hypothesis is the p-value.  As an example, suppose p=0.01.

The table shows the resulting posterior probabilities.  The hypothesis that the effect is due to chance has been effectively eliminated, and the other hypotheses are marginally more likely.
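
Here is a rough sketch of the update; the priors come from the paragraph above, and the likelihoods of 0.75 for Actual and Bogus are assumed values in the “quite likely” range:

def update(priors, likelihoods):
    # Bayes's theorem: posterior is proportional to prior times likelihood
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

# Scenario #1: plausible effect, p = 0.01
priors = {'Actual': 0.70, 'Bogus': 0.20, 'Chance': 0.10}
likelihoods = {'Actual': 0.75, 'Bogus': 0.75, 'Chance': 0.01}  # 0.75 is an assumed value
posteriors = update(priors, likelihoods)
# posteriors are roughly {'Actual': 0.78, 'Bogus': 0.22, 'Chance': 0.001}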

Scenario #2

In the second scenario, suppose the p-value is again low, but the apparent effect is less plausible.  In that case, I would assign a lower prior probability to Actual and a higher prior to Bogus.  I am still inclined to assign a low prior probability to Chance, simply because I don’t think it is the most common cause of scientific error.

The results are pretty much the same as in Scenario #1: we can be reasonably confident that the result is not due to chance.

I believe these examples cover a large majority of real-world scenarios, and in each case my claim holds up:  If the p-value is small, you can conclude that the apparent effect is unlikely to be due to chance, and the other possibilities (Actual and Bogus) are more likely.

I admit that there are situations where this conclusion would not be valid.  For example, if the effect is implausible and you have good reason to think that error and fraud are unlikely, you might start with a larger prior belief in Chance.  In that case a p-value like 0.01 might not be sufficient to rule out Chance:
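
Reusing the update function from the sketch above, with illustrative priors of my own choosing that put most of the initial belief in Chance:

# contrived scenario: implausible effect, error and fraud unlikely (illustrative priors)
priors = {'Actual': 0.05, 'Bogus': 0.05, 'Chance': 0.90}

update(priors, {'Actual': 0.75, 'Bogus': 0.75, 'Chance': 0.01})
# posterior for Chance is roughly 0.11 -- reduced, but not negligible

update(priors, {'Actual': 0.75, 'Bogus': 0.75, 'Chance': 0.001})
# posterior for Chance drops to roughly 0.01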

But even in this contrived scenario, the p-value has a substantial effect on our belief in the Chance hypothesis.  And a somewhat smaller p-value, like 0.001, would be sufficient to effectively rule out Chance.

In summary, NHST is problematic but not useless.  If you think an apparent effect might be due to chance, choosing an appropriate null hypothesis and computing a p-value is a reasonable way to check.