
Lecture 10

For today you should have:

  1. Homework 7.
  2. Draft report.

Today:

  1. Other operations on distributions.
  2. Hypothesis testing.

For next time:

  1. Read Chapter 8.
  2. Prepare for a quiz.
  3. Check out this nice figure with a regrettable URL:  http://nicefigure.org/post/11098703970/on-visualizing-the-predictive-accuracy-of-a-breast


Max and Min


Suppose I draw two values from a distribution; what is the distribution of the larger value?  

As we saw with convolution, we have three options:

1) Simulation: draw random samples and make a PMF.

2) Enumeration: enumerate all pairs of values.

3) Analysis: let's do some math!


Given Z = max(X, Y), we want CDF_Z, which is Pr{Z <= z}.

Z <= z if and only if X <= z AND Y <= z

So what's Pr{X <= z AND Y <= z}?

If X and Y are independent, it is Pr{X <= z} * Pr{Y <= z}.

So by the definition of CDF:

CDF_Z(z) = CDF_X(z) * CDF_Y(z)

In the special case where X and Y are drawn independently from the same distribution,

CDF_Z(z) = CDF_X(z)^2

What about Z = max(X1, X2, ..., Xn)?

Write a function that takes a CDF and a number, n, and computes the CDF of the max of n variates from the CDF.
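
One way to approach this exercise, as a minimal sketch. It assumes the CDF is represented as a plain list of cumulative probabilities evaluated at a sorted list of values, and that the n variates are independent draws from that distribution. The name cdf_max and this list representation are illustrative assumptions, not the course library's API.

```python
def cdf_max(ps, n):
    """Return the CDF of the max of n independent draws.

    ps: cumulative probabilities CDF_X(x), evaluated at a sorted
        list of values (hypothetical representation).
    n:  number of independent variates.

    All n draws are <= z with probability CDF_X(z)**n.
    """
    return [p ** n for p in ps]


# Example: a fair six-sided die, CDF evaluated at 1..6.
xs = [1, 2, 3, 4, 5, 6]
die_ps = [x / 6 for x in xs]

# CDF of the larger of two rolls.
max_ps = cdf_max(die_ps, 2)
```

For the die, the probability that the larger of two rolls is at most 3 is (3/6)^2 = 0.25, which is the third entry of max_ps.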

Is this better than simulation and enumeration?

What about Z = min(X, Y)?

Hint: use CDFs and think about the definition of the CDF.
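
Following the hint, one possible line of reasoning is that min(X, Y) exceeds z only when both X and Y exceed z. The sketch below assumes independent X and Y whose CDFs are given as lists of cumulative probabilities on a shared, sorted grid of values. The name cdf_min and the representation are assumptions for illustration.

```python
def cdf_min(ps_x, ps_y):
    """Return the CDF of min(X, Y) for independent X and Y.

    ps_x, ps_y: cumulative probabilities of X and Y evaluated at the
                same sorted values (hypothetical representation).

    Pr{Z > z} = Pr{X > z} * Pr{Y > z}, so
    CDF_Z(z) = 1 - (1 - CDF_X(z)) * (1 - CDF_Y(z)).
    """
    return [1 - (1 - px) * (1 - py) for px, py in zip(ps_x, ps_y)]


# Example: the smaller of two rolls of a fair six-sided die.
die_ps = [k / 6 for k in range(1, 7)]
min_ps = cdf_min(die_ps, die_ps)
```

For the die, the probability that the smaller roll is 1 is 1 - (5/6)^2 = 11/36, which is the first entry of min_ps.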


Hypothesis testing


Resources:

1) Chapter 7.

2) Blog articles:




Take home messages:

1) A p-value is the probability of seeing an effect at least as big as the observed one by chance, that is, under the null hypothesis: P(E | H0)

2) Computing p-values requires a model of the null hypothesis, which involves simplifying assumptions.

3) And you have to choose a test statistic, which is often arbitrary.

4) Estimated p-values are usually on the right order of magnitude, and that's about it.

5) If the p-value is low, then the effect is unlikely to be due to chance, which means that other explanations are more likely.

6) To say how much more likely, we also need P(E | HA), which is only meaningful if we have a well-defined HA.
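
To make the first three points concrete, here is a minimal sketch of estimating a p-value by simulation. The modeling choices are assumptions made for illustration: the null hypothesis is modeled by shuffling the pooled data (a permutation test), and the test statistic is the absolute difference in group means. The function names are hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(a, b, iters=1000, seed=0):
    """Estimate P(E | H0) for the difference in means of a and b.

    Model of H0 (an assumption): both groups come from the same
    distribution, so group labels are exchangeable.
    Test statistic (a choice): absolute difference in means.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        delta = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        # Count shuffles with an effect at least as big as observed.
        if delta >= observed:
            count += 1
    return count / iters
```

Because the result is a fraction of a finite number of shuffles, it is only an estimate, which is consistent with point 4.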


Power


If the p-value is high, then the apparent effect might be due to chance, so there is little support for HA.  Now what?

Natural followup question: if there were an effect, what would be the chance of seeing it?  That probability is the power of the test.

The answer depends on the size of the effect, so power is sometimes expressed as a function of the effect size, delta.

We can estimate power by simulation, but it might be slow.

Power analysis is useful for experimental design, especially for choosing sample size.
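
Estimating power by simulation can be sketched as follows. This is an illustration under stated assumptions, not a definitive method: the data are modeled as two normal samples with standard deviation 1 whose means differ by delta, the test is a permutation test on the absolute difference in means, and all function names and default parameters are hypothetical. The nested loops (trials times shuffles) show why this can be slow.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(a, b, iters, rng):
    """Estimate the p-value for the absolute difference in means."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        delta = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if delta >= observed:
            count += 1
    return count / iters


def estimate_power(delta, n, alpha=0.05, trials=100, iters=100, seed=1):
    """Fraction of simulated experiments that yield p < alpha.

    delta: assumed true difference in means (effect size).
    n:     sample size per group.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Simulate one experiment in which the effect is real.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(delta, 1) for _ in range(n)]
        if permutation_p_value(a, b, iters, rng) < alpha:
            hits += 1
    return hits / trials
```

Calling estimate_power for a range of values of n at a fixed delta gives a rough idea of how large a sample is needed to detect an effect of that size.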

Practice quiz


1) Write a function called CdfMax that takes two Cdfs as parameters and returns a new Cdf that represents the distribution of the larger value drawn from the given Cdfs.

Your code should take advantage of the analysis (above), so it should not generate a sample or enumerate pairs of values.


2) Suppose you are on Let's Make a Deal and you are playing the game described in the book, with one difference: before you went on the show you analyzed tapes of previous shows and discovered that Monty has a tell: when the contestant picks the correct door, Monty is more likely to blink.

Quoth Wikipedia: "A tell in poker is a subtle but detectable change in a player's behavior or demeanor that gives clues to that player's assessment of his hand."

Specifically, of the 15 shows you watched, the contestant chose the correct door 5 times, and Monty blinked three of those times.  Of the other 10 times, Monty blinked three times.

As usual, let's assume that you choose Door A.  Monty opens door B and blinks.  What should you do, and what is your chance of winning?

For full credit, you should state the hypotheses and evidence clearly, and compute each of the terms in Bayes's theorem.


3) In Homework 7, you tested the hypothesis that first babies are lighter than other babies, so the null hypothesis is that the distribution for all babies is the same, regardless of birth order.

But that's not a specific enough statement of the null hypothesis to simulate; we have to specify the hypothetical distribution of birth weights.  Describe two different reasonable ways you might choose to model this distribution.
