
Lecture 08

For today you should have:

  1. Read Chapter 6.
  2. Prepared for a quiz on Chapters 1-5.
  3. Completed the midpoint survey.

Today:

  1. Quiz.
  2. Survey.
  3. Project suggestions.
  4. Convolution.
  5. Homework 6 preview.

For next time:

  1. Homework 6.
  2. Read Chapter 7.
  3. Optional: Read about illusory superiority.


Survey results

Generally positive, so that's good.

Time spent: a little low.

Bad things: not enough time for project, not enough structure for project, more help with probability.

NINJAs: generally positive, some questions about homework feedback.


Project suggestions

1) Look at the next deadline and work backward.  Where should you be?

2) Allocate 2 hours per week and schedule it.

3) Look at distributions and choose summary statistics.

Counterexample: mean doubling time.

4) Look at scatterplots and identify relationships.

5) Design other visualizations.

Warning: don't call something a {PMF, CDF, scatterplot} unless it is a {PMF, CDF, scatterplot}.

For example, suppose I compute the probability of early birth as a function of mother's age.  It maps from age to probability, but it is not a PMF.

Suppose I compute average pregnancy length as a function of mother's age.  It shows one variable versus another, but it is not a scatterplot.
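A quick way to check the first warning: the probabilities in a PMF add up to 1, but the probability of early birth computed separately for each age does not have to.  Here is a minimal sketch; the records are made-up toy data, not from the course dataset, and the names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (mother's age group, whether the birth was early).
records = [
    (20, True), (20, False), (20, False), (20, False),
    (30, True), (30, False),
    (40, True), (40, True), (40, False), (40, False),
]

# PMF of mother's age: the fraction of records in each age group.
age_counts = Counter(age for age, _ in records)
n = len(records)
pmf_age = {age: count / n for age, count in age_counts.items()}

# Probability of early birth as a function of age.
# This maps age to a probability, but it is not a PMF.
early_counts = Counter(age for age, early in records if early)
p_early = {age: early_counts[age] / age_counts[age] for age in age_counts}

print("sum of PMF of age:      ", sum(pmf_age.values()))
print("sum of P(early | age):  ", sum(p_early.values()))
```

The first sum is 1 (up to floating-point error); the second is whatever the per-age rates happen to add up to.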

Identify apparent effects now; soon we will learn how to:

1) Test whether an apparent effect might be due to chance.

2) Estimate the size of the effect.

3) Quantify relationships.


Convolution


Operations on distributions tend to be easy to implement numerically, but a little more challenging to do analytically.  For example, here is a function that computes the CDF of Z = X + Y, assuming X and Y are independent:

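def CdfSum(z, pmf_x, cdf_y):
    """Probability that Z = X + Y <= z, for independent X and Y."""
    total = 0
    # For each possible value of X, Z <= z exactly when Y <= z - x.
    for x, px in pmf_x.Items():
        py = cdf_y.Prob(z-x)
        # Weight P(Y <= z - x) by P(X = x) and accumulate.
        total += px * py
    return total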
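
CdfSum relies on the Pmf and Cdf classes from the book's code.  To try the same idea without them, here is a minimal sketch that stores a PMF as a plain dict; make_pmf, cdf_prob and cdf_sum are hypothetical helper names, not part of the book's code.

```python
def make_pmf(values):
    """Map each value to its relative frequency in the list."""
    n = len(values)
    pmf = {}
    for v in values:
        pmf[v] = pmf.get(v, 0) + 1 / n
    return pmf

def cdf_prob(pmf, x):
    """P(X <= x) for a PMF stored as a dict."""
    return sum(p for v, p in pmf.items() if v <= x)

def cdf_sum(z, pmf_x, pmf_y):
    """P(X + Y <= z), assuming X and Y are independent."""
    total = 0
    for x, px in pmf_x.items():
        # Z <= z exactly when Y <= z - x.
        total += px * cdf_prob(pmf_y, z - x)
    return total

# Example: the sum of two fair six-sided dice.
die = make_pmf([1, 2, 3, 4, 5, 6])
for z in [2, 7, 12]:
    print(z, cdf_sum(z, die, die))
```

For two dice you can check the result by counting: 21 of the 36 equally likely outcomes have a sum of 7 or less.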

Chapter 6 shows the continuous version, and an example:

If X and Y are values drawn independently from exponential distributions with parameter λ, what is the distribution of their sum?
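
One way to check an analytic answer to a question like this is simulation.  The sketch below draws pairs of exponential values, adds them, and compares the empirical CDF of the sum with the Erlang CDF for k = 2, which is the standard result for the sum of two independent exponentials with the same λ.  The function names, the value of λ, the sample size and the seed are arbitrary choices for illustration.

```python
import math
import random

def simulate_sum_cdf(z, lam, n=100_000, seed=17):
    """Estimate P(X + Y <= z) for independent X, Y ~ Expo(lam)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        total = rng.expovariate(lam) + rng.expovariate(lam)
        if total <= z:
            count += 1
    return count / n

def erlang2_cdf(z, lam):
    """CDF of the Erlang distribution with k = 2 and rate lam."""
    return 1 - math.exp(-lam * z) * (1 + lam * z)

lam = 2.0
for z in [0.5, 1.0, 2.0]:
    print(z, simulate_sum_cdf(z, lam), erlang2_cdf(z, lam))
```

The two columns should agree to about two decimal places; the remaining difference is sampling error.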

And there is another example in Exercise 6.7:

If X ~ Expo(λ) and Y ~ Erlang(k, λ), what is the distribution of Z = X + Y?


In Homework 6, you generalize this to Z = max(X, Y) and Z = min(X, Y).

There are three general strategies:

1) Simulation.

2) Discrete computation.

3) Symbolic analysis.

What are the pros and cons?

In-class exercise: what is the distribution of Z = XY?

Normal distributions

Exercise 6.10: If X ~ N(μ_X, σ_X^2) and Y ~ N(μ_Y, σ_Y^2), what is the distribution of Z = aX + bY?
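
Simulation is a useful check on whatever you derive here.  The sketch below assumes X and Y are independent and compares the sample mean and variance of Z with the standard result for a linear combination of independent normals: mean a μ_X + b μ_Y and variance a^2 σ_X^2 + b^2 σ_Y^2.  The parameter values, sample size and seed are arbitrary, and simulate_linear_combo is a hypothetical helper name.

```python
import random
import statistics

def simulate_linear_combo(a, b, mu_x, sigma_x, mu_y, sigma_y,
                          n=100_000, seed=17):
    """Draw n values of Z = aX + bY for independent normal X and Y."""
    rng = random.Random(seed)
    return [a * rng.gauss(mu_x, sigma_x) + b * rng.gauss(mu_y, sigma_y)
            for _ in range(n)]

# Arbitrary example parameters.
a, b = 2.0, -1.0
mu_x, sigma_x = 1.0, 3.0
mu_y, sigma_y = 4.0, 2.0

zs = simulate_linear_combo(a, b, mu_x, sigma_x, mu_y, sigma_y)

# Standard result, assuming independence.
expected_mean = a * mu_x + b * mu_y
expected_var = a**2 * sigma_x**2 + b**2 * sigma_y**2

print("mean:    ", statistics.mean(zs), "expected", expected_mean)
print("variance:", statistics.variance(zs), "expected", expected_var)
```

Matching the mean and variance does not show that Z is normal; for that, compare the CDF of the simulated values with a normal CDF.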

