Lecture 08

For today you should have:

  1. Read Chapter 6.
  2. Prepared for a quiz on Chapters 1-5.
  3. Completed the midpoint survey.


Today:

  1. Quiz.
  2. Survey.
  3. Project suggestions.
  4. Convolution.
  5. Homework 6 preview.

For next time:

  1. Homework 6.
  2. Read Chapter 7.
  3. Optional: Read about illusory superiority.

Survey results

Generally positive, so that's good.

Time spent: a little low.

Bad things: not enough time for project, not enough structure for project, more help with probability.

NINJAs: generally positive, some questions about homework feedback.

Project suggestions

1) Look at the next deadline and work backward.  Where should you be?

2) Allocate 2 hours per week and schedule it.

3) Look at distributions and choose summary statistics.

Counterexample: mean doubling time.

4) Look at scatterplots and identify relationships.

5) Design other visualizations.

Warning: don't call something a {PMF, CDF, scatterplot} unless it is a {PMF, CDF, scatterplot}.

For example, suppose I compute the probability of early birth as a function of mother's age.  It maps from age to probability, but it is not a PMF.

Suppose I compute average pregnancy length as a function of mother's age.  It shows one variable versus another, but it is not a scatterplot.
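One quick way to see the first distinction: a PMF's probabilities sum to 1, while a map from age to P(early | age) has no such constraint. The sketch below uses a handful of made-up toy records (not real data), and the variable names are illustrative only.

```python
from collections import Counter, defaultdict

# Made-up toy records, NOT real data: (mother's age, born early?)
records = [
    (20, True), (20, False), (20, False), (20, False),
    (25, True), (25, False),
    (30, True), (30, True),
]

# PMF of mother's age: maps each age to the fraction of records with that age.
counts = Counter(age for age, _ in records)
pmf_age = {age: n / len(records) for age, n in counts.items()}

# Probability of early birth as a function of age: P(early | age).
early_counts = defaultdict(int)
for age, is_early in records:
    early_counts[age] += int(is_early)
prob_early_by_age = {age: early_counts[age] / counts[age] for age in counts}

print(sum(pmf_age.values()))            # a PMF sums to 1
print(sum(prob_early_by_age.values()))  # this map need not sum to 1
```

Both objects map from age to a probability, but only the first describes a distribution over ages.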

Identify apparent effects now; soon we will learn how to:

1) Test whether an apparent effect might be due to chance.

2) Estimate the size of the effect.

3) Quantify relationships.


Convolution

Operations on distributions tend to be easy to implement numerically, but a little more challenging to do analytically.  For example, here is the CDF of Z = X + Y:

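def CdfSum(z, pmf_x, cdf_y):
    """Probability that Z = X + Y <= z, for independent X and Y."""
    total = 0
    # Condition on each possible value x of X.
    for x, px in pmf_x.Items():
        # Given X = x, the sum is <= z exactly when Y <= z - x.
        py = cdf_y.Prob(z-x)
        total += px * py
    return total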
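To try the same idea without the book's Pmf and Cdf classes, here is a minimal self-contained sketch that stores a PMF as a plain dict mapping values to probabilities. The names cdf_prob and cdf_sum and the two-dice example are assumptions made for illustration; X and Y are assumed independent.

```python
def cdf_prob(pmf, x):
    """P(X <= x) for a PMF stored as {value: probability}."""
    return sum(p for v, p in pmf.items() if v <= x)

def cdf_sum(z, pmf_x, pmf_y):
    """P(X + Y <= z), assuming X and Y are independent."""
    total = 0.0
    for x, px in pmf_x.items():
        # Given X = x, the sum is <= z exactly when Y <= z - x.
        total += px * cdf_prob(pmf_y, z - x)
    return total

# Example: two fair six-sided dice.
die = {face: 1 / 6 for face in range(1, 7)}
print(cdf_sum(4, die, die))  # P(sum <= 4); 6 of the 36 outcomes qualify
```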

Chapter 6 shows the continuous version, and an example:

If X and Y are values drawn independently from exponential distributions with parameter λ, what is the distribution of their sum?
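As a worked sketch of this example, the convolution integral can be evaluated directly. This assumes both densities are f(x) = λ e^{-λx} for x ≥ 0 and that X and Y are independent, as stated.

```latex
\begin{aligned}
f_Z(z) &= \int_0^z f_X(x)\, f_Y(z - x)\, dx \\
       &= \int_0^z \lambda e^{-\lambda x}\, \lambda e^{-\lambda (z - x)}\, dx \\
       &= \lambda^2 e^{-\lambda z} \int_0^z dx \\
       &= \lambda^2 z\, e^{-\lambda z}, \qquad z \ge 0 .
\end{aligned}
```

This is the Erlang density with k = 2 and rate λ.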

And there is another example in Exercise 6.7:

If X ~ Expo(λ) and Y ~ Erlang(k, λ), what is the distribution of Z = X + Y?

In Homework 6, you generalize this to Z = max(X, Y) and Z = min(X, Y).

There are three general strategies:

1) Simulation.

2) Discrete computation.

3) Symbolic analysis.

What are the pros and cons?
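To make the comparison concrete, here is a rough sketch that applies all three strategies to one question: P(X + Y <= z) for two independent exponential values. The rate LAM, the point Z, the grid width, the sample size, and the function names are arbitrary choices made for this illustration.

```python
import math
import random

LAM = 2.0  # rate parameter (arbitrary, for illustration)
Z = 1.0    # evaluate P(X + Y <= Z)

def by_simulation(n=200_000, seed=17):
    """Draw n pairs and count how often the sum is <= Z."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(LAM) + rng.expovariate(LAM) <= Z
               for _ in range(n))
    return hits / n

def by_discrete(step=0.001):
    """Chop [0, Z] into bins and sum P(X in bin) * P(Y <= Z - x)."""
    total = 0.0
    n_bins = round(Z / step)
    for i in range(n_bins):
        lo, hi = i * step, (i + 1) * step
        px = math.exp(-LAM * lo) - math.exp(-LAM * hi)  # exact P(lo < X <= hi)
        x_mid = (lo + hi) / 2
        total += px * (1 - math.exp(-LAM * (Z - x_mid)))  # CDF of Y at Z - x_mid
    return total

def by_analysis():
    """Closed form: the Erlang CDF with k = 2, 1 - exp(-LAM z)(1 + LAM z)."""
    return 1 - math.exp(-LAM * Z) * (1 + LAM * Z)

print(by_simulation(), by_discrete(), by_analysis())
```

The simulation is the easiest to write but noisy, the grid version is deterministic but approximate, and the closed form is exact but only available when the integral can be done by hand.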

In-class exercise: what is the distribution of Z = XY?

Normal distributions

Exercise 6.10: If X ∼ N(µ_X, σ_X^2) and Y ∼ N(µ_Y, σ_Y^2), what is the distribution of Z = aX + bY?
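One way to check a candidate answer is by simulation. The sketch below assumes X and Y are independent and compares the sample mean and variance of Z with the standard result for that case, mean a µ_X + b µ_Y and variance a^2 σ_X^2 + b^2 σ_Y^2. The parameter values and the function name are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_linear_combo(a, b, mu_x, sigma_x, mu_y, sigma_y,
                          n=100_000, seed=1):
    """Draw n values of Z = a*X + b*Y with X and Y independent normals."""
    rng = random.Random(seed)
    return [a * rng.gauss(mu_x, sigma_x) + b * rng.gauss(mu_y, sigma_y)
            for _ in range(n)]

# Arbitrary parameters for the check.
a, b = 2.0, -1.0
mu_x, sigma_x = 1.0, 3.0
mu_y, sigma_y = 4.0, 2.0

zs = simulate_linear_combo(a, b, mu_x, sigma_x, mu_y, sigma_y)

expected_mean = a * mu_x + b * mu_y                   # -2.0
expected_var = a**2 * sigma_x**2 + b**2 * sigma_y**2  # 40.0

sample_mean = statistics.fmean(zs)
sample_var = statistics.pvariance(zs)
print(sample_mean, expected_mean)
print(sample_var, expected_var)
```

Matching the first two moments does not by itself show that Z is normal; plotting the CDF of zs against a normal CDF with these parameters is a natural follow-up.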