For today you should have:
Today:
Survey results
Generally positive, so that's good. Time spent: a little low. Complaints: not enough time for the project, not enough structure for the project, and requests for more help with probability. NINJAs: generally positive, with some questions about homework feedback.

Project suggestions
1) Look at the next deadline and work backward. Where should you be?
2) Allocate 2 hours per week and schedule it.
3) Look at distributions and choose summary statistics. Counterexample: mean doubling time.
4) Look at scatterplots and identify relationships.
5) Design other visualizations.

Warning: don't call something a {PMF, CDF, scatterplot} unless it is a {PMF, CDF, scatterplot}. For example, suppose I compute the probability of early birth as a function of mother's age. It maps from age to probability, but it is not a PMF: the probabilities are conditional on age, so they need not sum to 1 across ages. Suppose I compute average pregnancy length as a function of mother's age. It shows one variable versus another, but each point is an average rather than an individual observation, so it is not a scatterplot.

Identify apparent effects now; soon we will learn how to:
1) test whether an apparent effect might be due to chance;
2) estimate the size of the effect;
3) quantify relationships.

Convolution
Operations on distributions tend to be easy to implement numerically but more challenging to do analytically. For example, here is a function that computes the CDF of Z = X + Y:
    def CdfSum(z, pmf_x, cdf_y):
        """Probability that Z = X + Y <= z, assuming X and Y are independent."""
        total = 0
        for x, px in pmf_x.Items():
            # P(Y <= z - x), weighted by P(X = x)
            py = cdf_y.Prob(z - x)
            total += px * py
        return total
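The function above relies on the course's Pmf and Cdf classes. As a minimal sketch, assuming plain dicts stand in for those classes, the same discrete computation can be written self-contained and cross-checked against a simulation, which illustrates two of the general strategies (simulation and discrete computation). The helper names make_pmf, cdf_prob, cdf_sum, and simulate_sum are hypothetical, and the two-dice example is only an illustration.

```python
import random
from collections import Counter

# Hypothetical stand-ins for the course's Pmf/Cdf classes:
# a PMF is a dict mapping value -> probability.

def make_pmf(values):
    """Build a PMF (dict) from a list of equally likely outcomes."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def cdf_prob(pmf, x):
    """P(X <= x) for a PMF stored as a dict."""
    return sum(p for v, p in pmf.items() if v <= x)

def cdf_sum(z, pmf_x, pmf_y):
    """Discrete computation: P(X + Y <= z), assuming X and Y are independent."""
    return sum(px * cdf_prob(pmf_y, z - x) for x, px in pmf_x.items())

def simulate_sum(z, pmf_x, pmf_y, n=100_000, seed=1):
    """Simulation: estimate P(X + Y <= z) by drawing n independent pairs."""
    rng = random.Random(seed)
    xs, wx = zip(*pmf_x.items())
    ys, wy = zip(*pmf_y.items())
    x_draws = rng.choices(xs, weights=wx, k=n)
    y_draws = rng.choices(ys, weights=wy, k=n)
    hits = sum(1 for x, y in zip(x_draws, y_draws) if x + y <= z)
    return hits / n

die = make_pmf([1, 2, 3, 4, 5, 6])
print(cdf_sum(7, die, die))       # exact value is 21/36
print(simulate_sum(7, die, die))  # close to 21/36, up to sampling error
```

The discrete version is exact for discrete distributions, but its cost grows with the number of values in each PMF; the simulation works for any distribution you can sample from, but its answer is only approximate.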
If X and Y are values drawn independently from exponential distributions with parameter λ, what is the distribution of their sum? There is another example in Exercise 6.7. In Homework 6, you generalize this to Z = max(X, Y) and Z = min(X, Y).

There are three general strategies:
1) Simulation.
2) Discrete computation.
3) Symbolic analysis.
What are the pros and cons?

In-class exercise: what is the distribution of Z = XY?

Normal distributions