Odds

Odds are an alternative way of expressing probabilities.

    o = p / (1-p)

We usually write odds with two integers, for:against. For example, p = 0.25 corresponds to 1:3 odds (in favor).

Some calculations are easier using odds rather than probabilities.

Example: let's do the Elvis problem with probabilities and odds.

The prior odds are 1:11.5
The posterior odds are 2:11.5
The posterior probability is 0.148

Relative Risk

Risks:

    ProbEarly live births 0.175120244862
    ProbEarly first babies 0.182415590301
    ProbEarly others 0.168321013728
    ProbOnTime live births 0.701355487538
    ProbOnTime first babies 0.662134602311
    ProbOnTime others 0.737909186906
    ProbLate live births 0.123524267599
    ProbLate first babies 0.155449807387
    ProbLate others 0.0937697993664

Risk ratios (first babies / others):

    ProbEarly 1.08373628617
    ProbOnTime 0.897311775027
    ProbLate 1.65778116662

What is the relative risk of being born early for first babies, relative to others?

When is relative risk a good choice of summary statistic? What are some alternatives?

Last year I ran the James Joyce Ramble 10K in Dedham MA. The results are available here. Go to that page and find my results.

What is my percentile rank in the field (all runners)?

What is my percentile rank in my division (M4049 means "male between 40 and 49 years of age")?

What division are you in? How fast do you have to run a 10K to "beat" me in terms of percentile ranks? Can you do it?

CDFs

Strictly speaking, when you compute a cumulative PMF, the result is a CMF, but no one calls it that.

Key ideas:

1) Map from values to percentile ranks (or probabilities).

2) Reverse map from percentile ranks to values.

3) My implementation is a sorted list of values and a sorted list of probabilities. See Cdf.html and Cdf.py. Both Prob and Value use bisect, so they take log n time. It is tricky to get the corner cases right.

    import bisect

    # Excerpt: methods of the Cdf class. self.xs is the sorted list of
    # values and self.ps is the sorted list of cumulative probabilities.

    def Prob(self, x):
        """Returns CDF(x), the probability that corresponds to value x.

        Args:
            x: number

        Returns:
            float probability
        """
        # Anything below the smallest value has cumulative probability 0.
        if x < self.xs[0]:
            return 0.0
        index = bisect.bisect(self.xs, x)
        p = self.ps[index-1]
        return p

    def Value(self, p):
        """Returns InverseCDF(p), the value that corresponds to probability p.

        Args:
            p: number in the range [0, 1]

        Returns:
            number value
        """
        if p < 0 or p > 1:
            raise ValueError('Probability p must be in range [0, 1]')

        if p == 0:
            return self.xs[0]
        if p == 1:
            return self.xs[-1]
        index = bisect.bisect(self.ps, p)
        # An exact match on a stored probability maps to that step's value.
        if p == self.ps[index-1]:
            return self.xs[index-1]
        else:
            return self.xs[index]

4) To iterate through the items in a CDF use Items().

5) For plotting, especially with small n, it is important to draw a step function. That's what Render() is for.

6) The CDF is the integral (sum) of the PMF; the PMF is the derivative (diff) of the CDF.

Let's sketch some CDFs.

Another Bayes's Theorem Problem

According to the CDC, "Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely."

If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?
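One way to set up the smoking problem is in odds form: posterior odds = prior odds times likelihood ratio. The sketch below reads the CDC's "13 times more likely" as the likelihood ratio P(cancer | smoker) / P(cancer | nonsmoker). The notes do not give the fraction of women who smoke, so `assumed_prior` is a placeholder to be replaced with a real figure, and the function name is chosen here for illustration.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayesian update in odds form; returns the posterior probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# ASSUMPTION: placeholder prior, NOT from the notes. Substitute the actual
# fraction of women who smoke before trusting the result.
assumed_prior = 0.2

# From the CDC quote: women who smoke are about 13 times more likely
# to develop lung cancer than women who do not.
relative_risk_women = 13

p_smoker = posterior_probability(assumed_prior, relative_risk_women)
print(p_smoker)
```

The answer depends strongly on the prior, so the printed number is only as good as the placeholder.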
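Going back to the Odds section, the conversion in both directions and the Elvis update can be sketched in a few lines. The helper names are hypothetical, and the likelihood ratio of 2 is not stated in the notes; it is inferred from the step between the prior odds (1:11.5) and the posterior odds (2:11.5).

```python
def odds(p):
    """Convert a probability to odds in favor: o = p / (1 - p)."""
    return p / (1 - p)

def probability(o):
    """Convert odds in favor back to a probability: p = o / (1 + o)."""
    return o / (1 + o)

# p = 0.25 corresponds to 1:3 odds, i.e. o = 1/3.
print(odds(0.25))

# Elvis problem, using the odds given in the notes.
prior_odds = 1 / 11.5
likelihood_ratio = 2  # inferred from 1:11.5 -> 2:11.5
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = probability(posterior_odds)
print(round(posterior_prob, 3))  # 0.148
```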
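The risk ratios in the Relative Risk section are each group's probability for first babies divided by the same probability for others. A minimal sketch that recomputes them from the listed risks (the dictionary layout is just one convenient choice, not the layout used in the course code):

```python
# Risks copied from the listing in the Relative Risk section.
risks = {
    'ProbEarly':  {'first babies': 0.182415590301, 'others': 0.168321013728},
    'ProbOnTime': {'first babies': 0.662134602311, 'others': 0.737909186906},
    'ProbLate':   {'first babies': 0.155449807387, 'others': 0.0937697993664},
}

# Relative risk = risk for first babies / risk for others.
ratios = {
    name: groups['first babies'] / groups['others']
    for name, groups in risks.items()
}

for name, ratio in ratios.items():
    print(name, ratio)
```

A ratio above 1 means the outcome is more likely for first babies; below 1, less likely.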
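To try the CDF ideas without Cdf.py, here is a self-contained sketch that builds the two sorted lists from a sample and mirrors the logic of Prob and Value as plain functions. The function names and the sample data are made up for illustration; this is not the Cdf class itself.

```python
import bisect
from collections import Counter

def make_cdf(sample):
    """Return (xs, ps): sorted unique values and cumulative probabilities."""
    counts = Counter(sample)
    xs = sorted(counts)
    n = len(sample)
    ps = []
    running = 0
    for x in xs:
        running += counts[x]
        ps.append(running / n)
    return xs, ps

def prob(xs, ps, x):
    """CDF(x): fraction of the sample less than or equal to x."""
    if x < xs[0]:
        return 0.0
    index = bisect.bisect(xs, x)
    return ps[index - 1]

def value(xs, ps, p):
    """Inverse CDF: the value that corresponds to probability p."""
    if p < 0 or p > 1:
        raise ValueError('Probability p must be in range [0, 1]')
    if p == 0:
        return xs[0]
    if p == 1:
        return xs[-1]
    index = bisect.bisect(ps, p)
    if p == ps[index - 1]:
        return xs[index - 1]
    return xs[index]

# Hypothetical sample, for illustration only.
xs, ps = make_cdf([1, 2, 2, 3, 5])
print(prob(xs, ps, 2))        # 0.6
print(100 * prob(xs, ps, 4))  # percentile rank of 4: 80.0
print(value(xs, ps, 0.5))     # 2
```

Note that prob(xs, ps, 4) equals prob(xs, ps, 3): the CDF is flat between sample values, which is why it should be drawn as a step function.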
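The sum/diff relationship between the PMF and the CDF can be checked directly. This sketch uses exact fractions so the round trip is free of floating-point noise; the PMF values are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical PMF over the values 1, 2, 3, 5.
xs = [1, 2, 3, 5]
pmf = [Fraction(1, 5), Fraction(2, 5), Fraction(1, 5), Fraction(1, 5)]

# CDF = running sum of the PMF.
cdf = list(accumulate(pmf))

# PMF = first differences of the CDF (the first entry is kept as is).
recovered = [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]

print(cdf[-1])           # 1
print(recovered == pmf)  # True
```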