## Probability and Statistics


### Lecture 3

For today you should have:

2. Finished Homework 1.
3. Worked on Homework 2, due Friday at noon.

Today:

1. Percentile rank.
2. Relative risk.
3. Conditional probability.
4. Practice quiz exercise.
5. CDFs.

For next time:

1. Prepare for a quiz.
2. Finish Homework 2.
3. Start Homework 3.

## Percentile Rank

I recently ran the James Joyce Ramble 10K in Dedham MA.  The results are available
here.  Go to that page and find my results.

What is my percentile rank in the field (all runners)?

What is my percentile rank in my division (M4049 means "male between 40 and 49 years of age")?
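The results page lists each runner's place, so one way to answer the first two questions is to convert a place and a field size into a percentile rank. This is a minimal sketch, assuming percentile rank means the percentage of the field that finished at or behind a given place (counting the runner), so the winner gets 100. The function name and the numbers in the usage line are made up for illustration; they are not the actual race results.

```python
def position_to_percentile(position, field_size):
    """Percentile rank for finishing at `position` (1 = first) in a field of
    `field_size` runners: the percentage of the field that finished at or
    behind that place, counting the runner, so the winner gets 100."""
    beat = field_size - position + 1
    return 100.0 * beat / field_size


# Made-up example: 50th place out of 400 runners.
print(position_to_percentile(50, 400))  # 87.75
```

The same function answers both questions; only `field_size` changes (the whole field versus the division).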

What division are you in?

How fast do you have to run a 10K to "beat" me in terms of percentile ranks?

Can you do it?
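For the "how fast" question, one approach is to invert the conversion: find the worst place in your own division whose percentile rank matches or exceeds the reference rank, then read that runner's time off the results page. The helper `place_needed` below is hypothetical, and the field sizes in the usage lines are made up. It uses integer ceiling division so that floating-point rounding cannot shift the answer by one place.

```python
def place_needed(ref_position, ref_field_size, field_size):
    """Worst place in a field of `field_size` whose percentile rank is at
    least that of finishing `ref_position` out of `ref_field_size`."""
    ref_beat = ref_field_size - ref_position + 1
    # Smallest integer `beat` with beat / field_size >= ref_beat / ref_field_size,
    # computed as an integer ceiling: -(-a // b) == ceil(a / b) for b > 0.
    beat = -(-ref_beat * field_size // ref_field_size)
    return field_size - beat + 1


# Made-up example: matching 50th of 400 in a division of 200 runners.
print(place_needed(50, 400, 200))  # 25
```

Here 25th of 200 gives a percentile rank of 88, while 26th would give 87.5, just short of 87.75. To strictly exceed a tied rank, you would need one place better.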

## Relative Risk

My solution to Exercise 2.8 is in http://thinkstats.com/risk.py.  Output:

```
Risks:
ProbEarly live births 0.175120244862
ProbEarly first babies 0.182415590301
ProbEarly others 0.168321013728
ProbOnTime live births 0.701355487538
ProbOnTime first babies 0.662134602311
ProbOnTime others 0.737909186906
ProbLate live births 0.123524267599
ProbLate first babies 0.155449807387
ProbLate others 0.0937697993664

Risk ratios (first babies / others):
ProbEarly 1.08373628617
ProbOnTime 0.897311775027
ProbLate 1.65778116662
```

What is the relative risk of being born early for first babies, relative to others?
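Relative risk is the ratio of the probability of an outcome in one group to its probability in another. This short sketch recomputes the ProbEarly ratio from the two probabilities printed in the output above. The function name is illustrative and is not taken from risk.py.

```python
def relative_risk(p_group, p_reference):
    """Ratio of an outcome's probability in one group to a reference group."""
    return p_group / p_reference


# ProbEarly values copied from the output above.
p_early_first = 0.182415590301
p_early_others = 0.168321013728

rr = relative_risk(p_early_first, p_early_others)
print(round(rr, 3))  # 1.084
```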

When is relative risk a good choice of summary statistic?  What are some alternatives?
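Two common alternatives are the risk difference (the absolute change in probability) and the odds ratio. This sketch computes all three for the ProbLate numbers in the output above. The helper names are only for illustration.

```python
def relative_risk(p1, p2):
    """Ratio of the two probabilities."""
    return p1 / p2


def risk_difference(p1, p2):
    """Absolute difference in probability between the two groups."""
    return p1 - p2


def odds(p):
    """Convert a probability to odds in favor, p : (1 - p)."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Ratio of the odds of the outcome in the two groups."""
    return odds(p1) / odds(p2)


# ProbLate for first babies and for others, copied from the output above.
p_first, p_others = 0.155449807387, 0.0937697993664

print('relative risk  ', round(relative_risk(p_first, p_others), 3))
print('risk difference', round(risk_difference(p_first, p_others), 3))
print('odds ratio     ', round(odds_ratio(p_first, p_others), 3))
```

A ratio alone hides the base rate. A relative risk of 1.66 means something different when the baseline is 9% than when it is 0.1%, and the risk difference makes that visible.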

## Conditional probability

Sketch Python code for the algorithm in Section 2.9.
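Section 2.9 is not reproduced in these notes. Assuming it describes the usual approach for the probability that a baby is born in a given week, given that it was not born before that week, the steps are: drop the values below that week, renormalize, and look up the week. This sketch uses a plain dict as the PMF, not the course's Pmf class, and the example probabilities are made up.

```python
def conditional_prob(pmf, week):
    """P(born in `week` | not born before `week`).

    pmf: dict mapping pregnancy length (weeks) -> probability.
    """
    # Keep only the outcomes still possible at the start of `week`.
    remaining = {x: p for x, p in pmf.items() if x >= week}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError('no probability mass at or after this week')
    # Renormalize and look up the probability of `week` itself.
    return remaining.get(week, 0.0) / total


# Made-up PMF, not real pregnancy data.
pmf = {38: 0.2, 39: 0.5, 40: 0.3}
print(conditional_prob(pmf, 39))  # 0.5 / 0.8
```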

Practice Quiz question:

Write a function called `remaining_lifetime` that takes a Pmf of lifespans and an age, `a`. It should return the distribution of remaining lifetime for someone with age `a`.

For example, if the Pmf has values 1, 3 and 4, with equal probability, then the remaining lifetime of someone with `a = 2` is 1 or 2 with equal probability.
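Here is one possible answer. It assumes that "someone with age a" means a lifespan strictly greater than `a`, so nobody is left with zero remaining lifetime. It also uses a plain dict in place of the course's Pmf class, whose methods are not shown here. The idea is the same condition-and-renormalize step as above, followed by shifting each value by `a`.

```python
def remaining_lifetime(pmf, a):
    """Distribution of remaining lifetime for someone who has reached age a.

    pmf: dict mapping lifespan -> probability (stand-in for a Pmf object).
    Assumes a person "with age a" has a lifespan strictly greater than a.
    """
    # Condition on surviving past age a.
    survivors = {x: p for x, p in pmf.items() if x > a}
    total = sum(survivors.values())
    if total == 0:
        raise ValueError('no lifespans longer than a')
    # Shift each lifespan by a and renormalize.
    return {x - a: p / total for x, p in survivors.items()}


pmf = {1: 1/3, 3: 1/3, 4: 1/3}
print(remaining_lifetime(pmf, 2))  # {1: 0.5, 2: 0.5}
```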

## CDFs

Strictly speaking, when you compute a cumulative PMF, the result is a CMF, but no one calls it that.

Key ideas:

1) Map from values to percentile ranks (or probabilities).

2) Reverse map from percentile ranks to values.

3) My implementation stores a sorted list of values and a parallel, sorted list of cumulative probabilities.

Both mappings use bisect, so each lookup takes O(log n) time.

The corner cases are tricky to get right.

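```python
import bisect


class Cdf:
    """A CDF stored as parallel sorted lists of values and cumulative probabilities."""

    def __init__(self, xs, ps):
        # Minimal constructor (assumed; the full class isn't shown in these notes).
        self.xs = xs  # sorted values
        self.ps = ps  # ps[i] = CDF(xs[i]); nondecreasing, ending at 1.0

    def Prob(self, x):
        """Returns CDF(x), the probability of a value <= x."""
        if x < self.xs[0]:
            return 0.0
        # bisect returns the number of values <= x.
        index = bisect.bisect(self.xs, x)
        return self.ps[index - 1]

    def Value(self, p):
        """Returns the smallest value x with CDF(x) >= p."""
        if p < 0 or p > 1:
            raise ValueError('Probability p must be in range [0, 1]')
        if p == 0:
            return self.xs[0]
        if p == 1:
            return self.xs[-1]
        # bisect finds index with ps[index-1] <= p < ps[index].
        index = bisect.bisect(self.ps, p)
        if index > 0 and p == self.ps[index - 1]:
            return self.xs[index - 1]  # p falls exactly on a step
        return self.xs[index]
```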
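The lookups in Prob and Value assume the two parallel lists already exist. This is a sketch of one way to build them from a sample; the function name is hypothetical and not part of the course code. It divides a running integer count by n rather than summing floating-point probabilities, so the last cumulative probability comes out exactly 1.0.

```python
from collections import Counter


def make_cdf_lists(values):
    """Returns (xs, ps): sorted distinct values and their cumulative probabilities."""
    counts = Counter(values)
    n = len(values)
    xs = sorted(counts)
    ps = []
    running = 0
    for x in xs:
        running += counts[x]     # number of values <= x
        ps.append(running / n)   # fraction of values <= x
    return xs, ps


xs, ps = make_cdf_lists([1, 2, 2, 3, 5])
print(xs)  # [1, 2, 3, 5]
print(ps)  # [0.2, 0.6, 0.8, 1.0]
```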

4) To iterate through the items in a CDF, use GetItems().

5) For plotting, especially with small n, it is important to draw the CDF as a step function. That's what Render() is for.
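Render()'s implementation is not shown in these notes. This sketch illustrates one way to turn the sorted values and cumulative probabilities into coordinates that, joined with straight lines, trace a step function. The helper name is hypothetical.

```python
def step_points(xs, ps):
    """Returns (plot_xs, plot_ps) that draw a CDF as a step function."""
    plot_xs, plot_ps = [], []
    prev_p = 0.0
    for x, p in zip(xs, ps):
        # Two points at each x: the vertical jump from the previous level to p.
        # The segment to the next x is then horizontal at level p.
        plot_xs.extend([x, x])
        plot_ps.extend([prev_p, p])
        prev_p = p
    return plot_xs, plot_ps


print(step_points([1, 2, 3, 5], [0.2, 0.6, 0.8, 1.0]))
```

Joining the points in order gives the flat runs and vertical jumps that make small-sample CDFs readable. If you plot with matplotlib, `plt.step(xs, ps, where='post')` draws the same shape from the unexpanded lists.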