Lecture 13

For today you should:

1) Read the rest of Chapter 11 of Think Stats 2e

2) Work on the regression journal entry.

Today:

1) Quiz review

2) Logistic regression

For next time you should:

1) Read Chapter 12 of Think Stats 2e

2) Read "Statistical inference is only mostly wrong" and (optionally) the discussion of the article on Reddit

3) Prep for a quiz.

Optional: Bayes's theorem and logistic regression

Quiz 5 debrief

Average = 8.5, lower than the norm (mostly 10ish)

Quiz trouble?  Do treat your quiz scores as a signal, but don't panic.  Quizzes count for a small part of the final grade, and I will drop the lowest 1-2 quizzes.

Q1 had a tricky part (negative slope means negative correlation)

Q2 and Q3 follow Section 10.5, if you want to review.

The moral is that R^2 and rho can overstate the predictive power of a correlation.

Q4 was generally good.

Next week: no new book chapters!  Just work on the project and journal.

Logistic regression

Odds and probabilities are different representations of the same information. Given a probability, you can compute the odds like this:

    o = p / (1-p)

Given odds in favor, you can convert to probability like this:

    p = o / (o+1)
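As a quick check on the two formulas above, here is a minimal Python sketch. The function names `odds_from_prob` and `prob_from_odds` are made up for this illustration; they are not from the book's code.

```python
def odds_from_prob(p):
    """Convert a probability (0 <= p < 1) to odds in favor."""
    return p / (1 - p)


def prob_from_odds(o):
    """Convert odds in favor (o >= 0) to a probability."""
    return o / (o + 1)


p = 0.75
o = odds_from_prob(p)        # 3.0, that is, odds of 3 to 1 in favor
p_back = prob_from_odds(o)   # 0.75, so the round trip recovers p
print(o, p_back)
```

The round trip shows that the two conversions are inverses of each other, which is why odds and probabilities carry the same information.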

Logistic regression is based on the following model:

    log(o) = β0 + β1 x1 + β2 x2 + ε

Where o is the odds in favor of a particular outcome, and log is the natural log.

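    import statsmodels.formula.api as smf

    # boy is the binary outcome; agepreg (mother's age) is the explanatory variable
    model = smf.logit('boy ~ agepreg', data=df)
    results = model.fit()

    # SummarizeResults is a helper from the book's code, not part of statsmodels
    SummarizeResults(results)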

Suppose you get results like this (p-values in parentheses):

    Intercept   0.00579   (0.953)
    agepreg     0.00105   (0.783)
    R^2         6.144e-06

The p-value indicates that a coefficient as big as 0.00105 is quite likely to occur by chance, even if there is no relationship between mother's age and the sex of the baby.

But if we ignore that and take the model at face value, what is the probability of having a boy for a 35-year-old mother?
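One way to work this out by hand is to plug the age into the model to get the log odds, exponentiate to get the odds, and then convert the odds to a probability. The sketch below does that with the two coefficients copied from the results above; the variable names are chosen for this illustration only.

```python
import math

# Coefficients copied from the summary shown in these notes
intercept = 0.00579
slope = 0.00105      # coefficient for agepreg
age = 35

log_odds = intercept + slope * age   # the model predicts log(o)
odds = math.exp(log_odds)            # undo the natural log
prob = odds / (odds + 1)             # p = o / (o+1)

print(round(log_odds, 5), round(prob, 3))
```

Under these assumptions the log odds come out to about 0.0425 and the probability to about 0.51, barely different from the intercept-only prediction, which is consistent with the large p-value on agepreg.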

Generalized linear models

The choice of model depends on the type of the dependent (endogenous) variable:

Continuous: linear regression (ols)

Binary: logistic regression (logit)

Count: Poisson regression (poisson)

Categorical: Multinomial logistic regression (mnlogit)

Ordinal: ordinal regression (ologit -- not implemented)

Exercises:

1) Load chap11soln.ipynb.

2) Read through and run the examples.

3) Try out some different models.