Lecture 02
For today you should:
1) Read Chapter 2 of Think Stats 2e on NB.
2) Watch Jake Porway at TEDx
Today:
1) Chapter 1 discussion
2) Project descriptions
3) Chapter 2 discussion
For next time you should:
1) Read the project descriptions and fill in the survey. The deadline is Monday at 9am.
2) Read Chapter 3 of Think Stats 2e
Check out this AMA.
Quiz Tuesday on Chapters 1 through 3.
Project descriptions
Factors you might want to consider:
Known liaison?
Confidence in data?
Alignment of data with questions?
Domain knowledge?
Stats beyond the scope of the class?
Potential for impact / social good?
Venue for publication?
Chapter 2
Distribution: map from possible values to their probabilities.
"map" can mean a Python dictionary, other map type, function, or callable.
Histogram: a wrapper around a Python dictionary that maps from values to their frequencies (counts).
thinkplot is a wrapper around matplotlib that knows about the classes in thinkstats2.
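To make the dictionary view concrete, here is a minimal sketch using only the standard library rather than thinkstats2. The helper names make_hist and normalize and the toy values are made up for illustration; they are not part of the book's code.

```python
from collections import Counter

def make_hist(values):
    """Map each distinct value to the number of times it appears."""
    return dict(Counter(values))

def normalize(hist):
    """Convert counts to probabilities that sum to 1."""
    total = sum(hist.values())
    return {value: count / total for value, count in hist.items()}

# Toy data for illustration only (not taken from any real dataset).
toy_values = [1, 2, 2, 3, 5]

hist = make_hist(toy_values)   # value -> count
dist = normalize(hist)         # value -> probability

print(hist)
print(dist)
```

The first dictionary is a histogram in the sense above; dividing each count by the total turns it into a distribution.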
How would you describe these distributions?
Some of the characteristics we might want to report are:
central tendency: Do the values tend to cluster around a particular point?
modes: Is there more than one cluster?
spread: How much variability is there in the values?
tails: How quickly do the probabilities drop off as we move away from the modes?
outliers: Are there extreme values far from the modes?
Exercise: What's up with the bumps in the distribution of prglngth?
1) Print the value_counts for this variable and see if you can identify a pattern.
2) Find the documentation of this variable, and generate possible explanations.
Summary statistics
Obvious examples: mean and variance.
What's the difference between variance and standard deviation, and why would we prefer one or the other?
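As a starting point for that discussion, the sketch below computes both statistics with Python's statistics module. The numbers are made up, and the choice of the population formulas (dividing by n) is an assumption made here for simplicity.

```python
import math
import statistics

# Made-up pregnancy lengths in weeks, for illustration only.
toy_lengths = [38, 39, 39, 40, 41]

mean = statistics.mean(toy_lengths)
var = statistics.pvariance(toy_lengths)   # mean squared deviation, in weeks squared
std = statistics.pstdev(toy_lengths)      # square root of the variance, in weeks

print(f"mean = {mean} weeks")
print(f"variance = {var:.2f} weeks^2")
print(f"standard deviation = {std:.2f} weeks")

# The two statistics carry the same information; only the units differ.
assert math.isclose(std ** 2, var)
```

Because the standard deviation is in the same units as the data, it is usually easier to interpret; variance is often more convenient in calculations.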
Effect size: a summary statistic intended to communicate the size of an effect, for example a difference in means, a risk ratio, or an odds ratio.
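A short sketch of how a risk ratio and an odds ratio are computed from counts. The counts below are invented for illustration and do not come from the survey data; the helper names risk and odds are hypothetical.

```python
def risk(events, total):
    """Fraction of the group that experienced the event."""
    return events / total

def odds(events, total):
    """Ratio of those who experienced the event to those who did not."""
    return events / (total - events)

# Invented counts, for illustration only: number of "late" births per group.
first_late, first_total = 30, 200
other_late, other_total = 20, 200

risk_ratio = risk(first_late, first_total) / risk(other_late, other_total)
odds_ratio = odds(first_late, first_total) / odds(other_late, other_total)

print(f"risk ratio = {risk_ratio:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

With these counts the two ratios are close but not equal; they diverge further as the event becomes more common.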
Exercises:
1) Based on the results in this chapter, suppose you were asked to summarize what you learned about whether first babies arrive late.
Which summary statistics would you use if you wanted to get a story on the evening news? Which ones would you use if you wanted to reassure an anxious patient?
Finally, imagine that you are Cecil Adams, author of The Straight Dope, and your job is to answer the question, “Do first babies arrive late?” Write a paragraph that uses the results in this chapter to answer the question clearly, precisely, and honestly.
2) In the repository you downloaded, you should find a file named chap02ex.ipynb. Make a copy called chap02mine.ipynb, and open it.
Some cells are already filled in, and you should execute them. Other cells give you instructions for exercises. Follow the instructions and fill in the answers.
3) Using the variable totalwgt_lb, investigate whether first babies are lighter or heavier than others. Compute Cohen’s d to quantify the difference between the groups. How does it compare to the difference in pregnancy length?
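For reference, Cohen's d is the difference in group means divided by a pooled standard deviation. The sketch below uses only the standard library and made-up weights. The pooling convention (population variances weighted by group size) is one common choice and an assumption here, and the function name cohen_d is hypothetical; check it against the definition used in the chapter before relying on it.

```python
import math
import statistics

def cohen_d(group1, group2):
    """Difference in means divided by the pooled standard deviation.

    Assumption: the pooled variance weights each group's population
    variance by its size. Other conventions exist.
    """
    diff = statistics.mean(group1) - statistics.mean(group2)
    n1, n2 = len(group1), len(group2)
    var1 = statistics.pvariance(group1)
    var2 = statistics.pvariance(group2)
    pooled_var = (n1 * var1 + n2 * var2) / (n1 + n2)
    return diff / math.sqrt(pooled_var)

# Made-up birth weights in pounds, for illustration only.
group_a = [7.0, 7.5, 8.0]
group_b = [6.5, 7.0, 7.5]

d = cohen_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard deviations, it can be compared across variables with different units, such as weight in pounds and pregnancy length in weeks.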