Lecture 04

For today you should:

1) Read Chapter 4 of Think Stats 2e

2) Send an intro letter to your liaison

Today:

1) Chapters 2 and 3

2) Quiz

3) Professionalism

4) CDFs

For next time you should:

1) Read Chapter 5 of Think Stats 2e

2) Talk with your liaison and get data!  Forward the liaison's first reply to me.

3) Finish chap04ex.ipynb

Exercises

Let's go over the exercises from Chapters 2 and 3.

Professional conduct

Internal: Maintain good communication with the instructors and your teammate.

External: Impress our collaborators with written and spoken communication skills.

Balance

1) Formality and collegiality.

2) Ask questions but do due diligence first.

3) Urgency and respect.

Due diligence means

1) Don't ask the question if you can answer it yourself.

2) Don't depend on me or the liaison to spell-check what you write.

Urgency means

1) Knowing when you are on the critical path and minimizing the time you spend there.

2) Anticipating delays and hiding latency.

As you probably know, people love to hate on Millennials.  For the record, I think these criticisms are mostly misguided.  But the existence of these perceptions means that you have to make some effort to avoid reinforcing stereotypes.  Sorry.

Cumulative distribution functions

Mathematically, the CDF is the cumulative sum of the PMF.  As a function, CDF(x) is the fraction of the sample less than or equal to x.

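The Python implementation is two sorted lists (values and cumulative probabilities), so you can do a log-time lookup in either direction: from a value to its cumulative probability, or from a probability to the corresponding value.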
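
Here is a minimal sketch of the two-sorted-lists idea, using binary search from the standard library.  The class name SimpleCdf and its method names are made up for illustration; this is not the code from the book's library, just one way the structure could look.

```python
import bisect
from collections import Counter


class SimpleCdf:
    """Hypothetical minimal CDF stored as two parallel sorted lists."""

    def __init__(self, sample):
        counts = Counter(sample)
        n = len(sample)
        self.xs = sorted(counts)   # unique values, ascending
        self.ps = []               # cumulative probabilities, ascending
        total = 0
        for x in self.xs:
            total += counts[x]     # running (cumulative) sum of the counts
            self.ps.append(total / n)

    def prob(self, x):
        """Fraction of the sample less than or equal to x."""
        i = bisect.bisect_right(self.xs, x)   # binary search on values
        return self.ps[i - 1] if i > 0 else 0.0

    def value(self, p):
        """Smallest value whose cumulative probability is at least p."""
        if not 0 <= p <= 1:
            raise ValueError("p must be between 0 and 1")
        i = bisect.bisect_left(self.ps, p)    # binary search on probabilities
        return self.xs[i]


cdf = SimpleCdf([1, 2, 2, 3, 5])
print(cdf.prob(2))     # 0.6, because 3 of the 5 values are <= 2
print(cdf.value(0.5))  # 2, the median of the sample
```

Both lookups are binary searches, so each takes time proportional to the log of the number of unique values.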

Uses of CDFs:

1) IMHO, the best way to visualize distributions during exploratory data analysis.

2) Well suited for comparing distributions.

3) Efficient for computing percentile-based statistics like median, IQR, etc.

4) Efficient for generating random numbers.

5) Useful for comparing individuals relative to different groups.
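
Uses (3) through (5) can be sketched in a few lines.  The helper names (make_cdf, value, percentile_rank) and the two small groups of scores are invented for illustration only; they are not real data and not the book's API.

```python
import bisect
import random


def make_cdf(sample):
    """Return (xs, ps): sorted values and the fraction of the sample <= each."""
    xs = sorted(sample)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


def value(xs, ps, p):
    """Inverse CDF: smallest value whose cumulative probability is >= p."""
    i = bisect.bisect_left(ps, p)
    return xs[min(i, len(xs) - 1)]


def percentile_rank(xs, x):
    """Percentage of the sample less than or equal to x."""
    return 100 * bisect.bisect_right(xs, x) / len(xs)


# Made-up scores for two groups (illustrative only).
group_a = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
group_b = [70, 75, 80, 85, 90, 95, 100, 105, 110, 115]

xs_a, ps_a = make_cdf(group_a)
xs_b, ps_b = make_cdf(group_b)

# (3) Percentile-based statistics.
median = value(xs_a, ps_a, 0.5)
iqr = value(xs_a, ps_a, 0.75) - value(xs_a, ps_a, 0.25)

# (4) Random numbers: draw p uniformly from [0, 1), then invert the CDF.
rng = random.Random(0)
draws = [value(xs_a, ps_a, rng.random()) for _ in range(5)]

# (5) The same score has a different rank relative to each group.
rank_in_a = percentile_rank(xs_a, 80)
rank_in_b = percentile_rank(xs_b, 80)

print(median, iqr)           # 75 25
print(rank_in_a, rank_in_b)  # 60.0 30.0
```

A score of 80 is at the 60th percentile of the first group but only the 30th percentile of the second, which is the kind of relative comparison percentile ranks make easy.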

Are first babies lighter than others?

Data visualization is often a two-step process:

1) Exploration, to figure out what the story is.

2) Presentation, to communicate the story clearly.

CDFs are excellent for (1).  Sometimes not great for (2), depending on the audience.

Let's do the Chapter 4 exercises.