Lecture 05

For today you should have:
  1. Read Chapter 4.
  2. Prepared for a quiz.

Today:
  1. Quiz.
  2. Malcolm Gladwell and Canadian hockey.
  3. Pareto World.
  4. Think Stats presentation at Google.

For next time:

  1. Read Chapter 5.
  2. Do Homework 4.
  3. Follow the special instructions below.

Special instructions

1) Toss a fair coin. Record the outcome in a secret location. (For example, you could have it tattooed on the bottom of your tongue. Just a suggestion.)

2) If you got heads, please run this code:

import random
print(''.join(random.choice('01') for i in range(100)))

Then store the result in a file called random.txt in our Dropbox-shared folder.

3) If you got tails, please create a file called random.txt in our Dropbox-shared folder and type into it a sequence of 100 zeros and ones that is as random-looking as you can make it WITHOUT USING ANY RANDOM PROCESS other than your human-brain attempt to generate a seemingly random sequence.  No cheating! 

Either way, there should be exactly 100 characters in the file, all on one line.  You can use wc to check.
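If you would rather check in Python, here is a minimal sketch of the same test. The function name is_valid_sequence is made up for this example. One thing to watch: wc -c counts the trailing newline that most editors add, so it may report 101 for a correct file. The sketch ignores one trailing newline for that reason.

```python
def is_valid_sequence(text):
    """Return True if text is exactly 100 characters, each '0' or '1'.

    One trailing newline, if present, is ignored.
    (Hypothetical helper, not part of the course code.)
    """
    line = text[:-1] if text.endswith('\n') else text
    return len(line) == 100 and set(line) <= set('01')


def check_file(path):
    """Read the file at path and validate its contents."""
    with open(path) as f:
        return is_valid_sequence(f.read())
```

To use it, call check_file('random.txt') from the folder that holds your file; it should return True.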

Next time we will see whether we can distinguish a "real" random sequence from a "fake" one. 
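As a preview, here is a sketch of two statistics we might compare: the number of runs (maximal blocks of identical characters) and the length of the longest run. This is only one possible approach, and the function names are made up for this example. For 100 fair coin flips the expected number of runs is 50.5, because each of the 99 adjacent pairs differs with probability 1/2.

```python
import random
from itertools import groupby


def count_runs(seq):
    """Number of maximal blocks of identical characters in seq."""
    return sum(1 for _ in groupby(seq))


def longest_run(seq):
    """Length of the longest block of identical characters in seq."""
    return max(len(list(group)) for _, group in groupby(seq))


# Generate one random sequence the same way as in the instructions above.
seq = ''.join(random.choice('01') for i in range(100))
print(count_runs(seq), longest_run(seq))
```

Whether these statistics are enough to separate the two kinds of files is exactly what we will find out.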

Distribution of birthdays

Robert Gordon "Bobby" Orr, OC (born March 20, 1948)

From Malcolm Gladwell:

Are Gladwell's results statistically significant?
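One way to approach this question is a chi-squared test with a simulated p-value. The sketch below assumes Python 3 and a null hypothesis that births are equally likely in each quarter of the year, which is itself only an approximation. The counts are HYPOTHETICAL, invented for illustration; they are not Gladwell's data. The function names are also made up for this example.

```python
import random


def chi_squared(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed, probs, iters=1000, seed=17):
    """Fraction of simulated samples whose statistic is at least as
    large as the observed one, under the null hypothesis given by probs."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * p for p in probs]
    actual = chi_squared(observed, expected)
    categories = range(len(probs))
    hits = 0
    for _ in range(iters):
        counts = [0] * len(probs)
        for c in rng.choices(categories, weights=probs, k=n):
            counts[c] += 1
        if chi_squared(counts, expected) >= actual:
            hits += 1
    return hits / iters


# HYPOTHETICAL counts of players born in each quarter
# (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec); made up, not real data.
observed = [40, 30, 20, 10]
probs = [0.25] * 4
p = simulated_p_value(observed, probs)
print(p)
```

With real roster data you would replace observed, and you might replace probs with the actual distribution of births by quarter in the general population.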

I don't think, as a society, we are always particularly smart about how to make the best use of our talent. And if we're this bad at sports, imagine how bad we are at other things -- like getting the most out of young people's brains?

How might Gladwell's analysis apply to grade school?  Do you think it has an effect on enrollment at competitive colleges?  How could we test this hypothesis?

What bearing does this have on my ability to qualify for the Boston Marathon?

Pareto World

Exercise 4.4:

To get a feel for the Pareto distribution, imagine what the world would be like if the distribution of human height were Pareto. Choosing the parameters x_m = 100 cm and α = 1.7, we get a distribution with a reasonable minimum, 100 cm, and median, 150 cm.

Generate 6 billion random values from this distribution. What is the mean of this sample? What fraction of the population is shorter than the mean? How tall is the tallest person in Pareto World?
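Before generating 6 billion values, it may help to try a much smaller sample and compare it with the closed-form results for a Pareto distribution. The sketch below is not pareto_world.py; it is a separate illustration, assuming Python 3, with a sample size of 100,000 chosen only to keep it fast. It uses random.paretovariate, which has minimum 1, and scales the result by xm.

```python
import random


def pareto_sample(xm, alpha, n, seed=None):
    """Return n values from a Pareto distribution with minimum xm."""
    rng = random.Random(seed)
    return [xm * rng.paretovariate(alpha) for _ in range(n)]


xm, alpha = 100.0, 1.7
heights = sorted(pareto_sample(xm, alpha, n=100000, seed=1))

# Sample statistics (heights in cm).
sample_median = heights[len(heights) // 2]
sample_mean = sum(heights) / len(heights)
frac_below_mean = sum(1 for h in heights if h < sample_mean) / len(heights)
tallest = heights[-1]

# Closed-form values, using CDF(x) = 1 - (xm / x) ** alpha.
analytic_median = xm * 2 ** (1 / alpha)
analytic_mean = alpha * xm / (alpha - 1)   # finite only when alpha > 1
analytic_frac = 1 - (xm / analytic_mean) ** alpha

print(sample_median, analytic_median)
print(sample_mean, analytic_mean)
print(frac_below_mean, analytic_frac)
print(tallest)
```

Because α < 2, the variance of this distribution is infinite, so the sample mean and the tallest person can change a lot from one run to the next. Try a few different seeds and sample sizes.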

Let's play with pareto_world.py

Topics to explore: online algorithms and streaming algorithms.
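These topics matter here because 6 billion floating-point values will not fit in memory on most machines. Here is a minimal sketch of the streaming idea, assuming Python 3: a generator produces one value at a time, and a single pass keeps only a running count, a running mean, and the largest value seen. The function names are made up for this example, and the sample size is kept small so it runs quickly.

```python
import random


def pareto_stream(xm, alpha, n, seed=None):
    """Yield n Pareto values with minimum xm, one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield xm * rng.paretovariate(alpha)


def streaming_summary(values):
    """One pass over values; returns (count, mean, largest).

    Uses the incremental update mean += (x - mean) / count,
    so no values need to be stored.
    """
    count = 0
    mean = 0.0
    largest = float('-inf')
    for x in values:
        count += 1
        mean += (x - mean) / count
        if x > largest:
            largest = x
    return count, mean, largest


count, mean, largest = streaming_summary(
    pareto_stream(100.0, 1.7, n=100000, seed=1))
print(count, mean, largest)
```

The fraction of the population shorter than the mean is harder to get in one pass, because the mean is not known until the end. One option is a second pass over a stream regenerated with the same seed.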