Homework

Homework 1

0) Read Chapter 1 of Think Stats.

1) If you don't already have a Dropbox account, create one by following this link.  Inside your Dropbox directory, create a directory for this class.  The name of the directory should be your name in the format first.last (all lower case). Invite downey@allendowney.com to share it.  You will turn in your work by copying it into this directory.

Please don't work in the shared directory, and please don't put anything big in this directory; if you do I might have to delete it.

2) Choose a Python development environment that works for you.  I recommend any of the following:

For this class you can use Python 2.5 or later, but not Python 3.  Create a directory or workspace where you will put the code you write this semester.  For simplicity, you probably want to put all code and data files in the same directory.  Write a short Python program and confirm that you can run it.

3) The file heptathlon.csv contains a dataset from HSAUR.  A description of the dataset is available here.  Download it into your code directory.

4) Create a file named hw1.py.  Write a function that reads the data file and builds an appropriate data structure.  If you are comfortable with objects, make an object for each heptathlete.  Otherwise, a list of lists or list of dictionaries would be fine.  Hint: you might want to use the csv module.  Also, you might want to read Chapter 14 of Think Python.

5) Write a function that takes the data structure you just built and extracts a list of results for the 800m event.

6) Write a function that converts the list of 800m results to a list of speeds in miles per hour, each one a float.

7) Write a function that takes a list of numbers and computes their mean and variance.  Compute the mean and standard deviation of the heptathletes' speed in MPH.  Check that the result is consistent with common sense.

8) Add comments at the top of your file that include your name and the results you computed.  Add docstrings to each of your functions.
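Steps 4 through 7 can be sketched roughly as follows.  This is only one possible shape for hw1.py, not a reference solution.  It assumes the file has a header row, that the 800m column is named run800m, and that the results are times in seconds; check all three against the dataset description.  The function names are my own, and the code is written to run under Python 2.7 as well as Python 3.

```python
import csv
import math

METERS_PER_MILE = 1609.344
SECONDS_PER_HOUR = 3600.0


def read_athletes(lines):
    """Parse CSV text (any iterable of lines) into a list of dictionaries."""
    return list(csv.DictReader(lines))


def extract_800m(athletes, column='run800m'):
    """Return the 800m results as floats.  The column name is an assumption."""
    return [float(row[column]) for row in athletes]


def to_mph(times_in_seconds, distance_m=800.0):
    """Convert race times in seconds to average speeds in miles per hour."""
    miles = distance_m / METERS_PER_MILE
    return [miles / (t / SECONDS_PER_HOUR) for t in times_in_seconds]


def mean_var(xs):
    """Return the mean and the population variance (divide by n) of xs."""
    n = float(len(xs))
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


if __name__ == '__main__':
    # Made-up rows for illustration only; in hw1.py you would pass
    # open('heptathlon.csv') to read_athletes instead.
    sample = ['name,run800m', 'Runner A,120.0', 'Runner B,150.0']
    speeds = to_mph(extract_800m(read_athletes(sample)))
    mu, var = mean_var(speeds)
    print('mean = %.2f mph, std = %.2f mph' % (mu, math.sqrt(var)))
```

As a sanity check, an 800m time near two minutes should come out somewhere around 15 MPH.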

What to turn in: You should copy hw1.py into the top level of our shared directory along with a copy of heptathlon.csv.  Note that all file and directory names are case sensitive.

I should be able to cd into our shared directory and run

python hw1.py

The program should print the mean and standard deviation of the heptathletes' speeds (with units).

Homework 2

0) Read Chapter 2 of Think Stats.

1) Exercise 1.2: Download the NSFG data and run survey.py.

2) Exercise 1.3: Don't do the exercise; just read it, download first.py and run it.

3) Download Pmf.py and read through it.  Read the documentation at Pmf.html.  We will be using Pmf for the rest of the semester, so you should learn the interface and understand the implementation.

4) Create a file named hw2.py and put your solution to the following exercises in it.
  • 2.1 (pumpkins)
  • 2.2 (standard deviation of pregnancy length)
  • 2.3 (mode)
  • 2.4 (remaining lifetime)

What to turn in: You should put hw2.py in the top level of our shared directory along with the downloaded copies of survey.py, thinkstats.py and Pmf.py.  Note that all file and directory names are case sensitive.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.

Do not include a copy of the NSFG files!  Instead, hw2.py should take a data directory as a command-line argument, just like survey.py and first.py.  I should be able to cd into our shared directory and run

python hw2.py [data_dir]

The program should print the results from Exercises 2.1 through 2.4.
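One way to accept an optional data directory on the command line is sketched below.  The file name example.dat is a placeholder, not a real NSFG file name, and the structure of main is an assumption about how you might organize hw2.py rather than a requirement.

```python
import os
import sys


def main(name, data_dir='.'):
    """Build the path to a data file inside data_dir and report it.

    name is the script name (sys.argv[0]); data_dir defaults to the
    current directory when no argument is given.
    """
    # 'example.dat' is a placeholder, not a real NSFG file name.
    path = os.path.join(data_dir, 'example.dat')
    print('would read: %s' % path)
    return path


if __name__ == '__main__':
    # Pass the script name and, if present, the first argument.
    main(*sys.argv[:2])
```

Running the script with no argument uses the current directory; running it with one argument uses that directory instead.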

Homework 3

0) Read Chapter 3 of Think Stats.

1) Download Cdf.py and read through it.  Read the documentation at Cdf.html.  We will be using Cdf for the rest of the semester, so you should learn the interfaces and understand the implementation.

2) Exercise 3.5: compare PMF and CDF for the same distribution.

3) Exercise 3.6: Look up the percentile rank of your birth weight.

4) Exercise 3.8: use percentile ranks to compare people in different distributions. No code required.

5) Exercise 3.11: quartiles of birth weight.
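The percentile rank and quartile computations in Exercises 3.6 and 3.11 can be sketched with plain lists, as below.  This version does not use Cdf.py, so treat it as a way to check your understanding and your results rather than as the intended solution; the function names are my own.

```python
def percentile_rank(values, x):
    """Return the percentage of values that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100.0 * count / len(values)


def percentile(values, rank):
    """Return the smallest value whose percentile rank is at least rank."""
    for v in sorted(values):
        if percentile_rank(values, v) >= rank:
            return v


def quartiles(values):
    """Return the 25th, 50th and 75th percentiles as a tuple."""
    return tuple(percentile(values, r) for r in (25, 50, 75))


if __name__ == '__main__':
    scores = [55, 66, 77, 88, 99]
    print('percentile rank of 88: %.1f' % percentile_rank(scores, 88))
    print('quartiles: %s' % (quartiles(scores),))
```

Calling percentile_rank once per value is slow for large datasets, which is one reason to use a Cdf object for the real data.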

What to turn in: Put your code in a file named hw3.py.  When I run it, it should print your birth weight and its percentile rank, and the quartiles of the birth weight distribution.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.

Homework 4

0) Read Chapter 4 of Think Stats (Continuous distributions).

1) Exercise 4.3: Generate a sample from a Pareto distribution and plot the CCDF.

2) Exercise 4.7: Tails of the IQ distribution.

3) Exercise 4.11: Generate normal probability plots for w and logw.

4) Exercise 4.12: See if the distribution of population sizes fits a continuous distribution.
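For Exercises 4.3 and 4.7, the numerical parts can be sketched as below.  The sketch leaves out plotting, assumes the conventional IQ scale with mean 100 and standard deviation 15, and uses math.erfc, which requires Python 2.7 or later.  The function names are my own.

```python
import math
import random


def pareto_sample(alpha, xm, n, rng=random):
    """Draw n values from a Pareto distribution with shape alpha, minimum xm.

    random.paretovariate returns values with minimum 1, so scale by xm.
    """
    return [xm * rng.paretovariate(alpha) for _ in range(n)]


def empirical_ccdf(sample):
    """Return (xs, ps): sorted values and the fraction of the sample above each.

    Assumes no ties, which is reasonable for a continuous distribution.
    """
    xs = sorted(sample)
    n = float(len(xs))
    ps = [1.0 - (i + 1) / n for i in range(len(xs))]
    return xs, ps


def normal_tail(x, mu=100.0, sigma=15.0):
    """Return the fraction of a normal(mu, sigma) population that exceeds x."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


if __name__ == '__main__':
    xs, ps = empirical_ccdf(pareto_sample(1.0, 0.5, 1000))
    print('smallest value: %.3f, fraction above it: %.3f' % (xs[0], ps[0]))
    print('fraction with IQ above 115: %.4f' % normal_tail(115))
```

If the sample really is Pareto, plotting ps against xs on log-log axes should give something close to a straight line.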

What to turn in: Put your code in a file named hw4.py.  Also include the resulting figures in files named pareto.eps, weight.eps, logweight.eps and populations.eps.   Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.

Do not include a copy of the BRFSS data file!  Instead, hw4.py should take a data directory as a command-line argument, just like brfss.py.  I should be able to cd into our shared directory and run

python hw4.py [data_dir]



Homework 5

0) Read Chapter 5 of Think Stats.

1) Exercise 5.1: If I roll two dice and the total is 8, what is the chance that one of the dice is a 6?

2) Exercise 5.2: 100 dice.

3) Exercise 5.6: Poincare (optional)

4) Exercise 5.8: two dice, one 6.

5) Exercise 5.9: A or B but not both.

6) Exercise 5.14: False positive drug tests.  Use sensitivity = 60%, false positive rate = 1%, and actual use rate = 1%.
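Two of these problems are small enough to check by brute force, and a sketch like the following may help you confirm your reasoning for Exercises 5.1 and 5.14.  It reads "one of the dice is a 6" as "at least one die shows a 6" and reads "false positive" as the probability of a positive test for a non-user; both readings are assumptions.  It needs Python 2.6 or later for fractions and itertools.product.

```python
from fractions import Fraction
from itertools import product


def prob_six_given_total(total=8):
    """Return P(at least one die shows 6 | two dice sum to total).

    Enumerates all 36 equally likely rolls and keeps the matching ones.
    """
    rolls = [r for r in product(range(1, 7), repeat=2) if sum(r) == total]
    with_six = [r for r in rolls if 6 in r]
    return Fraction(len(with_six), len(rolls))


def prob_user_given_positive(sensitivity, false_positive, use_rate):
    """Apply Bayes's theorem to get P(user | positive test)."""
    true_pos = sensitivity * use_rate
    false_pos = false_positive * (1.0 - use_rate)
    return true_pos / (true_pos + false_pos)


if __name__ == '__main__':
    print('P(six | total 8) = %s' % prob_six_given_total(8))
    print('P(user | positive) = %.3f'
          % prob_user_given_positive(0.60, 0.01, 0.01))
```

Enumeration only works when the outcomes are equally likely and few in number; for 100 dice you will need a different approach.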

What to turn in: Put all code for this homework in a file named hw5.py in the top level of our shared directory.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.


Homework 6

0) Read Chapter 6 of Think Stats.

1) Exercise 6.9: Write a function that takes two Pmf objects and computes the PMF of Z = X + Y.

2) Write a function that takes two Pmf objects and computes the PMF of Z = max(X, Y).

3) Exercise 6.11: choose two of {normal, exponential, lognormal, Pareto} with replacement, and see what the distribution of the sum looks like.

4) Exercise 6.13: choose one of {exponential, lognormal, Pareto} and see what happens as you add up larger samples.  The goal of this exercise is to see whether the Central Limit Theorem works for these distributions.
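The first two problems share one idea: loop over every pair of values and add up the probability of each result.  Here is a sketch that uses plain dictionaries mapping values to probabilities in place of Pmf objects, and that assumes X and Y are independent.  Adapting it to the Pmf interface is left to you.

```python
def pmf_sum(pmf_x, pmf_y):
    """Return the PMF of Z = X + Y for independent X and Y.

    Each PMF is a dictionary that maps values to probabilities.
    """
    pmf_z = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            z = x + y
            pmf_z[z] = pmf_z.get(z, 0.0) + px * py
    return pmf_z


def pmf_max(pmf_x, pmf_y):
    """Return the PMF of Z = max(X, Y) for independent X and Y."""
    pmf_z = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            z = max(x, y)
            pmf_z[z] = pmf_z.get(z, 0.0) + px * py
    return pmf_z


if __name__ == '__main__':
    die = dict((i, 1 / 6.0) for i in range(1, 7))
    print('P(sum of two dice = 7) = %.4f' % pmf_sum(die, die)[7])
    print('P(max of two dice = 6) = %.4f' % pmf_max(die, die)[6])
```

Both functions take time proportional to the product of the two PMF sizes, which is fine for dice but can be slow for PMFs with many values.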

What to turn in: Put all code for this homework in a file named hw6.py in the top level of our shared directory.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.

Homework 7

0) Read Chapter 7 of Think Stats.

1) Exercise 7.1: compute the p-value for the difference in birth weights.

2) Exercise 7.4: updating Bayesian probabilities using likelihood ratios.

3) Exercise 7.6: (optional) Do it or read about the solution at Probably Overthinking It.
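The two required exercises can be sketched as below.  Note that the p-value function shuffles the pooled data (a permutation test), which may differ from the resampling procedure in the book, so use it to check that your answer is in the right range rather than expecting an exact match.  The function names are my own.

```python
import random


def mean(xs):
    """Return the arithmetic mean of xs."""
    return sum(xs) / float(len(xs))


def permutation_p_value(group1, group2, iters=1000, rng=random):
    """Estimate a two-sided p-value for the difference in means.

    Under the null hypothesis both groups come from one distribution, so
    shuffle the pooled values and count how often the shuffled difference
    is at least as large as the observed one.
    """
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n = len(group1)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            count += 1
    return count / float(iters)


def update_probability(prior, likelihood_ratio):
    """Convert prior to odds, multiply by the likelihood ratio, convert back."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == '__main__':
    print('posterior = %.2f' % update_probability(0.5, 3.0))
```

With a prior of 0.5 the prior odds are 1, so the posterior odds equal the likelihood ratio.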

What to turn in: Put all code for this homework in a file named hw7.py in the top level of our shared directory.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.



Homework 8

0) Read Chapter 8 of Think Stats.

1) Exercise 8.1: check whether using the sample mean to estimate mu yields a lower MSE than using the median.

Let me repeat the warning in the book: when we are doing a real estimation problem, we can't compute errors because we don't know the true value.  So this is an artificial exercise we are doing to test the performance of estimation methods.

2) Exercise 8.2: check whether S^2 is a biased estimator of sigma^2 (optional).

3) Exercise 8.4: compute the posterior distribution of lambda.

4) Exercise 8.6: generalize the locomotive problem (optional).
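The comparison in Exercise 8.1 can be sketched as a small simulation.  The defaults below (samples of 6 values from a normal distribution with mu = 0 and sigma = 1, repeated 1000 times) are my assumptions about the setup, so check them against the exercise.  Because we choose mu ourselves, we can compute the errors, which is exactly what makes this an artificial exercise.

```python
import random


def mean(xs):
    """Return the arithmetic mean of xs."""
    return sum(xs) / float(len(xs))


def median(xs):
    """Return the median of xs, averaging the middle pair if len is even."""
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2.0


def mse(estimates, actual):
    """Return the mean squared error of the estimates around actual."""
    return sum((e - actual) ** 2 for e in estimates) / float(len(estimates))


def compare_estimators(mu=0.0, sigma=1.0, n=6, iters=1000, rng=random):
    """Return (MSE of sample mean, MSE of sample median) as estimates of mu."""
    means, medians = [], []
    for _ in range(iters):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(mean(sample))
        medians.append(median(sample))
    return mse(means, mu), mse(medians, mu)


if __name__ == '__main__':
    print('MSE mean = %.3f, MSE median = %.3f' % compare_estimators())
```

The results vary from run to run, so run it a few times before drawing a conclusion.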

What to turn in: Put all code for this homework in a file named hw8.py in the top level of our shared directory.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.


Homework 9

0) Read Chapter 9 of Think Stats.

1) Exercise 9.4: Scatterplots of weight vs. height and log(weight) vs. height.  Also compute the correlation and rank correlation.

2) Exercise 9.6: Linear least squares fit for log(weight) vs. height.

3) Exercise 9.8: Using SAT scores to predict IQ.

4) Exercise 9.10: Using height to predict weight (optional).
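The numerical parts of Exercises 9.4 and 9.6 can be sketched with plain lists, as below.  The function names are my own, and the ranking function breaks ties by position instead of averaging tied ranks, which is a simplification you may want to improve on for real data.

```python
import math


def mean(xs):
    """Return the arithmetic mean of xs."""
    return sum(xs) / float(len(xs))


def cov(xs, ys):
    """Return the population covariance of xs and ys (divide by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / float(len(xs))


def pearson(xs, ys):
    """Return Pearson's correlation coefficient."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


def ranks(xs):
    """Return the rank (1 = smallest) of each value; ties broken by position."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0] * len(xs)
    for rank, i in enumerate(order):
        result[i] = rank + 1
    return result


def spearman(xs, ys):
    """Return the rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    slope = cov(xs, ys) / cov(xs, xs)
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope


if __name__ == '__main__':
    xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
    print('intercept = %.2f, slope = %.2f' % least_squares(xs, ys))
```

To fit log(weight) vs. height, apply math.log to each weight before calling least_squares.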

What to turn in: Put all code for this homework in a file named hw9.py in the top level of our shared directory.  Add comments at the top of your file that include your name, results you computed, and answers to any non-code questions.  Add docstrings to each of your functions.

Attachment: heptathlon.csv (2k), Allen Downey, Oct 18, 2010, 10:29 AM