Lecture 15

For today you should:

1) Read "Avoiding a common mistake with time series"

2) Install Bokeh

3) Work on completing and cleaning up your journal entries.  Hint: make a checklist.

Today:

1) Quiz solution

2) Bokeh

For next time you should:

1) Finish cleaning up your journal.

We'll have a quiz next time with some numpy/pandas/thinkstats2 practice.

Journal prep

To prepare your journal, you should review existing sections to make sure they are still current.

If you left things half done, now is the time to do the other half.

Please give some thought to the Data Management section, especially the legal and ethical issues.  Think not just about the current project, but what issues might arise if your work continued and scaled up.

If you really think there are no legal or ethical issues, explain why not.  Thoughtfully.  If you blow this off, I will be annoyed.

You should check the following:

0) I gave you instructions for naming your journal and meeting notes.  Some of you did not follow those instructions, which means that instead of spending my time making this class better for everyone, I spend it trying to find and organize documents.

Please check the names of your documents.  If the name of your journal is "Journal", you fail.

1) Are all required elements present and easy to identify?

2) For each required element, is it easy to find and understand:

Q: What's the motivating question?

M: What methodology is used?

R: What's the result?

I: How do we interpret the result as an answer to the question?

Remember the curse of knowledge.  You know what you are doing and why; the reader does not.

3) Are you remembering the basics of presenting quantitative information?

4) Are all the words spelled right?  Are they arranged grammatically?

I will have more to say about style later.

Quiz review

1) Something like this

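# Assumes pandas is imported as pd and Cdf comes from thinkstats2.
# Sample returns a NumPy array, so wrap it in a Series aligned with df before passing it to fillna.
filler = pd.Series(Cdf(df.height.dropna()).Sample(len(df)), index=df.index)

# Assign the result back instead of filling in place through attribute access.
df['height'] = df.height.fillna(filler)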

2) Children who initially had no sign of peanut sensitivity were assigned to two groups: in the group that avoided peanuts, 13.7% developed peanut allergies; in the group that ate peanuts, only 1.9% developed allergies.

If you think of eating peanuts as the treatment and avoiding peanuts as the control, the effect of the treatment is a reduction of 11.8 percentage points or 86%.  The relative risk is 0.139 and the odds ratio is 0.122.  The (natural) log odds ratio is -2.1, or -9.1 dB.

If you think of avoiding peanuts as the treatment, the effect is an increase of 11.8 percentage points or 621%.  The relative risk is 7.2, and the odds ratio is 8.2.  The (natural) log odds ratio is 2.1, or 9.1 dB.
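If you want to check the arithmetic for question 2, here is a minimal sketch.  The function name effect_sizes and the dictionary keys are made up for illustration; they are not part of thinkstats2.

```python
import math

def effect_sizes(p_treatment, p_control):
    """Compare two proportions (hypothetical helper, not from thinkstats2)."""
    odds_t = p_treatment / (1 - p_treatment)
    odds_c = p_control / (1 - p_control)
    odds_ratio = odds_t / odds_c
    return {
        # absolute difference, in percentage points
        "difference_points": (p_treatment - p_control) * 100,
        # relative change, as a percentage of the control rate
        "percent_change": (p_treatment - p_control) / p_control * 100,
        "relative_risk": p_treatment / p_control,
        "odds_ratio": odds_ratio,
        "log_odds_ratio": math.log(odds_ratio),
        # decibels are 10 times the base-10 log of the odds ratio
        "decibels": 10 * math.log10(odds_ratio),
    }

# Eating peanuts as the treatment, avoiding as the control
eating = effect_sizes(0.019, 0.137)

# Avoiding peanuts as the treatment, eating as the control
avoiding = effect_sizes(0.137, 0.019)

for name, result in [("eating", eating), ("avoiding", avoiding)]:
    print(name)
    for key, value in result.items():
        print("  %s: %.3f" % (key, value))
```

Swapping the two arguments inverts the relative risk and the odds ratio and flips the sign of the log odds ratio, which is why the two paragraphs above mirror each other.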

3) An effect size of 23 dB corresponds to odds of about 199.5 to 1 in favor, which corresponds to a probability of about 99.5%.
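The conversion in question 3 can be sketched in a few lines.  The function names here are hypothetical, chosen only for this example.

```python
def decibels_to_odds(db):
    """Convert an effect size in decibels to odds (10 dB is a factor of 10)."""
    return 10 ** (db / 10)

def odds_to_probability(odds):
    """Convert odds in favor to a probability."""
    return odds / (odds + 1)

odds = decibels_to_odds(23)
prob = odds_to_probability(odds)

print("odds: %.1f to 1" % odds)
print("probability: %.1f%%" % (prob * 100))
```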

4) The model predicts

log o = -0.0301 - 0.0224 - 0.00267 (25) + 0.00501 = -0.1142

o = 0.8920

p = 0.4715
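Here is the same calculation as a short sketch, so you can check your own numbers.  The terms are copied from the sum above; these notes don't say which predictor each term belongs to, so the code leaves them as plain numbers.  It assumes "log" means the natural log, which matches the value of o shown.

```python
import math

# Terms copied from the worked sum; the 25 multiplies the -0.00267 coefficient.
log_odds = -0.0301 - 0.0224 - 0.00267 * 25 + 0.00501

# Invert the natural log to get odds, then convert odds to probability.
odds = math.exp(log_odds)
prob = odds / (1 + odds)

print("log odds: %.4f" % log_odds)
print("odds: %.4f" % odds)
print("probability: %.4f" % prob)
```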

Bokeh

If you run

conda install bokeh

cd ~/anaconda/Examples/bokeh

ipython notebook

you should be able to view the examples that come with Bokeh.

For some of the examples, you have to run the Bokeh server.  Anaconda puts the script that launches the server in your search path, so you can run

bokeh-server

in another window.

Some examples need sample data that you can download by running this code in an IPython cell:

import bokeh.sampledata

bokeh.sampledata.download()