
### 1. Correlation vs. Causation

Take the following survey.  Tomorrow, we will find our movie-watching buddies based on preference correlations: Movie survey
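
If you are curious how the buddy matching might work, here is a rough Python sketch. It assumes every student rated the same movies on a 1-5 scale; the names, the ratings, and the helper functions `pearson` and `best_buddy` are all made up for illustration and are not the actual survey data or the exact method we will use in class.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists.

    Assumes neither list is constant (a student who gives every movie
    the same rating would cause a division by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical survey data: each student's 1-5 ratings for the same five movies.
ratings = {
    "Mary": [5, 4, 1, 2, 5],
    "Sam":  [4, 5, 2, 1, 4],
    "Lee":  [1, 2, 5, 4, 1],
}

def best_buddy(name, ratings):
    """Return the classmate whose ratings correlate most strongly with name's."""
    others = [other for other in ratings if other != name]
    return max(others, key=lambda other: pearson(ratings[name], ratings[other]))

print(best_buddy("Mary", ratings))
```

A correlation near +1 means two students tend to like and dislike the same movies, and a correlation near -1 means their tastes are opposite. In this made-up data, Mary and Sam rate movies similarly, so they are paired.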

In class, we will watch the Anne Milgram TED Talk (below) on statistics used to make predictions in the criminal justice system.  On the next day, we will watch the Nate Silver TEDx Talk on race and voting habits.  While we watch, you will go to https://todaysmeet.com/statsted.  Use your name so I can give you credit for your posts.  Everyone is expected to have at least:
• [1 pt/day] 1 unique observation (something interesting/surprising)
• [1 pt/day] 1 thoughtful response to a classmate (use @Mary or @MrP to target a comment at a given recipient)
Make your observation during the video.  You will have time to think through comments and reply at the end of the video.

Appropriate Netiquette:
When communicating online in a professional / academic forum, expected behaviors are different from standard social media use.  This is similar to the difference you notice in language and style when giving a speech vs. talking to friends.  Appropriate behavior depends more on context than on the tool / medium used.
• Always be conscious that there is a person, specifically your classmate, on the other end of an online conversation.
• Use school-appropriate grammar, punctuation, and language.  Avoid digital SCREAMING.
• Stay on-topic.
• Be specific so others can understand you.

#### Day 2: Nate Silver, Race and Voting

Mastery Quiz Prep

#### Review of causation and correlation that we covered in class

For problems 1-5, answer the following:
a) List the 2 variables and whether they are categorical or quantitative.
b) Which section would you use in StatKey to create a chart / graph?
c) Which variable is likely the cause and which is likely the response?  If neither, what might a lurking variable be that connects these two?  Which input leads to which output?

1. Premium gasoline (89 octane) gives cars better gas mileage than regular gasoline (87 octane).
2. The weekly grocery bill is associated with the number of family members.
3. Taking a recently developed pill each day will reduce the number of headaches experienced over the next 3 months compared to another brand.
4. A professional sports team’s winning percentage is associated with the team’s average salary.
5. A classroom poll asked students whether or not they liked math and which class they were enrolled in.

6. Explain the difference between independence, dependence, and causation.  How can you prove causation?
7. Explain the difference between independent variables, dependent variables, and lurking variables.

Free Response Prep

The goal of analyzing the relationship between two variables is different than the goal when working with only one variable at a time.  Explain.
See first video.  In 1-variable analysis, you are searching for a summary of the current state of the situation.  In 2-variable analysis, you are trying to find a link between the variables.  You want to know if one variable can predict the other.

What is the difference between correlation and causation?  What is needed to prove causation?
See last video.  Correlation is a link / dependent relationship.  Causation means that one of the variables is the reason for the other.  The best way to prove a cause is an experiment because it eliminates all of the lurking variables.  Outside of the statistics realm, sometimes you can prove causation with a very strong understanding of the mechanism behind how something works.
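
To see why an experiment removes lurking variables, here is a small simulation sketch in Python. Everything in it is invented for illustration: the lurking variable is called `health`, the treatment has no real effect at all, and the function names and numbers are assumptions, not data from class.

```python
import random

def mean(values):
    return sum(values) / len(values)

def group_difference(assign_treatment, n=20000, seed=1):
    """Average outcome of the treated group minus the untreated group.

    The treatment has NO real effect here: the outcome depends only on a
    lurking variable (called health), plus random noise.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)                 # lurking variable
        outcome = 2 * health + rng.gauss(0, 1)   # treatment is absent from this formula
        if assign_treatment(health, rng):
            treated.append(outcome)
        else:
            untreated.append(outcome)
    return mean(treated) - mean(untreated)

# Observational study: healthier people choose the treatment on their own.
def self_selected(health, rng):
    return health > 0

# Experiment: a coin flip decides, so health cannot influence who is treated.
def randomized(health, rng):
    return rng.random() < 0.5

print("observational difference:", round(group_difference(self_selected), 2))
print("randomized difference:   ", round(group_difference(randomized), 2))
```

In the observational version the treated group looks much better off, even though the treatment does nothing, because health decides both who takes the treatment and how they turn out. With random assignment the two groups have about the same health on average, so the difference shrinks to roughly zero.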

Why is it so incredibly useful in nearly every job to identify dependent relationships in data?  Give an example of dependent variables and explain why knowing this relationship helps somebody do their job better.
See in-class TED Talks.  If you know an end result is predictable, you can use the predictors to change behavior before the end result happens.  Anne Milgram did this for judges with the criminal risk factors.  Nate Silver is starting to do this with racism based on city design.

Practice solutions
1. a) gas type (categorical) vs. gas mileage (quantitative)
b) use "one quantitative and one categorical"
c) type of gas should cause gas mileage to change, and premium gas should result in higher mileage

2. a) grocery bill (quantitative) vs. family size (quantitative)
b) use "two quantitative variables"
c) family size causes grocery bill, and larger family = larger bill

3. a) pill type (categorical) vs. number of headaches (quantitative)
b) use "one quantitative and one categorical"
c) pill type should cause the number of headaches, and the new pill should result in fewer headaches

4. a) winning percentage of team (quantitative) vs. average salary of team (quantitative)
b) use "two quantitative variables"
c) causation could run in either direction: a higher salary can cause a higher win percentage (by buying better players with higher salaries) OR a better win percentage can raise salaries (if players are rewarded for playing well)

5. a) attitude towards math (categorical) vs. class enrolled in (categorical)
b) use "two categorical variables"
c) this one is tricky -- I would actually bet that math ability is the lurking variable that determines which class they are in and their attitude towards math

6. Independence: when two variables do NOT affect each other
Dependence: when two variables DO affect each other
Causation: when one variable is the CAUSE behind the other variable changing, not just a coincidence or some lurking variable that affects both outcomes.  You can only prove that something is the cause of something else by making sure every other variable is identical, testing the two versions, and comparing the results (a randomized, controlled experiment -- the subject of our next unit).

7. The independent variable is the variable that is assumed to be the main cause and is usually not affected by the other variable.  An example is time (not affected by many things unless you're talking about theoretical physics).
The dependent variable is the second variable, whose result is driven by the independent variable.
A lurking variable is not one of the variables being analyzed, but some other factor that is causing both of the looked-at variables to change together.  It "explains" both variables under study but "lurks" in the background.
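
The lurking variable idea in answer 7 can be sketched with a short Python simulation. The scenario (daily temperature driving both ice cream sales and sunburns) and every number in it are made up for illustration. Neither variable affects the other in this sketch, yet they still move together.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)
temperature, ice_cream, sunburns = [], [], []
for _ in range(5000):
    t = rng.uniform(10, 35)                      # lurking variable: temperature in degrees Celsius (made-up range)
    temperature.append(t)
    ice_cream.append(3 * t + rng.gauss(0, 10))   # depends on temperature only
    sunburns.append(2 * t + rng.gauss(0, 10))    # depends on temperature only

# Correlation across all days: the two variables rise and fall together.
overall = pearson(ice_cream, sunburns)

# Hold the lurking variable nearly constant: keep only days between 20 and 22 degrees.
similar_days = [i for i, t in enumerate(temperature) if 20 <= t <= 22]
within = pearson([ice_cream[i] for i in similar_days],
                 [sunburns[i] for i in similar_days])

print("correlation over all days:     ", round(overall, 2))
print("correlation on similar-temp days:", round(within, 2))
```

Across all days the correlation is strong, but once you compare only days with nearly the same temperature it mostly disappears. That is the signature of a lurking variable: it "explains" the link between the two variables being studied.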

Notes

Andy Pethan,
Nov 10, 2014, 8:44 AM