Data Set 04

How do we handle bivariate data?

1) explanatory and response variables

you may know these as independent (explanatory) and dependent (response) variables
the explanatory variable explains what's happening; the response variable responds to it
examples (explanatory first, then response):
- years and number of people on the earth
- number of calories eaten and weight
- speed a car is driven and miles per gallon

let's play with this for a bit: http://illuminations.nctm.org/Activity.aspx?id=4186

then let's try these: http://www.wilderdom.com/301/int/cor-guess.html

now like an '80s game: http://guessthecorrelation.com/

2) What we care about is how CLOSELY the response is tied to the explanation.

example 1: Year and M. Night Shyamalan movies (see the .csv attached below)

example 2: Pets Age Data (see the .csv attached below)

example 3: Old Faithful eruptions (see the .csv attached below)

3) Let's try to find a "line of best fit".

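the official name for this is a "least squares regression line". it's the line that makes the sum of the squared vertical distances from the data points to the line as small as possible. if we think back to what we did for standard deviation (squaring distances from the mean), it's going to help us a lot in considering this information.
the correlation coefficient r lets us know how close we are: r runs from -1 to 1, and the closer r is to 1 or -1, the more closely our data follows a straight line. an r near 0 means little or no linear relationship.
the equation of this line tells us how much the response variable is predicted to change when the explanatory variable goes up by one unit (that's the slope).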
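
here's a minimal sketch of how the slope, intercept, and r could be computed by hand in Python. the function name least_squares and the speed/mpg numbers are made up for illustration (they are NOT from the attached .csv files), so treat this as one way to do it, not the official method.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sums of squared (and cross) distances from the means,
    # the same idea we used for standard deviation
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# made-up illustration data: explanatory = speed, response = miles per gallon
speed_mph = [30, 40, 50, 60, 70]
mpg = [34, 33, 30, 27, 23]

slope, intercept, r = least_squares(speed_mph, mpg)
print("predicted mpg =", round(intercept, 2), "+", round(slope, 2), "* speed")
print("r =", round(r, 3))
```

on this made-up data the slope is -0.28, so each extra mph goes with a predicted drop of 0.28 mpg, and r is close to -1, which says the points sit close to a downward-sloping line.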

4) Does the line of best fit ... fit?

As we can see in some of our data, the line of best fit sometimes does a great job, and sometimes is not so good. How can we tell whether the data follows the rule closely, only kind of, or not at all?

correlation coefficient: http://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Correlation_examples2.svg/800px-Correlation_examples2.svg.png
Are there any influential points?
Is there a pattern in the residuals?

Causation vs. Correlation, and examples held within.

Influential Points.
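
one way to check whether a point is influential is to fit the line with it and without it and see how much the slope moves. a rough sketch, assuming made-up data (five points on a perfect line plus one far-off point) and a hypothetical helper named slope_and_intercept:

```python
def slope_and_intercept(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# made-up data: five points that sit exactly on y = 2x
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 6, 8, 10]

# the same data plus one extra point far to the right, at (15, 5)
with_x = base_x + [15]
with_y = base_y + [5]

slope_without, intercept_without = slope_and_intercept(base_x, base_y)
slope_with, intercept_with = slope_and_intercept(with_x, with_y)

print("slope without the extra point:", round(slope_without, 3))
print("slope with the extra point:   ", round(slope_with, 3))
```

here a single point drags the slope from 2 down to nearly flat. a point that changes the line that much when it's removed is what we mean by influential.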

Residual Plots and what they can tell us about the world.

if there's a pattern in your residuals, it may not matter how good your correlation is: a pattern means a straight line is the wrong shape for the data.
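
to see why, here's a small sketch using made-up curved data (y = x squared, not one of the attached data sets). a residual is the observed y minus the y the line predicts. the helper name fit_line is hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

# made-up curved data: y = x^2
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept, r = fit_line(xs, ys)

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("r =", round(r, 3))
for x, res in zip(xs, residuals):
    print("x =", x, " residual =", round(res, 2))
```

r comes out above 0.97, which looks great, but the residuals are positive at both ends and negative in the middle. that U shape is the pattern telling us the data curves and a line is the wrong model.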