1: Old Faithful
Below is a data set for Old Faithful eruptions. The first data column is the duration of the eruption (in seconds). The second column is the interval of time before the next eruption (in minutes). We are going to use duration as the explanatory variable and interval as the response variable.
Find the correlation between duration and interval, find the line of best fit, and plot the data along with the line.
Does a linear relationship make sense for this data? Explain any possible problems with using a linear relationship with this data.
If an eruption were to last for 256 seconds, what would the expected interval be?
If an interval were 88 minutes, what would we expect the eruption duration to be?
One of the points given was (207, 78). What is the residual of this point?
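For reference, the R functions that questions like these rely on can be sketched on a small made-up data set. The numbers below are invented for illustration and are not the Old Faithful data, and the names toy, fit, y_hat, and res are hypothetical.

```r
# Made-up illustration data (NOT the Old Faithful data)
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Correlation between the explanatory and response variables
r <- cor(toy$x, toy$y)

# Line of best fit, written as response ~ explanatory
fit <- lm(y ~ x, data = toy)
b0 <- coef(fit)[["(Intercept)"]]  # intercept
b1 <- coef(fit)[["x"]]            # slope

# Scatterplot of the points with the fitted line drawn on top
plot(toy$x, toy$y, xlab = "x (explanatory)", ylab = "y (response)")
abline(fit)

# Predicted response at a new x value
y_hat <- predict(fit, newdata = data.frame(x = 3.5))

# Residual = observed - predicted, here for the toy point (3, 6.2)
res <- 6.2 - predict(fit, newdata = data.frame(x = 3))
```

Note that lm(y ~ x) is built to predict y from x. Keep that direction in mind when a question asks you to go from the response back to the explanatory variable.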
2: Anscombe's Quartet
This data set was constructed by a statistician. The data (found below) contain four data sets: xA and yA go together, xB and yB go together, xC and yC go together, and xD and yD go together. Please work through the questions in order, as they should lead you to an interesting conclusion.
Find the correlation for each of the four data sets.
Find the line of best fit for each of the four data sets. Make sure to write it in the new form (y = b0 + b1*x).
Create a graph for each of the data sets, making sure to include the line of best fit for each. These four graphs should be on the same page and should all be viewable at the same time.
Compare and contrast how the line of best fit interacts with the data.
Which data set does the line make the most sense for?
For each data set for which a line of best fit doesn't make sense, explain why.
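One way to get four graphs onto a single page is a 2 x 2 plotting grid. The sketch below uses R's built-in anscombe data frame, whose columns are named x1 to x4 and y1 to y4. It is an assumption that these match xA/yA through xD/yD in the file provided, so check before relying on it. The names fits and old_par are hypothetical.

```r
# Built-in copy of Anscombe's quartet; columns are x1..x4 and y1..y4
# (assumption: the course file's xA/yA ... xD/yD correspond to these)
data(anscombe)

# 2 x 2 grid so all four plots are visible at once
old_par <- par(mfrow = c(2, 2))

fits <- list()
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fits[[i]] <- lm(y ~ x)                     # line of best fit for set i
  plot(x, y, main = paste("Set", i), xlab = "x", ylab = "y")
  abline(fits[[i]])                          # draw the fitted line
}

par(old_par)  # restore the previous plotting layout
```

Calling coef(fits[[i]]) afterwards returns b0 and b1 for set i, which is what the form y = b0 + b1*x needs.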
3: Detroit
Below is a data set dealing with statistics from Detroit. The explanation of the different pieces can be found here.
Load the CSV file into R and save the data as detroit.
Run plot(detroit). Explain what R gave you in this graph. If you are unsure, call me over and we will look at it together.
Find two variables you think would be described well by a line of best fit. Find the correlation between your two variables, find the line of best fit, and then create a plot that includes the points, correct axis labels, and the line of best fit.
Explain what relationship you found and what it means in real life.
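The loading step can be sketched as follows. Because the actual file name is not given here, the example writes a tiny made-up CSV to a temporary file and reads it back. The names path and demo and the columns a, b, and c are hypothetical stand-ins, not the Detroit data.

```r
# Hypothetical stand-in for the course file: write a tiny CSV, then read it back
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = c(1, 2, 3, 4),
                     b = c(2, 4, 5, 8),
                     c = c(9, 7, 4, 1)),
          path, row.names = FALSE)

# For the assignment this would be: detroit <- read.csv("<path to your file>")
demo <- read.csv(path)

# Quick checks that the data loaded as expected
str(demo)    # column names and types
head(demo)   # first few rows

# plot() on a data frame with several numeric columns
plot(demo)
```

Compare what plot(demo) draws for three columns with what you see for detroit.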
4: A friend of yours has been doing some research in the area of photosynthesis. They have collected the information and placed it into a .csv for you. They explain the four different variables as follows:
Irradiance: the amount of light that was shining on the plant leaf.
CO2 Concentration: how much CO2 was in the air around the plant when the data was taken.
Leaf Resistance: the resistance the leaf has to gases (how resistant the holes are that let air, water, and gases in and out).
Photosynthesis Rate: The rate at which the plant is currently photosynthesizing.
They ask you to help them answer the following question: "Which of the three factors (Irradiance, CO2 Concentration and Leaf Resistance) seem to have the strongest correlation with the photosynthesis rate?"
Using your knowledge of statistics, help them answer that question. I'm leaving it up to you to decide what to show, including the proper graphs and equations, but give me enough that I will be convinced.
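When several candidate explanatory variables share one response, cor() can compare them in a single call. The sketch below uses invented numbers, not the photosynthesis data, and the names toy, f1, f2, f3, resp, cors, and strength are hypothetical.

```r
# Made-up numbers for illustration only (NOT the photosynthesis data)
toy <- data.frame(f1   = c(1, 2, 3, 4, 5, 6),
                  f2   = c(3, 1, 4, 1, 5, 9),
                  f3   = c(6, 5, 4, 3, 2, 2),
                  resp = c(1.2, 1.9, 3.1, 4.2, 4.8, 6.1))

# Correlation of each factor with the response (a 3 x 1 matrix)
cors <- cor(toy[, c("f1", "f2", "f3")], toy$resp)

# Rank factors by strength of linear association, ignoring the sign
strength <- sort(abs(cors[, 1]), decreasing = TRUE)
```

A correlation only measures linear association, so a scatterplot of each factor against the response is still worth making before drawing a conclusion.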
5: Pets!
The .csv below gives the ages of pets in human years (for example, a 4-year-old dog is like a 34-year-old human). Your job is to make a single graphic that shows both cat years and dog years. Be sure to include lines of best fit for both.
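Two series can share one graphic by drawing the first with plot() and adding the second with points(). The sketch below uses invented numbers, not the pet data, and the names toy, age, g1, g2, fit1, and fit2 are hypothetical.

```r
# Made-up numbers for illustration only (NOT the pet data)
toy <- data.frame(age = c(1, 2, 3, 4, 5),
                  g1  = c(10, 18, 27, 33, 41),
                  g2  = c(12, 22, 26, 31, 35))

# One line of best fit per series
fit1 <- lm(g1 ~ age, data = toy)
fit2 <- lm(g2 ~ age, data = toy)

# The first series sets up the axes; ylim must cover both series
plot(toy$age, toy$g1, col = "blue", pch = 16,
     ylim = range(toy$g1, toy$g2),
     xlab = "Actual age (years)", ylab = "Equivalent human age (years)")

# The second series is added to the same axes
points(toy$age, toy$g2, col = "red", pch = 17)

# Fitted lines, colored to match their points
abline(fit1, col = "blue")
abline(fit2, col = "red")

legend("topleft", legend = c("Group 1", "Group 2"),
       col = c("blue", "red"), pch = c(16, 17), lty = 1)
```

Setting ylim from both series keeps the second set of points from being clipped by axes chosen for the first.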