I am so happy to help all of you on each and every question. Seriously. This is our final piece, and I want to make sure you understand each and every facet. To that end:
1) Start by sharing your Google document with me. Label it "final data set name name name".
2) When you as a group finish a question, send me a quick e-mail saying "#3 done" (or whichever number it was).
3) I will look at it and give feedback AS we are working through them. Thursday is the only day this will not work.
For each problem in here, your goal is to select the BEST method. Things to consider for each question:
1) What method should I be using here? What methods help me with that?
2) How should I be looking for outliers? Should I in fact be looking for outliers?
3) What type of data is this, and how do I best account for that?
4) How am I going to get all of this done in time?! What a jerk that Mr. Mundt is.
Question One: Pupillary Distance
The listed sample distances (in millimeters) were obtained by using a pupilometer to measure the distance between the two pupils for a bunch of adults.
67 66 59 62 63 66 66 55 63 61 60 56 66 67 59 59 60 62 61
a) Give the following information:
xbar
s
n
df
t* or z* (whichever is appropriate) for 90% confidence interval
b) Give a 90% confidence interval for the distances between pupils for the population of all adults.
c) In what ways could you minimize the MOE? (If you don't know what the MOE is, call me over. Have the formula you used in part b ready.)
....One of the ways we could do that is to add more people, so that's what we're going to do. Here are the pupil distances of more people. (INCLUDE THESE WITH THE NUMBERS WE ALREADY USED)
63 52 57 63 60 64 65 63 64 59 60 60 60 66 56 62 60 62 56 60 60 58 52 54
d) My science book here claims that the distance between pupils is 60 mm for the average adult. We want to see if that's different than the results we got.
e) Set up and run either a z.test or a t.test, whichever you deem more appropriate.
f) Using your results from part e, give me complete, coherent sentences explaining what we know based upon our information.
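If you want to check your part a and part b numbers by hand, here is a minimal sketch in R. It assumes a t procedure (sigma is not given) and uses only the first 19 distances; the names pd, tstar, moe, ci and ci_check are just mine for illustration.

```r
# Pupillary distances (mm): the first sample from Question One
pd <- c(67, 66, 59, 62, 63, 66, 66, 55, 63, 61,
        60, 56, 66, 67, 59, 59, 60, 62, 61)

n     <- length(pd)     # sample size
xbar  <- mean(pd)       # sample mean
s     <- sd(pd)         # sample standard deviation
df    <- n - 1          # degrees of freedom for a one-sample t
tstar <- qt(0.95, df)   # 90% confidence leaves 5% in each tail

# Margin of error and the interval built from it
moe <- tstar * s / sqrt(n)
ci  <- c(xbar - moe, xbar + moe)

# t.test() should report the same interval
ci_check <- t.test(pd, conf.level = 0.90)$conf.int
```

Look at the moe line: every piece of it (tstar, s, n) is something that could change the MOE, which is what part c is asking about.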
Question Two: Prison Terms
When 70 convicted embezzlers were randomly selected, the mean length of their prison sentences was found to be 22.1 months and the sd was 8.6 months (from the US DOJ). A governor is running on a platform to be tough on crime; she claims that the average prison term for convicted embezzlers is under 2 years. Use your knowledge to try and evaluate her claim. Use alpha=0.01 for this question.
a) Run the appropriate tests for this to come to a conclusion. Give me all necessary information.
b) What would have happened if we had used an alpha of 0.05 instead of 0.01? What would have changed?
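Since this question gives summary numbers instead of raw data, t.test will not work directly. One possible way to do it by hand in R is sketched below. It assumes a one-sided t test with the null mean set to 24 months (2 years); the variable names are mine, not anything official.

```r
# Summary statistics given in Question Two
n     <- 70
xbar  <- 22.1
s     <- 8.6
mu0   <- 24      # 2 years, written in months to match the data
alpha <- 0.01

# One-sample t statistic built from the summary numbers
t_stat <- (xbar - mu0) / (s / sqrt(n))

# Lower-tail p-value, because the claim is "under 2 years"
p_value <- pt(t_stat, df = n - 1)

# TRUE means we reject the null at this alpha
reject <- p_value < alpha
```

Changing alpha to 0.05 and rerunning only the last line is all part b takes.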
Question Three: Trees!
Length of leaves of tree A (in cm): 12 13 12 21 18 12 14 15 18 21 23 31 20 22 25 17 16
Length of leaves of tree B (in cm): 20 21 24 24 23 26 28 31 32 34 31 21 30 19 19
a) Create a visualization of this data that compares the two sets of data.
b) Do we think that leaf lengths are different in these two types of trees (at a significance level of alpha=0.05)?
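Here is one way this could look in R, as a sketch and not the only right answer. It assumes side-by-side boxplots for the picture and a two-sample t test (R's default Welch version) for part b; tree_a, tree_b and result are names I made up.

```r
# Leaf lengths (cm) from Question Three
tree_a <- c(12, 13, 12, 21, 18, 12, 14, 15, 18, 21, 23, 31, 20, 22, 25, 17, 16)
tree_b <- c(20, 21, 24, 24, 23, 26, 28, 31, 32, 34, 31, 21, 30, 19, 19)

# Side-by-side boxplots put both trees on the same scale
boxplot(list("Tree A" = tree_a, "Tree B" = tree_b),
        ylab = "Leaf length (cm)",
        main = "Leaf lengths by tree")

# Two-sided, two-sample t test; compare result$p.value to alpha = 0.05
result <- t.test(tree_a, tree_b)
```

The boxplots are also a quick check for outliers and skew before you trust the test.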
Question Four: A slow and steady heart.
I claim that my pulse rate is slower than that of my students.
a) Set up the following: goal, null and alternate hypothesis, cautions, BASIC procedure.
b) Get some data!
c) Run the needed and appropriate tests and write a brief paragraph explaining whether or not you believe my claim to be true.
Question Five: Monkeys and tones. (from The Basic Practice of Statistics, fifth edition, by David S. Moore)
"The usual way to study the brain's response to sounds is to have subjects listen to 'pure tones'. The response to recognizable sounds may differ. To compare responses, researchers anesthetized macaque monkeys. They fed pure tones and also monkey calls directly into their brains by inserting electrodes. Response to the stimulus was measured by the firing rate (electrical spikes per second) of neurons in various areas of the brain. The data are below. Researchers suspected that the response to monkey calls would be stronger than the response to a pure tone. Do the data support this idea?"
NOTE: THIS DATA IS MATCHED PAIRS. In other words, the 474 at the first point of tone matches the 500 at the first point of call. This is the same monkey and those are its two responses. If you do not understand what I am saying here, or are unsure of what relevance this has for us, let me know and I will talk you through it.
tone=c(474, 256, 241, 226, 185, 174, 176, 168, 161, 150, 145, 141, 129, 113, 112, 102, 100, 74, 72, 71, 68, 59, 57, 56, 47, 46, 41, 35, 31, 28, 26, 26, 21, 20, 20, 19,18)
call=c(500, 138, 485, 338, 194, 159, 341, 85, 303, 208, 42, 241, 194, 123, 182, 141, 118, 62, 112, 134, 65, 182, 97, 318, 201, 279, 62, 84, 103, 70, 192, 203, 135, 129, 193, 54, 66)
1) If there are any points you think need to be removed, please do so and explain why. Then create a plot and line of best fit for this data. Use tone as the explanatory variable and call as the response. Explain to me whether or not the data shows a strong correlation, and what that correlation means to us.
2) Perhaps a better way of dealing with this data is finding the difference between the call and tone data for each specific monkey (this is called a "matched pairs procedure"). First, make sure you remove any data you considered removable in part 1. Then create a new list of data that shows the difference between response to the call and the tone. (When done with this step you should have a single list of data. Call it call.tone.diff or some such other creative thing.)
3) Run an analysis on the set of data you created in part 2. (z.test? t.test? What's alpha? What's df?)
4) Do you think that monkeys respond more to a monkey call as opposed to a tone? Back your assessment up with info from the previous three pieces of this question.
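The matched pairs idea in parts 2 and 3 can be sketched like this. It assumes a one-sided t test on the differences and does NOT remove any points (whether any should go is your call in part 1); one_sample and paired are names I made up.

```r
# Firing rates from Question Five; position i in each list is the same monkey
tone <- c(474, 256, 241, 226, 185, 174, 176, 168, 161, 150, 145, 141, 129,
          113, 112, 102, 100, 74, 72, 71, 68, 59, 57, 56, 47, 46, 41, 35,
          31, 28, 26, 26, 21, 20, 20, 19, 18)
call <- c(500, 138, 485, 338, 194, 159, 341, 85, 303, 208, 42, 241, 194,
          123, 182, 141, 118, 62, 112, 134, 65, 182, 97, 318, 201, 279, 62,
          84, 103, 70, 192, 203, 135, 129, 193, 54, 66)

# One difference per monkey: positive means a stronger response to the call
call.tone.diff <- call - tone

# One-sample t test on the differences, H_a: mean difference > 0
one_sample <- t.test(call.tone.diff, alternative = "greater")

# Same test, letting R do the pairing for us
paired <- t.test(call, tone, paired = TRUE, alternative = "greater")
```

Both calls should give the same t, df and p-value, which is a nice way to convince yourself that "matched pairs" really is just a one-sample test on the differences.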
Question Six: Intersection of Math and Common Sense
There's an intersection in my town that is considered fairly dangerous. The police have been watching it for a year now, and have come up with the following numbers concerning the number of accidents per week:
accidents per week: 1,2,1,1,2,3,2,3,4,3,4,5,4,3,2,3,4,5,4,3,4,3,0,2,1,0,0,1,2,1,3,1,2,3,4,3,3,2,3,2,10,1,5,2,5,0,1,5,4,3,9,0
a) Describe the shape of this data.
b) Find any suspected outliers. Do you think we should remove them? Explain why or why not, considering any possible reasons that the outliers might exist and what they do to our data.
c) What is the approximate probability that there are fewer than 125 accidents in a given year? EXPLAIN how you get this value. It might help to restate the 125 in a form that matches your data.
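For part c, one possible approach is sketched below. It leans on some big assumptions: the weeks are independent, the sample mean and sd stand in for the true values, no outliers are removed, and the average of 52 weeks is roughly normal. The names weekly_cutoff, se and prob are mine.

```r
# Accidents per week from Question Six
accidents <- c(1, 2, 1, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 3, 2, 3, 4, 5, 4, 3,
               4, 3, 0, 2, 1, 0, 0, 1, 2, 1, 3, 1, 2, 3, 4, 3, 3, 2, 3, 2,
               10, 1, 5, 2, 5, 0, 1, 5, 4, 3, 9, 0)

weeks <- length(accidents)

# "Fewer than 125 in a year" is the same as "weekly average below 125/52"
weekly_cutoff <- 125 / weeks

# Standard error of the weekly average over a year
se <- sd(accidents) / sqrt(weeks)

# Normal approximation for the yearly average
prob <- pnorm(weekly_cutoff, mean = mean(accidents), sd = se)
```

If you decide in part b to drop any outliers, rerun this with the trimmed list and see how much prob moves.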
Question Seven: Those Poor, Poor Newts
So here we go again, cutting newts like it's our job. Taken from a slightly different Newt experiment....
"Differences of electric potential occur naturally from point to point on a body's skin. Is the natural electric field's average strength the best for healing of skin? If so, changing the field on the skin of a newt would slow healing. The newts are anesthetized and a small cut is made on the back of both hind legs. One is left to heal naturally, the other has an electrode placed on it to change the electric field to half the normal. After two hours, we measure the healing rate (in micrometers per hour)."
The data is below in the Excel file "newts".
a) Run the appropriate tests on these data. Are the results significant at the 0.05 level?
b) Give a 90% confidence interval for the difference in healing rates.
Question Eight: Leaf me alone!
You're sitting there, minding your own business when your friend comes up to you (let's say your friend's name is Suzie). Suzie knows you're taking stats. She just got a bunch of data and wants you to analyze it. Suzie was trying to find out what factors might have an effect upon the ability of a plant to photosynthesize. She collected four pieces of information for each sample that she had:
Irradiance: the amount of light that was shining on the plant leaf.
CO2 Concentration: how much CO2 was in the air around the plant when the data was taken
Leaf Resistance: the resistance the leaf has to gases (how resistant the holes are that let air and water and gasses in and out)
Photosynthesis Rate: The rate at which the plant is currently photosynthesizing.
Suzie did not give you units. Suzie did not give you anything other than the data set. Suzie is not a very with-it friend sometimes. However, you would like to help her.
a) Write a statistical summary for Suzie, outlining any possible outliers in each set of data, correlations between the data, and any other possible connections that you see within the different data sets.
b) Be sure your analysis includes neat and concise graphical data as well.
c) Show at least two different possible relationships. Try to consider what you would think would be the explanatory and what would be response. Give a line of best fit for any relationships where it would make sense, and show that the line of best fit does indeed make sense.
d) Tell Suzie she owes you.
Question Nine: Hay there, good lookin'!
(data taken from Brase: Understanding Basic Statistics, 6th edition, p. 474. 2013)
We are interested to see if hay fever rates are different in the populations of people over 50 and those under 25 in western Kansas. These rates were all sampled from random communities in western Kansas.
Over 50: 95 110 101 97 112 88 110 79 115 110 89 114 85 96
Under 25: 98 90 120 128 92 123 112 93 125 95 125 117 97 122 127 88
a) Follow your pattern. Do your thing. Report back.
b) What changes would occur if I had wanted to see if the rates of hay fever were LESS in people over 50? Run the test (it should be a super quick change) and report back any differences.
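The "super quick change" in part b can be sketched like this, assuming you went with a two-sample t test (R's default Welch version) in part a. The names over50, under25, two_sided and less are just for illustration.

```r
# Hay fever rates from Question Nine
over50  <- c(95, 110, 101, 97, 112, 88, 110, 79, 115, 110, 89, 114, 85, 96)
under25 <- c(98, 90, 120, 128, 92, 123, 112, 93, 125, 95, 125, 117, 97, 122,
             127, 88)

# Part a: are the rates DIFFERENT? (two-tailed)
two_sided <- t.test(over50, under25)

# Part b: are the rates LESS for people over 50? (one-tailed)
# The only change is the alternative argument.
less <- t.test(over50, under25, alternative = "less")
```

Compare two_sided$p.value and less$p.value. The t statistic and df do not change, only the tail area you are measuring.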
Question Ten: What/Where/When?!
For each thing, give a one to two sentence example explaining the difference:
a) z-tests and t-tests
b) one-tailed vs. two-tailed
c) normal data vs. skewed data
d) matched pair examples vs. two sample examples
Question Eleven: Right Arm Green! Left Leg Blue!
The last data set we will look at today is labeled tornadoes. It shows the number of tornadoes in the United States each year since they started paying attention in 1953.
a. Give a 95% confidence interval for the expected number of tornadoes in a given year.
b. Plot time and tornadoes. Find a line of best fit. Does the number of tornadoes seem to be increasing? Give a couple of possible reasons for this.