None of the problem titles are as funny as they should be, so feel free to submit your own, better titles.
Problem: Some call him Tim
A marketing consultant (let's call him Tim) is curious about how much the average shopper spends at the store. Tim goes to a store and observes 50 consecutive shoppers. Here are the amounts (in $) spent by these shoppers.
3.11 8.88 9.26 10.81 12.69 13.78 15.23 15.62 17.00 17.39 18.36 18.43 19.27 19.50 19.54 20.16 20.59 22.22 23.04 24.47 24.58 25.13 26.24 26.26 27.65 28.06 28.08 28.38 32.03 34.98 36.37 38.64 39.16 41.02 42.97 44.08 44.67 45.40 46.69 48.65 50.39 52.75 54.80 59.07 61.22 70.32 82.70 85.76 86.37 93.34
a) Tim is pleased with his work. But we are not. We are about to give a stern lecture to Tim about why his sample is probably not an SRS. Come up with at least three possible reasons why his sampling method is not good, and what he should do to fix it.
b) Make a stem plot of the data. It seems that perhaps we shouldn't do any z tests/confidence intervals/hypothesis testing with this data. Why?
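If you would rather not draw the stem plot by hand, here is a minimal sketch of one way to build it. It assumes stems are tens of dollars and leaves are the ones digit, with cents simply dropped (truncated, not rounded). That choice and the function name `stem_plot` are ours, not part of the problem.

```python
from collections import defaultdict

# Amounts (in $) spent by the 50 shoppers, copied from the problem.
amounts = [
    3.11, 8.88, 9.26, 10.81, 12.69, 13.78, 15.23, 15.62, 17.00, 17.39,
    18.36, 18.43, 19.27, 19.50, 19.54, 20.16, 20.59, 22.22, 23.04, 24.47,
    24.58, 25.13, 26.24, 26.26, 27.65, 28.06, 28.08, 28.38, 32.03, 34.98,
    36.37, 38.64, 39.16, 41.02, 42.97, 44.08, 44.67, 45.40, 46.69, 48.65,
    50.39, 52.75, 54.80, 59.07, 61.22, 70.32, 82.70, 85.76, 86.37, 93.34,
]


def stem_plot(values):
    """Return the lines of a stem plot (stem = tens, leaf = ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        whole = int(v)  # assumption: drop the cents instead of rounding
        leaves[whole // 10].append(whole % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still show up.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}")
    return lines


for line in stem_plot(amounts):
    print(line)
```

Look at the overall shape of the printed plot before deciding which procedures are safe to use.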
Problem: Testing for Cooties
Cooties is a known, vicious, terrible disease that can only be cured with a vaccination known as a cootie shot. The cootie shot is a dangerous risk, and so we usually send students through a screening.
The screening process is 99% effective (i.e. 99% of all results will come back accurate).
a) Assume that a class of 23 students is run through the screening process. We know that none of them have cooties...but what are the chances that ALL will come back with the correct result?
b) Assume now the whole second grade goes through the procedure (120 people). We know THEY are all cootie free...what are the chances all will show the correct result?
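Both parts come down to the same calculation. The sketch below assumes each student's screening result is independent of the others, which the problem does not say outright, so the chance that all n results are correct is the accuracy raised to the n-th power. The function name `prob_all_correct` is ours.

```python
def prob_all_correct(n_students, accuracy=0.99):
    """Chance that every one of n screenings is correct.

    Assumption: the screenings are independent, so probabilities multiply.
    """
    return accuracy ** n_students


# Class of 23, then the whole second grade of 120.
for n in (23, 120):
    print(f"{n} students: {prob_all_correct(n):.4f}")
```

Compare the two numbers and think about what happens to a "99% effective" test as the group gets bigger.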
Problem: Taken From Texas
I found this question as an example of hypothesis testing in a manual on methods from the auditor's office in Texas. The manual can be found here (as well as the answer, so don't cheat too much): http://www.sao.state.tx.us/resources/Manuals/Method/data/19HYPTSD.pdf
A program manager asserts that a newly enacted Federal Government reporting requirement makes it impossible for her staff to process as many job applications as in previous years. She says that in previous years, her staff was able to process an average of 100 such applications per week. A random sample is taken of 15 weeks’ output during the period following enactment of the new reporting requirement. This sample yields the data listed below.
Note that n < 30 to keep calculations relatively simple:
93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95
Assuming that weekly output of applications is normally distributed, does sufficient evidence exist to conclude that staff productivity has statistically and significantly decreased following promulgation of the new regulation?
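Since the population standard deviation is not given and n is small, a one-sample t statistic is a natural candidate here. The sketch below is one way to compute it, not necessarily the manual's own method. It takes the claimed previous average of 100 as the null value; the names `one_sample_t` and `mu0` are ours.

```python
import math
import statistics

# Weekly output for the 15 sampled weeks, copied from the problem.
output = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]
mu0 = 100  # previous average claimed by the program manager


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


t_stat, df = one_sample_t(output, mu0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Compare the statistic with the one-tailed critical value from a t table at your chosen significance level, since the claim is about a decrease.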
Problem: Those Poor, Poor Newts
So here we go again, cutting newts like it's our job. Taken from a slightly different newt experiment....
"Differences of electric potential occur naturally from point to point on a body's skin. Is the natural electric field's average strength the best for healing of skin? If so, changing the field on the skin of a newt would slow healing. The newts are anesthetized and a small cut is made on the back of both hind legs. One is left to heal naturally; the other has an electrode placed on it to change the electric field to half the normal strength. After two hours, we measure the healing rate (in micrometers per hour)."
The data are below in the Excel file "newts".
a) Run the appropriate tests on these data. Is the result significant at the 0.05 level?
b) Give a 90% confidence interval for the difference in healing rates.
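Because both cuts are on the same newt, a matched pairs approach on the per-newt differences is one reasonable reading of "appropriate". The sketch below shows that calculation under that assumption. The function names and the argument names `control` and `experimental` are ours, and the critical value `t_crit` has to come from your own t table for the right degrees of freedom.

```python
import math
import statistics


def paired_t(control, experimental):
    """Matched pairs summary: (mean difference, standard error, t, df).

    Differences are control minus experimental, one pair per newt.
    """
    diffs = [c - e for c, e in zip(control, experimental)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_d, se, mean_d / se, n - 1


def confidence_interval(mean_d, se, t_crit):
    """Interval mean_d +/- t_crit * se; t_crit is looked up by the reader."""
    margin = t_crit * se
    return mean_d - margin, mean_d + margin
```

To use it, load the two healing-rate columns from the newts file into two lists of equal length, call `paired_t`, then pass the mean difference and standard error to `confidence_interval` along with the 90% critical value for the reported degrees of freedom.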
Problem: Leaf me alone!
You're sitting there, minding your own business, when your friend comes up to you (let's say your friend's name is Suzie). Suzie knows you're taking stats. She just got a bunch of data and wants you to analyze it. Suzie was trying to find out what factors might have an effect upon the ability of a plant to photosynthesize. She collected four pieces of information for each sample that she had:
Irradiance: the amount of light that was shining on the plant leaf.
CO2 Concentration: how much CO2 was in the air around the plant when the data was taken.
Leaf Resistance: the resistance the leaf has to gases (how resistant the holes are that let air, water, and gases in and out).
Photosynthesis Rate: The rate at which the plant is currently photosynthesizing.
Suzie did not give you units. Suzie did not give you anything other than the data set. Suzie is not a very with-it friend sometimes. However, you would like to help her.
a) Write a statistical summary for Suzie, outlining any possible outliers in each set of data, correlations between the data, and any other possible connections that you see within the different data sets.
b) Be sure your analysis includes neat and concise graphical data as well.
c) Show at least two different possible relationships. Try to consider what you would think would be the explanatory and what would be response. Give a line of best fit for any relationships where it would make sense, and show that the line of best fit does indeed make sense.
d) Tell Suzie she owes you.
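For the correlations and lines of best fit, here is a minimal sketch of the usual least-squares formulas. The function name `correlation_and_fit` is ours, and which variable counts as explanatory (x) and which as response (y) is still your call, since Suzie's data set is not reproduced here.

```python
import math


def correlation_and_fit(x, y):
    """Return (r, slope, intercept) for the least-squares line y = a + b*x.

    x is the explanatory variable and y the response; both are
    equal-length lists of numbers.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation
    slope = sxy / sxx                   # b in y = a + b*x
    intercept = mean_y - slope * mean_x  # a in y = a + b*x
    return r, slope, intercept
```

Call it once per pair of columns, for example irradiance as x and photosynthesis rate as y. A value of r near 0 is a hint that a straight line is not a sensible summary for that pair, so check the scatterplot too.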
Problem: Hay there, good lookin'!
(data taken from Brase: Understanding Basic Statistics, 6th edition, p. 474. 2013)
We are interested to see if hay fever rates are different in the populations of people over 50 and those under 25 in western Kansas. These rates were all sampled from random communities in western Kansas.
Over 50: 95 110 101 97 112 88 110 79 115 110 89 114 85 96
Under 25: 98 90 120 128 92 123 112 93 125 95 125 117 97 122 127 88
a) Follow your pattern. Do your thing. Report back.
b) What changes would occur if I had wanted to see if the rates of hay fever were LESS in people over 50? Run the test (it should be a super quick change) and report back any differences.
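The two groups come from separate communities, so a two-sample t statistic is one natural fit. The sketch below uses the unpooled (Welch) version, which is our choice and not something the problem specifies. Your course may instead use the conservative degrees of freedom, the smaller sample size minus one.

```python
import math
import statistics

# Hay fever rates copied from the problem.
over_50 = [95, 110, 101, 97, 112, 88, 110, 79, 115, 110, 89, 114, 85, 96]
under_25 = [98, 90, 120, 128, 92, 123, 112, 93, 125, 95, 125, 117, 97,
            122, 127, 88]


def welch_t(a, b):
    """Return (t statistic, Welch degrees of freedom) for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(over_50, under_25)
print(f"t = {t_stat:.3f}, df is roughly {df:.1f}")
```

Switching from "different" to "less" does not change the statistic at all. Only the tail area you look up changes, which is why part b) is a quick edit.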
Problem: What/Where/When?!
For each pair, give a one- to two-sentence example explaining the difference:
a) z-tests and t-tests
b) one-tailed vs. two-tailed
c) normal data vs. skewed data
d) matched pair examples vs. two sample examples
Problem: Right Arm Green! Left Leg Blue!
The last data set we will look at today is labeled tornadoes. It shows the number of tornadoes in the United States each year since they started paying attention in 1953.
a) Give a 95% confidence interval for the expected number of tornadoes in a given year.
b) Plot the number of tornadoes against time and find a line of best fit. Does the number of tornadoes seem to be increasing? Give a couple of possible reasons for this.