1. Hail damage. You work for an insurance company and are put in charge of determining rates for hail insurance--specifically for wheat crops. Believe it or not, hail damage is a fairly large problem for crops--nationally, 11% of all wheat crops are destroyed by hail (I know, right?). You are going through some claims for a county in Colorado and get the following claims for percentage of destruction:
15, 8, 9, 11, 12, 20, 14, 11, 7, 10, 24, 20, 13, 9, 12, 5
Your job is to determine if there is more crop damage in this county than the national average--if there is, you are going to raise the rates for insurance for the following years. Assume that the national sd is 5%.
a. Give a null and alternative hypothesis.
b. Check for relative normality in the data before proceeding further. If there are any points that you feel you should take out, be VERY sure to explain why, as it may make people question your methods later on as you are reporting your findings.
c. In order to justify raising your rates, you need to be 99% confident that the damage here is more than the national average. Should you raise your rates? Explain.
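If you want to see the mechanics of the test in part c, here is a minimal sketch in R. It assumes a one-sided alternative (county mean greater than 11), treats the national sd of 5 as known, and assumes no claims were removed in part b. The variable names are my own, and the decision itself is left to you.

```r
# Hail claims from the problem statement (percent of crop destroyed)
claims <- c(15, 8, 9, 11, 12, 20, 14, 11, 7, 10, 24, 20, 13, 9, 12, 5)

mu0   <- 11    # national average under the null hypothesis
sigma <- 5     # national sd, treated as known
alpha <- 0.01  # 99% confidence requirement from part c

xbar <- mean(claims)
z    <- (xbar - mu0) / (sigma / sqrt(length(claims)))

# One-sided p-value: chance of a z at least this large if the null is true
p_value <- pnorm(z, lower.tail = FALSE)

cat("mean =", xbar, " z =", z, " p-value =", p_value, "\n")
cat("p-value below alpha?", p_value < alpha, "\n")
```

If you removed any points in part b, rerun this with the shortened list and say so in your write-up.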
2.
I'm curious as to whether or not my 2 main e-mail addresses show any difference in usage. For the past month, I've saved all e-mail sent directly to me (ignoring spam). The two data sets are as follows:
E-mail A: 7, 5, 21, 11, 0, 7, 19, 6, 5, 20, 2, 3, 12, 18, 12, 12, 17, 4, 5, 6, 12, 4, 1, 36, 2, 0, 6, 7
E-mail B: 10, 12, 12, 11, 1, 1, 4, 4, 6, 10, 6, 2, 7, 7, 8, 11, 9, 3, 5, 17, 9, 7, 4, 13, 7, 5, 4, 3
We're going to say that the standard deviation for the population is 7.
For this question, don't worry about whether or not the data is skewed. It plays a minor role, but as we know, the distribution of sample means will tend towards normal, centered on the population mean, regardless. Still, run the 1.5IQR test and see if there's anything severe that we should take out of either data set. If you think we should take something out, explain why.
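The 1.5IQR test mentioned above can be sketched in R like this. The helper name iqr_fences is my own (it is not a built-in), and quantile() is used with its default settings, so quartiles computed by hand with a different method may differ slightly.

```r
# Hypothetical helper: fences for the 1.5*IQR rule
iqr_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  spread <- q[2] - q[1]                           # the IQR
  c(lower = q[1] - 1.5 * spread, upper = q[2] + 1.5 * spread)
}

email_a <- c(7, 5, 21, 11, 0, 7, 19, 6, 5, 20, 2, 3, 12, 18,
             12, 12, 17, 4, 5, 6, 12, 4, 1, 36, 2, 0, 6, 7)

fences  <- iqr_fences(email_a)
flagged <- email_a[email_a < fences["lower"] | email_a > fences["upper"]]

print(fences)
print(flagged)  # values outside the fences, if any
```

The same function can be run on E-mail B. A flagged point is only a candidate for removal; whether it comes out is still your call to defend.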
Give a 95% CI for E-mail A.
Give a 95% CI for E-mail B.
Can we say that one of the E-mail addresses gets more e-mails than the other using the information we have here? Give one or two sentences explaining your position on the matter.
Why did we not use a null and alternate hypothesis for this question? In other words, why didn't we set the mean of e-mail A as the population mean, set a null hypothesis that e-mail A mean is the same as e-mail B's mean and run through with an alpha of 0.05?
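For the two confidence intervals requested above, a rough sketch of a z interval with known population sd might look like the following. The function name z_ci is my own invention, sigma = 7 comes from the problem, and the sketch assumes no points were removed after the 1.5IQR check.

```r
# Hypothetical helper: confidence interval for a mean when sigma is known
z_ci <- function(x, sigma, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)          # about 1.96 for 95%
  margin <- z_star * sigma / sqrt(length(x))    # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

email_a <- c(7, 5, 21, 11, 0, 7, 19, 6, 5, 20, 2, 3, 12, 18,
             12, 12, 17, 4, 5, 6, 12, 4, 1, 36, 2, 0, 6, 7)
email_b <- c(10, 12, 12, 11, 1, 1, 4, 4, 6, 10, 6, 2, 7, 7,
             8, 11, 9, 3, 5, 17, 9, 7, 4, 13, 7, 5, 4, 3)

ci_a <- z_ci(email_a, sigma = 7)
ci_b <- z_ci(email_b, sigma = 7)

print(ci_a)
print(ci_b)
```

Looking at whether the two intervals overlap is one way into the comparison question, though how much weight that carries is part of what you are asked to explain.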
3. Last year I averaged 15 e-mails a day. I want to know if that has changed this year. In order to do this, we are going to need to add the two data sets we used above (for example, for the first day in this sample, 7+10=17).
total=listA+listB should work in R, with the right names for your lists.
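As a quick illustration of that addition, R adds two vectors of the same length element by element. The sketch below uses only the first three days from each list, and the names list_a and list_b are placeholders for whatever you called yours.

```r
# First three days only, to show how element-wise addition behaves
list_a <- c(7, 5, 21)    # E-mail A, days 1-3
list_b <- c(10, 12, 12)  # E-mail B, days 1-3

total <- list_a + list_b  # adds day 1 to day 1, day 2 to day 2, ...
print(total)              # 17 17 33
```

If you removed a point from only one list earlier, the two lists will have different lengths and the days will no longer line up, so use the full lists here.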
Give a Ho and a Ha for this question.
Check for crazy outliers again, choosing to leave them in or take them out depending on your opinion. Back up either decision.
Test at alpha=0.05.
Give one or two sentences explaining your conclusion.
4.
So . . . sun spots. We've looked for them for generations, and it has been believed that the mean number of sunspots in an average four week period is 41, with a standard deviation of 35. We've been looking at the numbers over the past four years (plus a little):
12.5 14.1 37.6 48.3 67.3 70.0 43.8 56.5 59.7 24.0 12.0 27.4 53.5 73.9 104.0 54.6 4.4 177.3 70.1 54.0 28.0 13.0 6.5 134.7 114 72.7 81.2 24.1 20.4 13.3 9.4 25.7 47.8 50.0 45.3 61.0 39.0 12.0 7.2 11.3
Why do we care? Well, it appears that when the average number of sunspots over a longish period of time is above the mean, it can produce times of general warming on the Earth. My question is: do we have enough evidence here to claim that we might be in for a time of warming on the Earth due to fluctuations in the sun? Give a null and alternative hypothesis, give an appropriate level of significance, etc. etc.
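One way to organize the sunspot test is the critical-value approach, sketched below. The alpha of 0.05 is only a placeholder (choosing and defending the level is part of the question), the alternative is assumed to be one-sided (mean above 41), and all 40 values are kept.

```r
sunspots <- c(12.5, 14.1, 37.6, 48.3, 67.3, 70.0, 43.8, 56.5, 59.7, 24.0,
              12.0, 27.4, 53.5, 73.9, 104.0, 54.6, 4.4, 177.3, 70.1, 54.0,
              28.0, 13.0, 6.5, 134.7, 114, 72.7, 81.2, 24.1, 20.4, 13.3,
              9.4, 25.7, 47.8, 50.0, 45.3, 61.0, 39.0, 12.0, 7.2, 11.3)

mu0   <- 41    # believed long-run mean
sigma <- 35    # believed sd, treated as known
alpha <- 0.05  # placeholder level; pick and justify your own

z      <- (mean(sunspots) - mu0) / (sigma / sqrt(length(sunspots)))
z_crit <- qnorm(1 - alpha)  # cutoff for a one-sided upper-tail test

cat("z =", z, " critical value =", z_crit, "\n")
cat("z beyond the critical value?", z > z_crit, "\n")
```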
5.
Here are the IQ test scores of 31 seventh-grade girls in a Midwest school district. This is an SRS over the whole district. The sd of IQ scores is known to be 15 for this population.
114 100 104 89 102 91 114 114 103 105 108 130 120 132 111 128 118 119 86 72 111 103 74 112 107 103 98 96 112 112 93
We used these last week and got a 99.7% confidence interval. This week, I am curious as to whether these girls could be considered average. State a Ho, a Ha, and go through the proceedings to determine whether or not these girls should be considered average in terms of IQ. (in case you are unaware, 100 is considered average)
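Since "average" here could be missed in either direction, a two-sided test is one reasonable reading. The sketch below assumes that reading (Ha: mean is not 100) and the known sd of 15; the level of significance is left for you to set.

```r
iq <- c(114, 100, 104, 89, 102, 91, 114, 114, 103, 105, 108, 130, 120,
        132, 111, 128, 118, 119, 86, 72, 111, 103, 74, 112, 107, 103,
        98, 96, 112, 112, 93)

mu0   <- 100  # "average" IQ
sigma <- 15   # known population sd

z <- (mean(iq) - mu0) / (sigma / sqrt(length(iq)))

# Two-sided p-value: both tails count as evidence against the null
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

cat("mean =", mean(iq), " z =", z, " p-value =", p_value, "\n")
```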
6.
In the United States, the mean yield of corn has been 120 bushels per acre. This year, 40 farmers gave their yields, with a mean of 123.8. Yields have a population sd of 10 bushels per acre.
a. Assume we want to prove that this sample shows the mean is higher than 120 this year. Write a null and alternative hypothesis.
b. Can you conclude that the mean this year is higher than 120? Show your numbers and explain your reasoning.
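Here we only have summary numbers rather than raw data, so the z statistic has to be built by hand. A minimal sketch, assuming a one-sided alternative (mean above 120) and the sd of 10 as the known population value:

```r
xbar  <- 123.8  # sample mean from the 40 farmers
mu0   <- 120    # historical mean yield
sigma <- 10     # population sd, treated as known
n     <- 40

se <- sigma / sqrt(n)      # standard error of the sample mean
z  <- (xbar - mu0) / se
p_value <- pnorm(z, lower.tail = FALSE)  # one-sided, upper tail

cat("se =", se, " z =", z, " p-value =", p_value, "\n")
```

The problem does not fix an alpha, so state the one you compare this p-value against.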
EXTENSION DUE TO ME BY MONDAY:
1.
If a random number generator truly is random and it is sampling from 0 to 100, it should tend towards having a mean of 50 and a population sd of 28.87.
Use R or another random number generator and create a list of 50 random numbers.
If stuck, try sample(0:100, 50, replace=TRUE)
We added replace=TRUE to ensure the possibility that a single number can get picked more than once. This helps us check to see if the information is biased or not.
Use those numbers to determine whether or not you think the generator is truly random or not. Be sure to explain each step as you do it.
What number shows up the most in your data set? (Let me know how you found this out--there's R code or you can count; I'm curious which you use.)
Each number has roughly a 0.01 chance of being selected (1/101, since 0 to 100 covers 101 values). Run a binomial test in R and see what the chances were of your specific number being selected the number of times that it was.
I'm going to guess the chances were pretty small. Does that fact mean that your data is exceptional? Explain how you know.
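If you go the R route, the steps above could be strung together roughly as follows. The seed is arbitrary and only there so the sketch repeats, the variable names are mine, and your own draws will differ. The per-number chance is taken as 1/101, which is close to the 0.01 quoted above.

```r
set.seed(42)  # arbitrary seed, only so this sketch is repeatable
draws <- sample(0:100, 50, replace = TRUE)

# z statistic for the sample mean against the values stated in the problem
z <- (mean(draws) - 50) / (28.87 / sqrt(length(draws)))

# Which value(s) show up most often, and how many times?
counts     <- table(draws)
top_count  <- max(counts)
top_values <- as.integer(names(counts)[counts == top_count])

# Chance that one pre-chosen number appears exactly top_count times in 50 draws
p_each     <- 1 / 101
prob_exact <- dbinom(top_count, size = 50, prob = p_each)

# Binomial test of that count against the expected per-number chance
result <- binom.test(top_count, n = 50, p = p_each)

cat("z for the mean =", z, "\n")
cat("most common value(s):", top_values, " count:", top_count, "\n")
cat("exact probability =", prob_exact, " test p-value =", result$p.value, "\n")
```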
2.
Questions from the book (these are to be turned in on Monday via e-mail, in case there was any doubt):
6.68, 6.71, 6.123 (part a they give you the answer, but go with it anyway)
Read section 7.1 and prepare the following questions for class:
What is the difference between a t score and a z score?
How do we know what test we should be using in a given instance? (t.test? z.test? CI? 1.5IQR? arg!)
We are going to attempt to make a diagram in class to help us with this exact concept, so be prepared and ready to participate Monday.