Question One:
Assume IQ scores are normally distributed with mean 107 and standard deviation 15, N(107, 15), and are not biased toward any specific group of people.
a) Suzie scored a 133. What is her normalized score (z-score), and what percent of the population did she do better than?
b) Jon scored a 91. What is his normalized score, and what percent of the population did he do better than?
c) What percent of the population scores between 95 and 113 on the IQ test?
d) To join MENSA, you must score in the top 2% on the IQ test. What score must you achieve?
e) I want to see if a high school has a mean IQ that is above average. I want to select 30 students at random to test. Design a way for me to select these students that is as random as possible.
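Parts a through d each reduce to a normal CDF lookup or its inverse (pnorm and qnorm in R). The sketch below does the same with Python's standard-library statistics.NormalDist. "Did better than" is read as the share of people scoring below, which is the CDF. The last lines show one possible answer to part e: a simple random sample drawn without replacement from a complete, numbered roster. The roster and its size of 1,200 are hypothetical placeholders.

```python
import random
from statistics import NormalDist

# IQ model from the question: mean 107, standard deviation 15.
iq = NormalDist(mu=107, sigma=15)

def z_score(x, dist=iq):
    """Standardized ("normalized") score: how many SDs x lies from the mean."""
    return (x - dist.mean) / dist.stdev

# a) and b): z-score and the share of the population scoring below each person.
for name, score in [("Suzie", 133), ("Jon", 91)]:
    print(f"{name}: z = {z_score(score):.2f}, "
          f"better than {iq.cdf(score):.1%} of the population")

# c) Share of the population scoring between 95 and 113.
between = iq.cdf(113) - iq.cdf(95)
print(f"P(95 < X < 113) = {between:.1%}")

# d) Top-2% cutoff: the score with 98% of the distribution below it.
mensa_cutoff = iq.inv_cdf(0.98)
print(f"Top-2% cutoff: {mensa_cutoff:.1f}")

# e) Simple random sample of 30 students.
#    HYPOTHETICAL roster: stand-in for the school's full enrollment list.
roster = [f"student_{i:04d}" for i in range(1, 1201)]
chosen = random.sample(roster, k=30)  # without replacement: nobody is picked twice
print(chosen[:5], "...")
```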
After doing part e, these are the scores that I get.
110, 112, 62, 116, 83, 98, 124, 92, 125, 126, 110, 134, 116, 81, 103, 89, 94, 124, 105, 97, 92, 112, 122, 125, 107, 115, 114, 109, 99, 108
f) What is my 95% confidence interval for the mean IQ score for the school?
g) Can I say that the school has an above average IQ? Walk through all of the steps (Ho, Ha, alpha, check outliers, run analysis, analyze).
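Here is a rough sketch of parts f and g. It assumes the population sd of 15 also applies to this school, so it uses a z-interval and a one-sided z-test (Ho: mu = 107, Ha: mu > 107). Alpha = 0.05 is an assumed choice, because the question leaves it open. The quartiles use method="inclusive", which is the same interpolation as R's default quantile() (type 7), so the 1.5 × IQR fences should match what R reports.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles

# Sample from the question (30 students).
scores = [110, 112, 62, 116, 83, 98, 124, 92, 125, 126, 110, 134, 116, 81, 103,
          89, 94, 124, 105, 97, 92, 112, 122, 125, 107, 115, 114, 109, 99, 108]
MU0, SIGMA = 107, 15  # population mean and sd from the question
ALPHA = 0.05          # assumed significance level

def iqr_outliers(data):
    """Values outside the 1.5*IQR fences; 'inclusive' quartiles match R's default."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - spread or x > q3 + spread]

def z_interval(data, sigma, level=0.95):
    """Confidence interval for the mean when the population sd is known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z_star * sigma / sqrt(len(data))
    return mean(data) - margin, mean(data) + margin

def z_test_greater(data, mu0, sigma):
    """One-sided z-test of Ho: mu = mu0 against Ha: mu > mu0; returns (z, p)."""
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    return z, 1 - NormalDist().cdf(z)

outliers = iqr_outliers(scores)
ci_low, ci_high = z_interval(scores, SIGMA)
z, p = z_test_greater(scores, MU0, SIGMA)
print("1.5*IQR outliers:", outliers)
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z = {z:.3f}, one-sided p = {p:.3f}, reject Ho: {p < ALPHA}")

# Same test with the flagged values removed, to see whether the decision changes.
trimmed = [x for x in scores if x not in outliers]
print("Without outliers (z, p):", z_test_greater(trimmed, MU0, SIGMA))
```

The last line reruns the test without the flagged score. That way you can report whether removing it changes the conclusion, not only that it was removed.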
Question Two:
Hail damage. You work for an insurance company and are put in charge of determining rates for hail insurance, specifically for wheat crops. Believe it or not, hail damage is a fairly large problem for crops: nationally, 11 percent of all wheat crops are destroyed by hail (I know, right?). You are going through some claims for a county in Colorado and get the following claims for percentage of destruction:
15, 8, 9, 11, 12, 20, 14, 11, 7, 10, 24, 20, 13, 9, 12, 5
Your job is to determine whether there is more crop damage in this county than the national average; if there is, you are going to raise insurance rates for the following years. Assume the national sd is 5 percentage points.
a. Give a null and alternative hypothesis.
b. Check for relative normality in the data before proceeding further. If there are any points that you feel you should take out, be VERY sure to explain why, as it may make people question your methods later on as you are reporting your findings.
c. In order to validate your raising rates, you need to be 99% confident that the damage here is more than the national average. Should you raise your rates? Explain.
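A possible sketch of parts b and c follows. It assumes the national sd of 5 is the population sd for this county's claims, so a z-test applies. It reads "99% confident" as alpha = 0.01 for a one-sided test (Ho: mu = 11, Ha: mu > 11). The quartiles follow R's default quantile() rule.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles

damage = [15, 8, 9, 11, 12, 20, 14, 11, 7, 10, 24, 20, 13, 9, 12, 5]
MU0 = 11      # national mean percent destroyed
SIGMA = 5     # national sd, in percentage points
ALPHA = 0.01  # 99% confidence required before raising rates

# Outlier screen with the 1.5*IQR rule (R-style "type 7" quartiles).
q1, _, q3 = quantiles(damage, n=4, method="inclusive")
fence_lo, fence_hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
flagged = [x for x in damage if not fence_lo <= x <= fence_hi]

# One-sided z-test of Ho: mu = 11 against Ha: mu > 11.
z = (mean(damage) - MU0) / (SIGMA / sqrt(len(damage)))
p_value = 1 - NormalDist().cdf(z)
z_crit = NormalDist().inv_cdf(1 - ALPHA)  # reject Ho only if z exceeds this

print(f"fences: ({fence_lo}, {fence_hi}), flagged: {flagged}")
print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p = {p_value:.4f}")
print("Raise rates" if p_value < ALPHA else "Not enough evidence to raise rates")
```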
Question Three:
I'm curious whether my two main e-mail addresses show any difference in usage. For the past month, I've saved all e-mail sent directly to me (ignoring spam). The two data sets (daily counts) are as follows:
Part One:
E-mail A: 7, 5, 21, 11, 0, 7, 19, 6, 5, 20, 2, 3, 12, 18, 12, 12, 17, 4, 5, 6, 12, 4, 1, 36, 2, 0, 6, 7
E-mail B: 10, 12, 12, 11, 1, 1, 4, 4, 6, 10, 6, 2, 7, 7, 8, 11, 9, 3, 5, 17, 9, 7, 4, 13, 7, 5, 4, 3
We're going to say that the population standard deviation of daily e-mail counts is 5.
For this question, don't worry about whether or not the data is skewed. It plays a minor role because, as we know, the distribution of sample means tends toward a normal distribution regardless (the Central Limit Theorem). Still, run the 1.5 × IQR test and see if there's anything severe that we should take out of either data set. If you think we should take something out, explain why.
Give a 95% CI for E-mail A.
Give a 95% CI for E-mail B.
Can we say that one of the E-mail addresses gets more e-mails than the other using the information we have here? Give one or two sentences explaining your position on the matter.
Why did we not use a null and alternative hypothesis for this question? In other words, why didn't we set the mean of e-mail A as the population mean, set a null hypothesis that e-mail A's mean is the same as e-mail B's mean, and run through with an alpha of 0.05?
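One way to check Part One is sketched below in Python. For each address it runs a 1.5 × IQR screen (R's default quartile rule) and builds a z-interval using the given sd of 5. The overlap check at the end is an informal comparison, not something the assignment prescribes.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles

email_a = [7, 5, 21, 11, 0, 7, 19, 6, 5, 20, 2, 3, 12, 18, 12, 12, 17, 4, 5, 6,
           12, 4, 1, 36, 2, 0, 6, 7]
email_b = [10, 12, 12, 11, 1, 1, 4, 4, 6, 10, 6, 2, 7, 7, 8, 11, 9, 3, 5, 17,
           9, 7, 4, 13, 7, 5, 4, 3]
SIGMA = 5  # population sd of daily e-mail counts, as given

def iqr_fences(data):
    """Lower and upper 1.5*IQR fences using R-style (type 7) quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return q1 - spread, q3 + spread

def z_ci(data, sigma, level=0.95):
    """z-interval for the mean with a known population sd."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z_star * sigma / sqrt(len(data))
    return mean(data) - margin, mean(data) + margin

for name, data in [("A", email_a), ("B", email_b)]:
    lo, hi = iqr_fences(data)
    flagged = [x for x in data if x < lo or x > hi]
    ci_lo, ci_hi = z_ci(data, SIGMA)
    print(f"E-mail {name}: outliers {flagged}, 95% CI ({ci_lo:.2f}, {ci_hi:.2f})")

ci_a, ci_b = z_ci(email_a, SIGMA), z_ci(email_b, SIGMA)
overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
print("Intervals overlap:", overlap)
```

Overlapping 95% intervals give a conservative, informal comparison rather than a formal two-sample test. That distinction is worth keeping in mind for the last question of Part One.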
Part Two:
Last year I averaged 15 e-mails a day. I want to know if that has changed this year. To do this, we need to add the two data sets above day by day to get the total number of e-mails received each day (for example, for the first day in this sample, 7 + 10 = 17).
total=listA+listB should work in R (it adds the vectors element by element), with the right names for your lists.
Give a Ho and a Ha for this question.
Check for crazy outliers, again choosing whether to leave them in or take them out based on your own judgment. Back up either decision.
Test at alpha = 0.05.
Give one or two sentences explaining your conclusion.
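A sketch of Part Two follows. The assignment does not say which sd applies to the daily totals, so this code assumes the sd of 5 from Part One carries over. If the two addresses were independent with sd 5 each, the sd of the total would instead be √50 ≈ 7.07, and you can change SIGMA accordingly. Because the question only asks whether the average changed, the test is two-sided.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles

email_a = [7, 5, 21, 11, 0, 7, 19, 6, 5, 20, 2, 3, 12, 18, 12, 12, 17, 4, 5, 6,
           12, 4, 1, 36, 2, 0, 6, 7]
email_b = [10, 12, 12, 11, 1, 1, 4, 4, 6, 10, 6, 2, 7, 7, 8, 11, 9, 3, 5, 17,
           9, 7, 4, 13, 7, 5, 4, 3]
# Day-by-day totals: the Python counterpart of total=listA+listB in R.
total = [a + b for a, b in zip(email_a, email_b)]

MU0 = 15      # last year's daily average
SIGMA = 5     # ASSUMPTION: reuses Part One's sd of 5 for the daily total
ALPHA = 0.05

# 1.5*IQR screen on the totals (R-style type 7 quartiles).
q1, _, q3 = quantiles(total, n=4, method="inclusive")
spread = 1.5 * (q3 - q1)
outliers = [x for x in total if x < q1 - spread or x > q3 + spread]

# Two-sided z-test: Ho: mu = 15, Ha: mu != 15.
z = (mean(total) - MU0) / (SIGMA / sqrt(len(total)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print("Outliers:", outliers)
print(f"mean = {mean(total):.2f}, z = {z:.2f}, p = {p_value:.3f}, "
      f"reject Ho: {p_value < ALPHA}")
```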
Question Four:
So . . . sunspots. We've looked for them for generations, and it has been believed that the mean number of sunspots in an average four-week period is 41, with a standard deviation of 35. We've been looking at the numbers over the past four years (plus a little):
12.5 14.1 37.6 48.3 67.3 70.0 43.8 56.5 59.7 24.0 12.0 27.4 53.5 73.9 104.0 54.6 4.4 177.3 70.1 54.0 28.0 13.0 6.5 134.7 114 72.7 81.2 24.1 20.4 13.3 9.4 25.7 47.8 50.0 45.3 61.0 39.0 12.0 7.2 11.3
Why do we care? Well, it appears that when the average number of sunspots over a longish period of time is above the mean, it can produce times of general warming on the Earth. My question is: do we have enough evidence here to claim that we might be in for a time of warming on the Earth due to fluctuations in the sun? Give a null and alternative hypothesis, give an appropriate level of significance, etc. etc.
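Here is a sketch of the sunspot test. It assumes the 40 values are independent four-week counts from a population with the stated sd of 35, and it picks alpha = 0.05 because the question leaves the level to you. Warming is linked to counts above the mean, so the alternative is one-sided (Ho: mu = 41, Ha: mu > 41).

```python
from math import sqrt
from statistics import NormalDist, mean

sunspots = [12.5, 14.1, 37.6, 48.3, 67.3, 70.0, 43.8, 56.5, 59.7, 24.0,
            12.0, 27.4, 53.5, 73.9, 104.0, 54.6, 4.4, 177.3, 70.1, 54.0,
            28.0, 13.0, 6.5, 134.7, 114, 72.7, 81.2, 24.1, 20.4, 13.3,
            9.4, 25.7, 47.8, 50.0, 45.3, 61.0, 39.0, 12.0, 7.2, 11.3]
MU0, SIGMA = 41, 35   # long-run mean and sd per four-week period
ALPHA = 0.05          # assumed significance level

# Standard error of the sample mean, then a one-sided (upper-tail) z-test.
se = SIGMA / sqrt(len(sunspots))
z = (mean(sunspots) - MU0) / se
p_value = 1 - NormalDist().cdf(z)

print(f"mean = {mean(sunspots):.2f}, z = {z:.3f}, p = {p_value:.3f}")
print("Reject Ho" if p_value < ALPHA else "Fail to reject Ho")
```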
Question Five:
In the United States, the mean yield of corn has been 120 bushels per acre. This year, 40 farmers reported their yields, with a mean of 123.8 bushels per acre. Yields have a standard deviation of 10 bushels per acre.
a. Assume we want to show that this sample indicates the mean is higher than 120 this year. Write a null and alternative hypothesis.
b. Can you conclude that the mean this year is higher than 120? Show your numbers and explain your reasoning.
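Question Five gives only summary statistics, so the one-sided z-test (Ho: mu = 120, Ha: mu > 120) needs just a few lines. The sketch assumes alpha = 0.05, because the question does not name a level.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 120        # historical mean yield, bushels per acre
SIGMA = 10       # sd of yields, bushels per acre
n, xbar = 40, 123.8
ALPHA = 0.05     # assumed significance level

# Standardize the sample mean and take the upper-tail probability.
z = (xbar - MU0) / (SIGMA / sqrt(n))
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Conclude mean > 120" if p_value < ALPHA else "Cannot conclude mean > 120")
```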
Question Six:
If a random number generator is truly random and samples from 0 to 100, its output should tend toward a mean of 50 and a population sd of 28.87. (28.87 is 100/√12, the sd of a continuous uniform distribution on [0, 100]; for whole numbers 0 through 100 the exact value is √850 ≈ 29.15, a difference that barely matters here.)
Use R or another random number generator to create a list of 50 random numbers.
If stuck, try sample(0:100, size=50, replace=TRUE)
We add replace=TRUE so that a single number can be picked more than once, which keeps the draws independent of one another. This is what lets us check whether the generator is biased.
Use those numbers to determine whether you think the generator is truly random. Be sure to explain each step as you do it.
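Here is a sketch of Question Six in Python rather than R. random.randint(0, 100) includes both endpoints, like sample(0:100, ...), and each call is an independent draw, so values can repeat just as with replace=TRUE. The seed is an arbitrary assumption for reproducibility. Because the draws are whole numbers, the code uses the discrete-uniform sd √850 ≈ 29.15. Alpha = 0.05 is also assumed.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(2024)  # arbitrary seed; drop it to get fresh draws each run
draws = [rng.randint(0, 100) for _ in range(50)]  # 50 independent whole numbers, 0..100

MU0 = 50
# Population sd of whole numbers 0..100 drawn uniformly: sqrt((101**2 - 1) / 12).
SIGMA = sqrt((101 ** 2 - 1) / 12)
ALPHA = 0.05

# Two-sided z-test: Ho: mu = 50 (no bias), Ha: mu != 50.
z = (mean(draws) - MU0) / (SIGMA / sqrt(len(draws)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"sample mean = {mean(draws):.2f}, sample sd = {stdev(draws):.2f}")
print(f"z = {z:.2f}, p = {p_value:.3f}, reject Ho: {p_value < ALPHA}")
```

A mean close to 50 only shows that the center is not biased. It cannot, by itself, show that the generator is random.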