goal: create normal curves and use z-scores for testing
use data sets with different amounts of standard deviation
understand the difference between spread and shape
In class:
zscore <- function(x, mean, sd) {
  zsc <- (x - mean) / sd
  print(zsc)
}
I made the above function in r to give us the z-scores for a value or list of values given the mean and the standard deviation.
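For example, here's the function in action on some made-up numbers (a mean of 100 and a standard deviation of 15 are just illustration, not from any of our data):

```r
# z-score function from above
zscore <- function(x, mean, sd) {
  zsc <- (x - mean) / sd
  print(zsc)
}

# a single value, then a list of values
zscore(130, mean = 100, sd = 15)              # prints 2
zscore(c(85, 100, 130), mean = 100, sd = 15)  # prints -1  0  2
```

Notice the function works on a whole list at once because r does the arithmetic element by element.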
We're going to work with a math mystery to try and make our first program together. Here's the Magic Trick:
choose a number. double your number. now add ten. now cut your number in half. now subtract five. YOU'RE BACK WHERE YOU STARTED FROM!
we want to build a program that will allow us to mimic this process. We also want output along the way so we can see what numbers occurred on the journey.
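Here is one possible sketch of the kind of program we might build together (magic.trick is just a name I picked; yours can look different):

```r
# the magic trick: double, add ten, cut in half, subtract five
magic.trick <- function(start) {
  print(start)       # the number we chose
  num <- start * 2   # double your number
  print(num)
  num <- num + 10    # now add ten
  print(num)
  num <- num / 2     # now cut your number in half
  print(num)
  num <- num - 5     # now subtract five
  print(num)         # back where we started!
}

magic.trick(7)  # prints 7 14 24 12 7
```

The algebra behind the trick: (2x + 10) / 2 - 5 = x, no matter what x you start with.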
Now.
Question One:
Your job is to build code called standard.dev(). This should take a list of numbers and find the standard deviation. It should not use the built-in call sd() anywhere in it. That's cheating--or being creative with your available resources, however you want to put it. Consider each step we took, both in the coding and in computing the standard deviation.
I suggest writing it somewhere that's not in r to start, then copy/pasting it in there to test it out.
When you have a finished copy, paste that code into your answer for question number one.
Question Two:
The goal of this question is to create a normal quantile plot in order to aid us in checking data for normality.
A normal quantile plot shows the theoretical normal quantiles (z-scores) of the data on the x-axis and the ACTUAL values on the y-axis (see page 66 of your book).
Below are three data sets: the calories provided in three different types of hot dogs.
Beef: 186 181 176 149 184 190 158 139 175 148 152 111 141 153 190 157 131 149 135 132
Meat: 173 191 182 190 172 147 146 139 175 136 179 153 107 195 135 140 138
Poultry: 129 132 102 106 94 102 87 99 170 113 135 142 86 143 152 146
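To get started, the data above can be typed into r as vectors (beef, meat, and poultry are just the names I'd use; pick whatever you like):

```r
# calories for each type of hot dog, straight from the lists above
beef    = c(186,181,176,149,184,190,158,139,175,148,152,111,141,153,190,157,131,149,135,132)
meat    = c(173,191,182,190,172,147,146,139,175,136,179,153,107,195,135,140,138)
poultry = c(129,132,102,106,94,102,87,99,170,113,135,142,86,143,152,146)
```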
b) Create a graphic that allows you to compare all three types of hot dogs on one graph (include some type of legend or way to know the differences). Compare and contrast using this as well as a numerical print out.
c) The normal quantile plot is made as follows:
qqnorm(data)
Make a normal quantile plot for all of the hot dog data together (so, make one list of all the hot dogs together, and then: qqnorm(alldogs))
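A minimal sketch of this step, assuming the beef, meat, and poultry vectors have already been entered (alldogs is just my name for the combined list; qqline() is a built-in we haven't used in class, but it draws a reference line that helps judge straightness):

```r
# assumes beef, meat, and poultry already hold the calorie data
alldogs = c(beef, meat, poultry)  # all three hot dog types in one list
qqnorm(alldogs)                   # normal quantile plot of the combined data
qqline(alldogs)                   # optional reference line for judging straightness
```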
d) Describe the normal quantile plot we made in part c.
According to the book, if the data is fairly normal, we will end up with a fairly straight line. This happens because we do not have any severe skew causing the plot to curve. We may still end up with outliers, but those will appear along the straight line, separated from the rest.
As a note, FAIRLY normal means that there can be a little bend/leniency in the data. It doesn't have to be a rigidly straight line.
Does the hot dog data combined appear to be normal?
e) Use the guinea pig data from before. We had a conversation trying to decide if the data was normal with outliers or if it was skewed to the right with outliers. Use the skills we have learned in this lesson to help evaluate the data a little bit better (i.e. make a normal quantile plot and use it to help you determine whether the data is skewed or normal).
guinea pigs: 43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 80, 81, 81, 81, 82, 83, 83, 84, 84, 88, 89, 91, 91, 92, 92, 97, 99, 99, 100, 101, 102, 102, 102, 103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128, 137, 138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184, 191, 198, 211, 214, 243, 249, 329, 380, 403, 511, 522, 598
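As with the hot dogs, the guinea pig data needs to go into r as a vector before we can plot it (pigs is just my name for it):

```r
# guinea pig data, straight from the list above
pigs = c(43,45,53,56,56,57,58,66,67,73,74,79,80,80,81,81,81,82,83,83,
         84,84,88,89,91,91,92,92,97,99,99,100,101,102,102,102,103,104,107,108,
         109,113,114,118,121,123,126,128,137,138,139,144,145,147,156,162,174,178,179,184,
         191,198,211,214,243,249,329,380,403,511,522,598)
```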
f) Here is a list of the number of graduate students per 1000 people for each of the fifty states:
grads=c(37,41,41,41,41,41,43,43,45,45,45,45,46,46,46,47,47,47,47,47,48, 48,48,48,49,50,50,51,51,51,52,52,52,54,54,54,54,55,56,56,60,60,60,60, 61,67,69,71,72,77)
Are there any outliers in this data? Use your knowledge gained from this lab and previous labs to decide. Show appropriate graphs and information.
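As a reminder from previous labs, one way we've flagged outliers is the 1.5 × IQR rule. A sketch on made-up numbers (not the grads data -- that part is your job):

```r
x = c(5, 6, 6, 7, 7, 8, 8, 9, 25)  # made-up numbers with one suspicious value

q      = quantile(x, c(0.25, 0.75))             # first and third quartiles
iqr    = q[2] - q[1]                            # interquartile range
fences = c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)  # anything outside is a suspected outlier

x[x < fences[1] | x > fences[2]]  # flags 25
boxplot(x)                        # the same point shows up separated on a boxplot
```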
Question Three:
Here is a way to make a normal plot in r:
plot(curve(dnorm(x,mean=0,sd=4)),xlim=c(-10,10),ylim=c(0,0.5),type="l")
As often appears to be the case, there's a little quirk on some computers--the first time it graphs this, it causes a little blip to appear on a graph but nothing else. Fear not; press the up arrow, try the command again, and 80% of the time it works every time.
When it doesn't work (the other 20% of the time), remake the curve like this:
curve(dnorm(x,mean=0,sd=4),add=TRUE,type="l",col="black")
This should make the curve we were trying for in the first place.
The only REAL new thing we've added to this is 'type'. That's a lowercase l and not the number 1. This caused me some frustration in the past.
To add a curve to an already made plot:
curve(dnorm(x,mean=0,sd=2),add=TRUE,type="l",col="red")
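Putting the two commands together, the overall pattern looks like this (the means and sds here are arbitrary; you'll choose your own for part a):

```r
# first curve: calling curve() directly sets up the plot and avoids the blip quirk
curve(dnorm(x, mean = 0, sd = 4), xlim = c(-10, 10), ylim = c(0, 0.5),
      type = "l", col = "black")

# additional curves: add = TRUE draws them on top of the existing plot
curve(dnorm(x, mean = 0, sd = 2), add = TRUE, type = "l", col = "red")
```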
a) On a new single graph, plot three normal curves; A, B, and C. A and B should have the same sd but different means. B and C should have the same mean, but different sds. Compare and contrast the three graphs, their relative shapes and overall appearances.
EVERYTHING ABOVE THIS IS DUE FRIDAY.
Extension: EVERYTHING BELOW THIS IS DUE MONDAY.
Problems from the book to answer: 1.122, 1.132-1.141, 1.156. (these should be turned in with the lab, though you do not have to give me all of your work; for example "A score of 1080 on the SAT would have a Z-score of 1.45 and a p of .9265" is a great answer)
read chapter 3. We will be discussing sections 3.1, 3.2, and 3.3 in class. Section 3.4 is needed; you should read and think about it, but we will not be putting as much time into it (not sure if that's ethical or not).
Problems to have done for class: 3.52, 3.53, 3.59 (describe the method you used), 3.76, 3.77, 3.84
DO NOT DO ANYTHING BELOW THIS SENTENCE. THX.
Below is a .csv of the top 350 baseball players of 2009. I found this using a data dump online. In this data there are many different sets or comparisons you could make. Your goal is to find one set of data that is fairly normal and one set of data that is not normal. Analyze them using the different methods we know so far and determine if there is anything strange in the data.
Finally, use the data to make something interesting. I'm leaving that up to you.