1. Go over the hw questions
2a: I am 99% confident that the average study time for a freshman in college over a weekend is between 126.8 and 147.2 minutes.
2b: I am 99% confident that the average study time for a freshman in college over a weekend is between 237.8 and 258.2 minutes.
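For reference, here is a minimal sketch of how a 99% interval like these comes out of R. The mean, sd, and n below are placeholders, not the actual homework data.

  xbar <- 140   # hypothetical sample mean (minutes)
  s    <- 35    # hypothetical sample sd (minutes)
  n    <- 64    # hypothetical sample size
  z    <- qnorm(0.995)              # two-sided 99% critical value, ~2.576
  xbar + c(-1, 1) * z * s / sqrt(n) # lower and upper bounds of the interval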
Question: one big point -- how do we check our data for outliers?
is it normal? use mean +/- 2 sd
is it skewed? use the 1.5*IQR rule: fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR (see the R sketch below)
remember that these rules only flag POSSIBLE outliers -- just because a point falls outside the cutoff doesn't make it one
check for: distance from the rest of the points, sampling or recording errors, etc.
http://pareonline.net/getvn.asp?v=9&n=6 <-- fascinating reading.
note: they suggest using z = 3 rather than z = 2
when reading other people's work, pay special attention to any outliers or data they removed. They should tell you what they removed, what they did with it, and their reasoning for doing so.
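Here is a rough R sketch of both outlier checks on a made-up vector of values; nothing in it comes from our class data.

  x <- c(82, 85, 79, 88, 91, 84, 40, 86)   # made-up scores

  # roughly normal data: flag points more than 2 sd from the mean
  # (the article above would use 3 sd instead)
  x[abs(x - mean(x)) > 2 * sd(x)]

  # skewed data: flag points outside the 1.5*IQR fences
  q <- quantile(x, c(0.25, 0.75))
  x[x < q[1] - 1.5 * IQR(x) | x > q[2] + 1.5 * IQR(x)]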
3a: I expect the mean of the sample averages to be in the same place as the average SAT score for the test: 515.
3b: I expect the standard error (the sd of the sample means) to be the standard deviation divided by the square root of the sample size.
So, 114/sqrt(100) = 11.4
3c: the range I get around 530 goes from 507.7 to 552.3, so the true mean (515) is included in our interval.
3d: 515 + 1.96*11.4 = 537.34. Any sample mean above this would produce a 95% interval that does not contain the true mean.
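The same numbers, straight out of R with the values from the problem (mean 515, sd 114, n = 100):

  mu    <- 515
  sigma <- 114
  n     <- 100
  se    <- sigma / sqrt(n)       # standard error = 11.4

  530 + c(-1, 1) * 1.96 * se     # 3c: 507.66 to 552.34, which contains 515
  mu + 1.96 * se                 # 3d: 537.34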
4. I took out the low outlier from the class, and the data became much more normal and reasonable to use. Because I am looking for a low value, I am okay with removing the lowest one when comparing this class to other courses. Once I do this, I am 95% confident that this class's true mean for this test is between 80.9 and 85.1. Based on this, and comparing it to the true mean, I cannot say that this class is not as bright as the other classes.
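A sketch of how an interval like that can be computed in R once the outlier is dropped; the scores vector here is a made-up stand-in, since the class data isn't reproduced in these notes, and t.test() is just one reasonable way to get a 95% interval.

  scores  <- c(84, 81, 86, 79, 88, 82, 85, 80, 87, 83, 45)  # hypothetical scores
  trimmed <- scores[scores != min(scores)]                  # drop the single low outlier
  t.test(trimmed, conf.level = 0.95)$conf.int               # 95% interval for the class mean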
Question 5:
40000 miles? pnorm(40000,32000,2500) -- 0.9993129. 99.9% of tires fail before this point. If I got a set that lasted this long I would be super lucky.
30000 miles? pnorm(30000,32000,2500) -- 0.2118554. 21.2% of tires fail before this mark. Not ideal, but it will happen about 1 time in 5.
30000 - 35000 miles? pnorm(35000,32000,2500) - pnorm(30000,32000,2500) -- 0.6730749. Most tires are in this range.
qnorm(.75,32000,2500) --33686.22
qnorm(.25,32000,2500) --30313.78
The middle 50% of tires will fail somewhere between 30314 and 33686 miles.
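The same calls, collected into one runnable chunk (tire life modeled as Normal with mean 32000 and sd 2500 miles):

  pnorm(40000, 32000, 2500)                              # ~0.999: almost all tires fail before 40,000
  pnorm(30000, 32000, 2500)                              # ~0.212: about 1 in 5 fail before 30,000
  pnorm(35000, 32000, 2500) - pnorm(30000, 32000, 2500)  # ~0.673: most fail between 30,000 and 35,000
  qnorm(c(0.25, 0.75), 32000, 2500)                      # ~30314 and ~33686: the middle 50%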
2. Here are quotes from the sentences you sent me -- which of these make sense for us? (A small simulation sketch follows the list.)
The Confidence Interval: The range which represents a small but random sample of a population.
Confidence interval = x̄ ± z*(SD/√n)
Confidence interval means the boundary (plus or minus a specific point) within which our outcomes may lie. E.g. 95% confidence interval means that 95% of our data lies in a specific range.
To be closer to the truth, we need to find the margin of error yet it is important to be confident.
The Central Limit Theorem allows us to mess around with its variables in order to gauge their impact on the equation overall.
In order to calculate the Central Limit Theorem, we need to know the mean, the standard deviation, the sample size and the margin of error.
The Central Limit Theorem enables us to determine how far/close the sample mean is from the actual mean.
As a result of diminishing returns on accuracy as sample size increases, even a seemingly small sample can, with fairly high certainty, be representative of the larger population.
The Central Limit Theorem basically enables us to get a lot of information out of relatively little data.
Standard error allows us to consider how sample means are distributed and whether or not the samples they came from actually represent the "population." In this regard the 68–95–99.7 rule is useful; it helps us to estimate our "degree of confidence."
In class today we used standard error to estimate how close our sample means were to the actual mean; it enabled us to find a range of possible means with reasonable certainty.
[The] Central Limit Theorem says that sample means will be decently "normal" shaping around the actual population mean if samples are large enough and random.
Confidence intervals are calculated as x bar plus or minus z star times (standard deviation divided by square root n). This can be tweaked up or down for a greater or lower percentage of confidence.
Confidence intervals can be derived through different confidence levels (90%, 95% and 99%) and gets more accurate through increasing the sample size.
However, sample sizes have diminishing returns because in calculating the confidence interval we use the square root of the sample size.
Using this average we were able to obtain a "guess" at what our mean may look like— the confidence interval— which, using a formula, determined how representative our sample data is in terms of the range of the mean and what that may look like across the whole population.
Using confidence intervals, we can estimate (with varying degrees of confidence) a range which encompasses a data set's true mean.
I learned about how the confidence interval is important in helping figure out how close your sample mean is to the actual mean.
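A quick simulation (not part of the homework, all numbers made up) that illustrates what several of these quotes are getting at: sample means from even a skewed population pile up roughly normally around the true mean, with a spread close to sd/sqrt(n).

  set.seed(1)
  pop <- rexp(100000, rate = 1/50)   # skewed population with true mean 50

  n <- 30
  sample_means <- replicate(5000, mean(sample(pop, n)))

  mean(sample_means)      # close to the true mean, 50
  sd(sample_means)        # close to sd(pop) / sqrt(n)
  sd(pop) / sqrt(n)
  hist(sample_means)      # roughly bell-shaped despite the skewed population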
3. What can we do at this point? In other words, what do we know how to do?
4. What's in a survey? A quick PowerPoint full of questions
we want to get the sleep time and awake time for 30 people on campus
possible pitfalls
plan of attack?
http://blog.surveymonkey.com/blog/2012/04/12/good-surveys-and-bad-surveys/
http://www.uwex.edu/ces/tobaccoeval/resources/surveyquestions.html
http://www.washington.edu/lst/workshops/web_tools/resources/oea.pdf/
HW:
1. Survey 16 people on campus about the time they went to bed, the time they woke up, and how long they were asleep. Create a graphic that shows this information (a rough plotting sketch follows this list). Also record how you collected your information.
2. There are four confidence interval problems below. Complete those.
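One possible way to build the graphic in problem 1, sketched in R with made-up times (recorded as hours after noon, so 11 = 11 pm and 19 = 7 am); it is not the only reasonable chart.

  bed  <- c(11.0, 11.5, 13.0, 10.5, 12.25, 11.75)   # hypothetical bedtimes
  wake <- c(19.0, 19.5, 21.0, 18.0, 20.5, 19.25)    # hypothetical wake times
  wake - bed                                        # hours asleep for each person

  # one horizontal segment per person, from bedtime to wake time
  ord <- order(bed)
  plot(NULL, xlim = c(10, 22), ylim = c(0.5, length(bed) + 0.5),
       xlab = "Hours after noon", ylab = "Person", yaxt = "n")
  segments(bed[ord], seq_along(bed), wake[ord], seq_along(bed), lwd = 3)
  axis(2, at = seq_along(bed), labels = ord)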