Online:
Mean, SD (univariate data): https://www.easycalculation.com/statistics/standard-deviation.php
Correlation (bivariate data): http://www.socscistatistics.com/tests/pearson/Default2.aspx
t.test (one sample): http://graphpad.com/quickcalcs/OneSampleT1.cfm
t.test (two sample): http://www.physics.csbsju.edu/stats/t-test_bulk_form.html
stacked barplots: https://www.easycalculation.com/graphs/stacked-bar-graph.php
multiple boxplots: http://www.imathas.com/stattools/boxplot.html
looks like it does A LOT: https://plot.ly/ (you have to sign up, but it's free, at least for a while? not 100% sure here)
another one size fits many: https://infogr.am/
Question One: One Man’s Trash is another Man’s … no this is all Trash
Below, there is a .csv entitled “garbage”. This is the amount of garbage from 62 different households for a week. It is broken up into different types of garbage (all measured in pounds).
Is there a correlation between HHSIZE (household size) and TOTAL (total trash)? Are there any outliers we need to remove? Include a graph of the line you made along with your r value and any other important information.
According to your line of best fit (regardless of whether or not it’s a good descriptor of the data), how much trash does each additional person in a household add?
Can we say that people throw out more FOOD than GLASS?
We are 95% confident that the average household throws away between _____ and _____ pounds of trash a week.
Find one other connection, correlation, stat, fact or interesting piece of information from this data set.
Who do you think sorted through all this garbage, and are you interested in that job, because I bet they’re hiring.
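If you would rather work in Python than in the online calculators above, here is a minimal sketch of how r and the line of best fit can be computed. The numbers below are made up for illustration and are NOT from the garbage file; the variable names `hh_size` and `total` are stand-ins for the HHSIZE and TOTAL columns. The slope is the "extra pounds of trash per additional person" that the second part asks about.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values for illustration only; swap in the real columns.
hh_size = [1, 2, 2, 3, 4, 5]
total = [12.0, 20.5, 18.0, 27.5, 41.0, 44.5]

r = pearson_r(hh_size, total)
slope, intercept = least_squares(hh_size, total)
print(f"r = {r:.3f}")
print(f"TOTAL is roughly {intercept:.2f} + {slope:.2f} * HHSIZE")
```

To check for outliers, rerun the same two functions with the suspect household removed and see how much r and the slope move.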
Question Two: Caught NAEPping.
The NAEP assesses what students know in several subject areas based on large representative samples. The table on the website reports the math findings from 2000. For each state we have the mean NAEP score (out of 500) and also the percent of students who were at least “proficient” in the sense of being able to solve real world problems. Nationally, we expect 25% of people to be proficient. Also included is the state’s poverty rate. All data is in a scan below.
Is there a correlation between mean score and poverty rate? Analyze the data, including removing any influential points (if needed or appropriate).
Make something interesting talking about a specific part of this data. This could be a graph, a graphic, something else and does not need to encompass the whole data set. This is a very open question, so pick something of interest and make something clean, clear, and concise.
Question Three: Smells Like Victory
The .pdf labeled “odors.pdf” has a table, number 2.3. Read question 2.45 for context, but do not answer that question. Instead, answer the following.
Do we have enough evidence to prove that Lemon Odor increases the amount people spend over no odor? Set up the question appropriately.
Do we have enough evidence to prove that Lavender Odor increases the amount people spend over no odor? Set up the question appropriately.
Create a visual that cleanly, clearly, and concisely shows this data.
Give 95% confidence intervals for the amount spent for each of the three different data sets.
Question Four: “I’ll have a Coke!”
Cans of regular Coca-Cola are labelled as containing 12 oz. Assume that the actual contents of the cans are normally distributed with a mean of 12.19 oz. and a standard deviation of 0.11 oz.
What percentage of cans contain less than the expected 12 oz?
A can of Coke can only hold up to 12.44 oz. What percentage of cans will overflow?
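Both parts are areas under a normal curve, so one way to check your hand calculation is with Python's built-in `statistics.NormalDist`. This is only a sketch of the setup using the mean and standard deviation given in the question; the variable names are my own.

```python
from statistics import NormalDist

# Fill volume model from the question: mean 12.19 oz, SD 0.11 oz.
fill = NormalDist(mu=12.19, sigma=0.11)

# Area to the LEFT of 12 oz: cans holding less than the label says.
p_under = fill.cdf(12.0)

# Area to the RIGHT of 12.44 oz: cans that would overflow.
p_over = 1 - fill.cdf(12.44)

print(f"Under 12 oz:   {p_under:.2%}")
print(f"Over 12.44 oz: {p_over:.2%}")
```

Note that `cdf` always gives the area to the left, which is why the overflow part subtracts from 1.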
Question Five: Those Poor, Poor Newts
So here we go again, cutting newts like it's our job. Taken from a slightly different Newt experiment....
"Differences of electric potential occur naturally from point to point on a body's skin. Is the natural electric field's average strength the best for healing of skin? If so, changing the field on the skin of a newt would slow healing. The newts are anesthetized and a small cut is made on the back of both hind legs. One is left to heal naturally, the other has an electrode placed on it to change the electric field to half the normal. After two hours, we measure the healing rate (in micrometers per hour)."
The data is below in the .csv file "newts".
a) Run the appropriate tests on these data. Are the results significant at the 0.05 level?
b) Give a 90% confidence interval for the difference in healing rates.
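Since each newt contributes both a control leg and an experimental leg, one reasonable reading is that this calls for a paired analysis: take the difference within each newt, then treat those differences as a single sample. Here is a rough sketch of that idea. The numbers are invented for illustration and are NOT the newts data, and the critical value 2.132 is the t-table value for df = 4 (two-sided 90%), which only matches this five-pair toy example. Look up the value for your own n - 1.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, standard error, t statistic) for paired data."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, se, mean_diff / se


# Hypothetical healing rates (micrometers per hour), one pair per newt.
control = [30, 28, 35, 33, 29]
experimental = [25, 27, 30, 31, 24]

# t-table value for df = 4, two-sided 90% (matches the toy data only).
T_CRIT_90 = 2.132

mean_diff, se, t_stat = paired_t(control, experimental)
ci_low = mean_diff - T_CRIT_90 * se
ci_high = mean_diff + T_CRIT_90 * se

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}")
print(f"90% CI for the difference: ({ci_low:.2f}, {ci_high:.2f})")
```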
Question Six: IQ 4 U
Assume IQ scores are N(107, 15) and are not biased towards any specific group of people.
a) Suzie scored a 133. What is her normalized score (z-score), and what percent of the population did she do better than?
b) Jon scored a 91. What is his normalized score and what percent of the population did he do better than?
c) What percent of the population scores between a 95 and a 113 on the IQ test?
d) In order to join MENSA, you must score in the top 2% on the IQ test. What score must you achieve?
e) Describe the difference between these two statements. Give values for each:
95% of people scored between what two values?
95% of people scored less than what value?
f) I want to see if a high school has an IQ that is above average. I want to select 30 students at random to test. Design a way for me to select these students that is as random as possible from the school.
Using your amazing techniques described in part f, I get these values:
110, 112, 62, 116, 83, 98, 124, 92, 125, 126, 110, 134, 116, 81, 103, 89, 94, 124, 105, 97, 92, 112, 122, 125, 107, 115, 114, 109, 99, 108
g) Run a hypothesis test from start to finish to determine if this school can be said to have an above average IQ. (Ho, Ha, alpha, check outliers, run analysis, analyze)
h) What is my 95% confidence interval for the mean IQ score for the school?
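For parts g and h, here is a sketch of the arithmetic behind a one-sample t procedure on the 30 scores listed above. A few assumptions to flag: it uses the sample standard deviation (a t procedure) rather than the population value of 15 (which would make it a z procedure), it sets alpha = 0.05, it hard-codes the df = 29 critical values from a t table, and it does NOT do the outlier check, which you still need to do yourself.

```python
import math
import statistics

# The 30 IQ scores from the sample listed above.
scores = [
    110, 112, 62, 116, 83, 98, 124, 92, 125, 126,
    110, 134, 116, 81, 103, 89, 94, 124, 105, 97,
    92, 112, 122, 125, 107, 115, 114, 109, 99, 108,
]

MU_0 = 107                 # Ho: school mean = 107 (from N(107, 15))
T_CRIT_ONE_SIDED = 1.699   # t table, df = 29, upper-tail area 0.05
T_CRIT_TWO_SIDED = 2.045   # t table, df = 29, area 0.025 in each tail

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)      # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)             # standard error of the mean

# Ha: school mean > 107, so reject Ho only for large positive t.
t_stat = (mean - MU_0) / se
reject_h0 = t_stat > T_CRIT_ONE_SIDED

# 95% confidence interval for the school's mean IQ.
ci_low = mean - T_CRIT_TWO_SIDED * se
ci_high = mean + T_CRIT_TWO_SIDED * se

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
print(f"t = {t_stat:.3f}, reject Ho: {reject_h0}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The test is one-sided because the question asks about "above average", while the confidence interval uses the two-sided critical value.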
Question Seven: Right Arm Green! Left Leg Blue!
The last data set we will look at today is labeled tornadoes. It shows the number of tornadoes in the United States each year since they started paying attention in 1953.
a. Give a 95% confidence interval for the expected number of tornadoes in a given year.
b. Plot time and tornadoes. Find a line of best fit. Does the number of tornadoes seem to be increasing? Give a couple of possible reasons for this.