We will play five simple games as a class, recording each person's score in a spreadsheet. Upon completion, we will compare the distributions by center and shape and discuss how their graph characteristics connect to the type of game.
1. Calculation practice: a contest was held to guess the number of peanuts in a glass jar. The guesses were (in order): 100, 100, 100, 135, 146, 177, 187, 188, 199, 200, 200, 250, 400.
- a. Find the two measures of center.
- b. Which measure of center is more reliable in this situation as a representation of the typical person's guess? Explain.
- c. Find the three common measures of spread.
- d. What does the standard deviation represent?
2. Donations: a fundraiser collected donations from individuals throughout a town. People gave according to their ability. Here is the distribution of donations:
1 1 2 5 5 5 5 10 10 10 10 20 20 20 20 20 20 20 20 100 100 250 1000
- a. From a quick glance, how much would you consider to be a typical donation? How far apart are typical donations?
- b. Use your calculator/StatKey to find the five-number summary, the mean, and the standard deviation.
- c. Is your first guess at the center (typical donation) closer to the median or the mean? What does this say about the mean and median when there are large outliers present?
- d. Is your first guess at the spread (how far apart) closer to the IQR or standard deviation? Note that the standard deviation depends on the mean, and the mean is heavily influenced by outliers. Is the IQR influenced by outliers?
- e. Now imagine that all of the donations were doubled by an anonymous corporation as part of their gift matching program. How does this change the mean, median, standard deviation, and IQR?
3. Shifting distributions: here is a list of the ages of a group of cousins.
2 2 9 9 9 10 10 10 14 14 20
- d. Use your calculator to find the mean. Is it where you expected it to be?
- e. What is the standard deviation of this data? What is the IQR?
- f. This data was collected 7 years ago. Assume no new cousins were born. What are the mean and median now? How did they change from 7 years ago?
- g. What are the standard deviation and IQR after 7 years have passed? How did they change from 7 years ago?
- Why do we care about the center of a distribution?
- The mean and median are mathematically different. What are the strengths and weaknesses of each as a measure of the center?
- How does knowing the spread of a distribution help you to better use data? If you didn't know the spread, how would that limit what you can do with the data?
1. Calculation practice
- a) mean=183.2, median=187
- b) Given the right skew and the outlier (400), the mean is suspect. In this case it turned out to be close to the median, but that doesn't always happen. The median is usually the better measure of "typical" when data are skewed or contain outliers.
- c) st dev=80.0, IQR=(200-135)=65, range=(400-100)=300
- d) Essentially, it measures the typical distance of each number from the average (mean). More precisely, it is the square root of the average squared distance from the mean.
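The answers to problem 1 can be double-checked with a short script. This is a minimal sketch using Python's standard `statistics` module; Python itself is an assumption here, since the worksheet uses a calculator or StatKey, and the variable names are only illustrative. It also compares the standard deviation with the plain average distance from the mean, since the two are related but not identical. The IQR is left out because quartile conventions differ between tools.

```python
import statistics

# Guesses from problem 1 (peanuts in a jar)
guesses = [100, 100, 100, 135, 146, 177, 187, 188, 199, 200, 200, 250, 400]

mean = statistics.mean(guesses)
median = statistics.median(guesses)
sample_sd = statistics.stdev(guesses)        # divides by n - 1, like a calculator's Sx
data_range = max(guesses) - min(guesses)

# Plain average of the absolute distances from the mean, for comparison with Sx
mean_abs_dev = sum(abs(x - mean) for x in guesses) / len(guesses)

print(round(mean, 1), median, round(sample_sd, 1), data_range)  # 183.2 187 80.0 300
print(round(mean_abs_dev, 1))                                   # 52.5
```

The standard deviation (about 80) is larger than the plain average distance (about 52.5) because squaring gives extra weight to values far from the mean, such as the guess of 400.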
2. Donations
- a) $20 is the most common donation, and it seems about in the middle. Most of the donations are very close together (within $20), but there are a few that are much further away.
- b) Min=1, Q1=5, median=20, Q3=20, max=1000, mean=72.783, standard deviation=209.368.
- c) The median. When there are large outliers, the median is barely affected, while the mean can change drastically.
- d) IQR = 15, standard deviation = 209. The IQR is a much better approximation of how close together most of the data are. Like the median, the IQR is resistant to outliers.
- e) All 4 values would also double. Whenever every value in a distribution is multiplied by a positive constant, the mean, median, standard deviation, and IQR are each multiplied by that same constant.
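The donation answers can be checked with a short script. This is a minimal sketch in Python's standard `statistics` module, not the calculator/StatKey method the worksheet asks for. The helper name `summary` and the "what if" list `tamed` are hypothetical, introduced only for illustration. The quartiles use the default (`exclusive`) method of `statistics.quantiles`, which is assumed to agree with the calculator for this data set; other quartile conventions can give slightly different values.

```python
import statistics

donations = [1, 1, 2, 5, 5, 5, 5, 10, 10, 10, 10,
             20, 20, 20, 20, 20, 20, 20, 20, 100, 100, 250, 1000]

def summary(data):
    """Return (mean, median, sample standard deviation, IQR) for a list of numbers."""
    # Default 'exclusive' quartile method: an assumption, conventions vary by tool
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (statistics.mean(data), statistics.median(data),
            statistics.stdev(data), q3 - q1)

original = summary(donations)

# Part e: every donation is matched, so every value doubles
doubled = summary([2 * x for x in donations])

# Hypothetical "what if": the $1000 donation had been $100 instead
tamed = summary(donations[:-1] + [100])

for label, values in [("original", original), ("doubled", doubled), ("tamed", tamed)]:
    print(label, [round(v, 3) for v in values])
```

In the doubled data all four statistics are twice the original ones. In the hypothetical `tamed` data the median and IQR stay at 20 and 15, while the mean and standard deviation drop sharply, which is what "resistant to outliers" means.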
3. Shifting distributions
- d) The mean is 9.9. This is less than the median of 10, which is not expected based on the box plot. Clearly a box plot doesn't always give a perfect prediction of skew. Note, however, that this kind of mismatch is rare and a box plot usually predicts the skew, especially when the data is not so chopped up like this.
- e) IQR=5, Sx=5.127
- f) Mean = 16.9, median=17: they both went up by 7 because shifting a distribution with addition moves the center by the same amount that every data point is moved.
- g) IQR=5, Sx=5.127: they are the same as before because shifting a distribution with addition does not change the spread.
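The effect of adding 7 to every age can be confirmed with a short script. This is a minimal sketch in Python's standard `statistics` module; the names `ages_now` and `iqr` are hypothetical, and the quartiles use the default (`exclusive`) method of `statistics.quantiles`, which is assumed to match the calculator for this data set.

```python
import statistics

ages = [2, 2, 9, 9, 9, 10, 10, 10, 14, 14, 20]
ages_now = [age + 7 for age in ages]   # the same cousins, 7 years later

def iqr(data):
    """Interquartile range using the default 'exclusive' quartile method (an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Measures of center move by exactly the amount added
mean_shift = statistics.mean(ages_now) - statistics.mean(ages)
median_shift = statistics.median(ages_now) - statistics.median(ages)

# Measures of spread are unchanged by the shift
sd_before, sd_after = statistics.stdev(ages), statistics.stdev(ages_now)
iqr_before, iqr_after = iqr(ages), iqr(ages_now)

print(round(mean_shift, 3), median_shift)
print(round(sd_before, 3), round(sd_after, 3), iqr_before, iqr_after)
```

Every distance between data points stays the same when a constant is added, so any statistic built from those distances (standard deviation, IQR, range) stays the same too.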
Bimodal- having two peaks
Box Plot- plotting data using quartiles and outliers
Histogram- plot that summarizes how data is distributed; this kind of graph puts data into groups and graphs them like a bar graph (ex: 0-9, 10-19, 20-29, etc)
interquartile range- upper quartile minus the lower quartile
mean- the average of a set of numbers: add all the numbers together and divide by how many numbers there are
median- middle number of a set of numbers once they are put in order
outlier- a data point that lies far away from the rest of the data set
percentile- the percent of the data at or below a given score
quartile - splitting the data into top 25%, middle/upper 25%, middle/lower 25%, and lower 25%
stem plot- type of graph that separates the tens place from the ones place by a "stem" in order to organize the data. Ex: 2 | 5 = 25
standard deviation- a measure of how spread apart the data are: roughly, how far a typical value is from the mean
symmetrical- exactly the same on both sides
trimodal- having three peaks
unimodal- having one peak