2. Quantitative Distributions

Learning objectives (and summaries)

Summarize a list of numbers, accounting for the center, spread, shape, and context of the distribution both graphically and numerically.

  • Distinguish a quantitative distribution from other types of distributions

    • Single list of (unpaired) numbers.

  • Create a box plot from a quantitative distribution to visualize center and spread using the five number summary

    • Find 5-number summary with technology, graph the 5 points on a number line, draw the box and whiskers. Check for outliers more than 1.5*IQR beyond the ends of the box (Q1 and Q3).

  • Create a stem plot by hand as a quick check of shape of a distribution

    • Round values so the only significant variation occurs in the last digit, break off the first digits as the stem on the left of the line, write each leaf individually (in order) on the right.

  • Read and interpret a histogram

    • Work backwards to create a frequency chart. Interpret shape. Understand that different-looking histograms can be created for the same set of data.

  • Describe the center of a quantitative distribution with the mean and median (and know which measure is resistant to outliers)

    • Calculate mean and median with technology; know that median is not influenced by outliers; know that mean/median highlight the average/center. Interpret center in context using units.

  • Describe the spread of a quantitative distribution with the standard deviation, IQR, and range (and know which measure is resistant to outliers)

    • Calculate standard deviation with technology; calculate the IQR and range by hand from the 5-number summary; know that the IQR is not influenced by outliers; know these values are measures of how spread out a distribution is. Interpret spread in context using units.

  • Describe the shape of a quantitative distribution with peaks (unimodal, etc) and skew (left/right or symmetrical)

    • Unimodal = 1 peak, bimodal = 2, trimodal = 3. Left skew has a tail on the left and most of the data clumped up on the right. Vice versa for right skew. A bump in the middle is symmetrical (don't call this "normal").
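If you would like to see the five-number summary and the 1.5*IQR check spelled out step by step, here is a minimal Python sketch. It assumes the "median of each half" rule for quartiles (the middle value is left out of both halves when the list has an odd length); other tools may use slightly different quartile rules and give slightly different answers. The function names and the quiz scores are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[n // 2 + n % 2 :]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Return the values that fall outside the 1.5*IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical quiz scores, made up for illustration.
scores = [4, 7, 8, 8, 9, 10, 10, 11, 12, 25]
print(five_number_summary(scores))  # (4, 8, 9.5, 11, 25)
print(outliers(scores))             # [25]
```

Here the IQR is 11 - 8 = 3, so the fences sit at 3.5 and 15.5, and only the 25 lands outside them.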

Assessment (13 core points)

    • Test (12pts): 10 questions (6 MC, 1 numeric, 2 graphing, 1 written), AND 1 of these free response questions (2pts):

      • Why do we care about the center of a distribution?

      • The mean and median are mathematically different. What are the strengths and weaknesses of each as a measure of the center?

      • How does knowing the spread of a distribution help you to better use data? If you didn't know the spread, how would that limit what you can do with the data?

Instruction

Printable guided notes: version 1, version 2

Use this StatKey link to practice after watching the next video:

If you do not plan to take the AP test, you do not need to learn the TI-83/84 commands because you can use the online calculator. Thus, you don't need to watch this video.

Although I will not ask you to make histograms, they will probably be easier to read if you learn how to make them yourself. Also, AP students would benefit from this exposure. Try this excellent interactive histogram tutorial:

When presenting data in context, sometimes it is necessary to change units. Also, it does not make sense to over- or under-report the precision of your information. Check out this great video on sig figs and units for more on this. It is especially useful for chemistry and helpful for AP Stats.

Vocabulary

Bimodal- having two peaks

Box Plot- a graph of the five number summary (minimum, Q1, median, Q3, maximum), with outliers marked separately

Histogram- plot that summarizes how data is distributed; this kind of graph puts data into groups and graphs them like a bar graph (ex: 0-9, 10-19, 20-29, etc)

interquartile range- upper quartile minus the lower quartile (Q3 - Q1)

mean- the average of a set of numbers: add all the numbers together and divide by how many numbers there are

median- middle number of a set of numbers once they are put in order

outlier- a data point way beyond the borders of a data set

percentile- the percent of values at or below a given score

quartile- one of the cut points (Q1, median, Q3) that split the ordered data into the top 25%, middle/upper 25%, middle/lower 25%, and lower 25%

stem plot- type of graph that separates the tens place from the ones place by a "stem" in order to organize the data. Ex: 2 | 5 = 25

standard deviation- a measure of how spread apart the values are; roughly, how far a typical value is from the mean

symmetrical- exactly the same on both sides

trimodal- having three peaks

unimodal- having one peak
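To make the stem plot entry concrete, here is a small Python sketch that builds one as text. It assumes non-negative whole numbers, with the tens place (and anything above it) as the stem and the ones place as the leaf. The function name and the sample times are hypothetical.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the lines of a stem plot (tens digit | ones digits in order)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # 25 -> stem 2, leaf 5
        leaves[stem].append(str(leaf))
    lines = []
    # Walk every stem from smallest to largest so empty rows (gaps) still show.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

# Hypothetical homework times in minutes, made up for illustration.
times = [12, 15, 21, 23, 23, 28, 30, 34, 52]
print("\n".join(stem_plot(times)))
```

The empty "4 |" row is kept on purpose: leaving out empty stems would hide gaps in the distribution.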

Practice

1. Interpreting stem plots: The stem plot below shows the number of minutes spent by different students to complete a math homework assignment. 5|5 = 55 minutes.

  • a) What was the lowest and highest amount of time?

    • b) Is the distribution approximately symmetrical or skewed? If skewed, how strong is the skew and in which direction is it skewed?

    • c) How many major clusters of data are there? Would you consider this plot unimodal, bimodal, or something else?

    • d) Are there any major outliers, gaps, or unusual features in the plot?

    • e) In what row does the data reach its peak? What are the lowest and highest amounts of time that students in this row spent on homework?

    • f) How many students were in the class? Do you think a stem plot was a good way to visualize this distribution? Why?

    • g) The person who made this stem plot decided to use the tens place as the stem and the ones place as the leaf, but did not decide to split the stem. Was this a good choice? Why?

2. Peaks and skewness: Classify each of the curves below by its symmetry/skewness and its number of peaks:

Skewness: skewed left, symmetrical, skewed right, constant (all values are the same)

Peaks: unimodal, bimodal, trimodal, no peaks (constant)

3. The histogram below shows the salaries of a group of programmers. The bins are defined from $40,001 to $50,000, $50,001 to $60,000, and so forth.

  • a) How many programmers in this group earn between $60k and $70k?

  • b) How many of these programmers earn more than $100k?

  • c) What percent of these programmers earn $70k or less?

  • d) What percent of these programmers earn between $60k and $90k?
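Working backwards from a histogram to a frequency chart is really just counting how many values land in each bin. The Python sketch below shows one way to do that counting with the same bin rule as this problem (each bin includes its upper edge, so $50,000 belongs to the $40,001 to $50,000 bin). The salaries are hypothetical and are NOT the data behind the histogram; the function names are also made up.

```python
import math
from collections import Counter

BIN_WIDTH = 10_000

def bin_upper_edge(salary):
    """Return the top of the bin holding salary (50,000 for 40,001 to 50,000)."""
    return math.ceil(salary / BIN_WIDTH) * BIN_WIDTH

def frequency_table(salaries):
    """Count the salaries in each bin, keyed by the bin's upper edge."""
    return Counter(bin_upper_edge(s) for s in salaries)

def percent_at_or_below(salaries, cutoff):
    """Percent of salaries that are less than or equal to cutoff."""
    return 100 * sum(s <= cutoff for s in salaries) / len(salaries)

# Hypothetical salaries, made up for illustration (not the histogram's data).
salaries = [42_000, 50_000, 50_001, 58_500, 61_000, 70_000, 84_000, 105_000]
table = frequency_table(salaries)
for edge in sorted(table):
    print(f"${edge - BIN_WIDTH + 1:,} to ${edge:,}: {table[edge]}")
print(percent_at_or_below(salaries, 70_000))  # 75.0
```

Notice that $50,000 and $50,001 land in different bins, which is why it matters to read the bin definitions before answering "or less" and "more than" questions.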

4. Practicing creating and shifting box plots

3 6 9 10 10 11 11 11 12 13 16

I recommend saving time while completing this problem by using the StatKey tool, but still copying the resulting box plots to paper so you can take in the full lesson of the problem.

  • a. Find the five number summary of the distribution. Label each of the values (minimum, Q1, …).

  • b. Calculate the interquartile range (IQR). Find (Q1 - 1.5*IQR) and (Q3 + 1.5*IQR). Are there any outliers in this distribution? If so, list them.

  • c. Draw a number line from 0 to 35 (with even spacing). Leave enough room to draw 3 box plots above it by the end of this problem.

  • d. Create a box plot of this data (marking outliers with an “x”) above your number line.

  • e. Using the distribution above as a starting point, create a new distribution by adding +5 to each number.

  • f. Use your new distribution and find the five number summary. Label each of the values.

  • g. Draw a second box plot above the same number line using this information. Remember to use the 1.5*IQR rule to check for outliers.

  • h. What effect did adding a constant (+5) to each value have on the box plot? How did it affect the center and spread?

  • i. Using the distribution above as a starting point, create a new distribution by doubling each number.

  • j. Use your new distribution and find the five number summary. Label each of the values.

  • k. Draw a third box plot above the same number line using this information. Remember to use the 1.5*IQR rule to check for outliers.

  • l. What effect did multiplying by a constant value (x2) have on the box plot? How did it affect the center and spread?

5. Below is a dot plot of how many teddy bears some people own. (link to copy-paste list)

  • a. Find the five number summary of the distribution. Label each of the values.

  • b. Looking at the dot plot above, do you think there are any outliers? Now use the 1.5*IQR rule to determine if there are any outliers.

  • c. Create a box plot of this data, marking outliers with an “x”.

  • d. What did you think the skew of this data was before creating the box plot? What about after making the box plot?

  • e. Imagine that every child was given a new teddy bear for participating in the study. How would this change the median (center) you reported above? How would this change the IQR (spread)? You do not need to create a new box plot or calculate the entire new distribution to answer this question (use the question above for insight), but you can if it helps.

  • f. Now imagine instead that the researchers would triple every child’s collection. How would this change the median (center) you reported above? How would this change the IQR (spread)?

6. Conceptual histograms.

  • a. Label each histogram as skewed left, skewed right, or symmetrical.

  • b. Sketch in the mean and median on each of the graphs above.

  • c. On graphs that are symmetrical, where is the mean in relation to the median?

  • d. On graphs that are skewed left, where is the mean in relation to the median?

  • e. On graphs that are skewed right, where is the mean in relation to the median?

  • f. Rank the 5 graphs from the smallest standard deviation to the largest standard deviation. (Hint: standard deviation is roughly the typical distance of values from the mean)

  • g. Rank the 5 graphs from the smallest variance to the largest variance. Compare your results to the previous question.

7. Donations: a fundraiser collected donations from individuals throughout a town. People gave according to their ability. Here is the distribution of donations:

1 1 2 5 5 5 5 10 10 10 10 20 20 20 20 20 20 20 20 100 100 250 1000

  • a. From a quick glance, how much would you consider to be a typical donation? How far apart are typical donations?

  • b. Use your calculator/StatKey to find the five number summary, the mean, and the standard deviation.

  • c. Is your first guess at the center (typical donation) closer to the median or the mean? What does this say about the mean and median when there are large outliers present?

  • d. Is your first guess at the spread (how far apart) closer to the IQR or standard deviation? Note that the standard deviation depends on the mean, and the mean is heavily influenced by outliers. Is the IQR influenced by outliers?

  • e. Now imagine that all of the donations were doubled by an anonymous corporation as part of their gift matching program. How does this change the mean, median, standard deviation, and IQR?

8. Patients in the Emergency Department. (link to copy-paste list)

73, 86, 133, 147, 117, 92, 88, 122, 168, 151, 104, 99, 106, 117, 151, 163, 109, 111, 156, 132, 96, 128, 114, 106, 85, 156, 131, 147, 171, 160, 98, 108, 106, 116, 130, 128, 85, 100, 109, 111, 117, 125, 118, 152, 148, 114, 152, 155, 117, 103, 110, 92, 110, 112, 105, 151, 108, 96, 109, 101, 153

  • a. Find the five number summary of the distribution (calculator or StatKey). Label each of the values.

  • b. Check for outliers using the 1.5*IQR rule.

  • c. Create a box plot of this data.

  • d. Looking at the histogram, it is clear that this distribution is bimodal. Can you determine how many peaks there are in a data set by looking at a box plot?

  • e. Looking at the histogram, there appears to be more data on the left side, making this distribution skewed right. Does the box plot confirm this conclusion? If so, how can you tell?

9. Shifting distributions: here is a list of the ages of a group of cousins.

2 2 9 9 9 10 10 10 14 14 20

  • a. The five number summary is min=2, Q1=9, med=10, Q3=14, max=20. Create a box plot by hand.

  • b. Based on your box plot, how would you estimate the skew? Based on this, do you think the mean is greater or less than the median?

  • c. Now create a dot plot of this data. Is the data connected or very choppy? Is the skew clear?

  • d. Use your calculator to find the mean. Is it where you expected it to be? What does this say about the ability of box plots to always predict the skew (remember skew is defined by the relationship between mean and median)?

  • e. What is the standard deviation of this data? What is the IQR?

  • f. This data was collected 7 years ago. Assume no new cousins were born. What are the mean and median now? How did they change from 7 years ago?

    • g. What is the standard deviation and IQR after 7 years have passed? How did they change from 7 years ago?

Practice solutions

1. Interpreting stem plots

    • a) lowest: 15 minutes, highest: 55 minutes

    • b) skewed right (the tail goes to the higher numbers)

    • c) unimodal -- one major peak

    • d) nothing unusual, no gaps, no outliers

    • e) in the 20's: 22-28 minutes

    • f) 18 students; yes, a stem plot was great for this number of data points spread out in this way because it formed a neat, easily-readable graph that made the shape, center, and spread fairly obvious

    • g) yes, numbers ranging from 15 to 55 give us 5 different stems, which, in a small data set like this, is enough to get a good picture of the shape of the distribution

2. Peaks and skewness

    • unimodal, mostly symmetrical

    • unimodal, right skew

    • unimodal, right skew

    • bimodal, sort of symmetrical, sort of left skewed

3. Programmer salaries

    • a) 4 programmers

    • b) 3 programmers

    • c) 7/23 = .304 = 30.4%

    • d) 14/23 = .609 = 60.9%

4. Practicing box plots

    • a) min: 3; Q1: 9 (or 9.5); median: 11; Q3: 12 (or 11.5), max: 16

    • b) IQR = 12 - 9 = 3; (Q1 - 1.5*IQR) = 9 - 1.5*3 = 4.5; (Q3 + 1.5*IQR) = 12 + 1.5*3 = 16.5; yes, there is one value less than 4.5 (it is 3) so it is an outlier

    • c) diagram below (blue line)

    • d) diagram below (green box plot)

    • e) 8, 11, 14, 15, 15, 16, 16, 16, 17, 18, 21

    • f) min: 8; Q1: 14; median: 16; Q3: 17, max: 21

    • g) diagram below (red box plot)

    • h) It shifted the entire plot to the right 5 units. It increased the median (center) by +5, but the spread did not change.

    • i) 6, 12, 18, 20, 20, 22, 22, 22, 24, 26, 32

    • j) min: 6; Q1: 18; median: 22; Q3: 24, max: 32

    • k) diagram below (purple box plot)

    • l) It stretched the plot to be twice as long. It doubled the old median (center) and doubled the old IQR (spread).
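If you want to double-check the three five number summaries above without retyping lists into StatKey, here is a short Python sketch. It assumes the median-of-halves quartile rule, which gives Q1 = 9 and Q3 = 12 for the original list (the rule that gives 9.5 and 11.5 is not shown). The helper name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[n // 2 + n % 2 :]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

original = [3, 6, 9, 10, 10, 11, 11, 11, 12, 13, 16]
shifted = [x + 5 for x in original]   # part e: add 5 to each value
doubled = [x * 2 for x in original]   # part i: double each value

for label, data in [("original", original), ("plus 5", shifted), ("times 2", doubled)]:
    low, q1, med, q3, high = five_number_summary(data)
    print(f"{label}: min={low} Q1={q1} med={med} Q3={q3} max={high} IQR={q3 - q1}")
```

The IQR comes out as 3, 3, and 6: adding a constant leaves the spread alone, while multiplying by 2 doubles it.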

5. Teddy bears

    • a) min: 0; Q1: 2; median: 5; Q3: 13, max: 18

    • b) It appears that 18 is a ways out, but using the 1.5*IQR rule, a number could be as large as 13 + 1.5*11 = 29.5 (Q3 + 1.5*IQR) before it would actually be considered an outlier.

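    • c) The box runs from 2 to 13 with the median line at 5; the whiskers reach 0 and 18; there are no outliers to mark.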

    • d) From the dot plot, it looks like more data is on the left, so it appears skewed right. The box plot confirms this since the median bar and Q1 are very close together compared to the median and Q3.

    • e) Since 1 was added to each value, the median goes up by 1 and the IQR does not change (since Q1 and Q3 go up by the same amount, the difference between them, the IQR, does not change).

    • f) Since everything is being multiplied by a constant, the median and IQR should both triple.

6. Conceptual histograms

    • a) i: right skewed; ii: symmetrical; iii: left skewed; iv: symmetrical (constant); v: kind of symmetrical / kind of right skewed

    • b) (click on image to zoom in)

    • c) They are equal on symmetrical graphs.

    • d) When skewed left, the mean will be less (further left, towards the tail) than the median.

    • e) When skewed right, the mean will be greater (further right, towards the tail) than the median.

    • f) Smallest to largest (with the actual value of the standard deviation included): ii (2.23), iii (2.49), i (2.82), iv (3.21), v (3.89).

    • g) Smallest to largest (with the actual value of the variance included): ii (4.97), iii (6.20), i (7.95), iv (10.30), v (15.13). Since the standard deviation is never negative and variance is the square of the standard deviation, the order will ALWAYS be the same.
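The numbers in part g are just the part f numbers squared. A quick Python check, using only the standard deviations listed in part f (the variable names are made up):

```python
# Standard deviations listed in the solution to part f.
std_devs = {"ii": 2.23, "iii": 2.49, "i": 2.82, "iv": 3.21, "v": 3.89}

# Variance is the square of the standard deviation.
variances = {name: sd ** 2 for name, sd in std_devs.items()}

order_by_sd = sorted(std_devs, key=std_devs.get)
order_by_var = sorted(variances, key=variances.get)

for name in order_by_var:
    print(f"{name}: sd={std_devs[name]:.2f} variance={variances[name]:.2f}")
print(order_by_sd == order_by_var)  # True
```

Squaring keeps non-negative numbers in the same order, so the two rankings have to match.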

7. Donations

    • a) $20 is the most common donation, and it seems about in the middle. Most of the donations are very close together (within $20), but there are a few that are much further away.

    • b) Min=1, Q1=5, median=20, Q3=20, max=1000, mean=72.783, standard deviation=209.368.

    • c) The median. When there are large outliers, the median is barely affected, while the mean can change drastically.

    • d) IQR = 15, standard deviation = 209. The IQR was a much better approximation of how close together most of the data is. Like the median, the IQR is resistant to outliers.

    • e) All 4 values would also double. Whenever all values in a distribution are multiplied by a constant, you multiply each of these values by the same constant.
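Here is a short Python sketch that reproduces the donation numbers and the effect of doubling. It assumes the sample standard deviation (what the TI calculators call Sx) and the median-of-halves rule for the quartiles; the `iqr` helper is hypothetical, not a built-in.

```python
from statistics import mean, median, stdev

def iqr(values):
    """Q3 - Q1 using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[n // 2 + n % 2 :]
    return median(upper) - median(lower)

donations = [1, 1, 2, 5, 5, 5, 5, 10, 10, 10, 10, 20, 20, 20, 20,
             20, 20, 20, 20, 100, 100, 250, 1000]
doubled = [2 * d for d in donations]   # gift matching doubles every donation

for label, data in [("original", donations), ("doubled", doubled)]:
    print(f"{label}: mean={mean(data):.3f} median={median(data)} "
          f"sd={stdev(data):.3f} IQR={iqr(data)}")
```

The mean (about 72.78) and standard deviation (about 209.37) are dragged far from the typical $5 to $20 donation by the few large gifts, while the median (20) and IQR (15) stay put.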

8. Patients

    • a) min: 73, Q1: 104.5, median: 114, Q3: 147, max: 171

    • b) 1.5*IQR = 1.5*42.5 = 63.75. This means that all values between Q1 - 63.75 = 40.75 and Q3 + 63.75 = 210.75 are not outliers. Every value (73 to 171) falls in that range, so there are no outliers.

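    • c) The box runs from 104.5 to 147 with the median line at 114; the whiskers reach 73 and 171; there are no outliers to mark.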

    • d) No -- box plots are horrible at identifying peaks.

    • e) Yes -- the median is shifted off to the left side of the box.
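To see the two peaks that the box plot hides, you can print a rough text histogram of the patient list. This Python sketch assumes bins of width 10 (70-79, 80-89, ...); that bin width is my choice for illustration, and a different choice would give a different-looking picture.

```python
from collections import Counter

patients = [
    73, 86, 133, 147, 117, 92, 88, 122, 168, 151, 104, 99, 106, 117, 151,
    163, 109, 111, 156, 132, 96, 128, 114, 106, 85, 156, 131, 147, 171, 160,
    98, 108, 106, 116, 130, 128, 85, 100, 109, 111, 117, 125, 118, 152, 148,
    114, 152, 155, 117, 103, 110, 92, 110, 112, 105, 151, 108, 96, 109, 101,
    153,
]

# Key each value by the lower edge of its bin: 117 -> 110, 153 -> 150.
bins = Counter((p // 10) * 10 for p in patients)

for low in range(min(bins), max(bins) + 10, 10):
    print(f"{low:3d}-{low + 9:3d} | {'*' * bins[low]}")
```

The rows for 100-109 and 110-119 (13 values each) form the main peak, and the 150-159 row (9 values) forms a second, smaller one. None of that is visible from the five number summary alone.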

9. Shifting distributions

    • a) Five number summary: min=2, Q1=9, med=10, Q3=14, max=20

    • b) It appears to be skewed right. Therefore, the mean should be greater than (further to the right than) the median.

    • c) The data comes in tight bunches and seems very chopped up. The skew is not obvious, though it looks almost symmetrical.

    • d) The mean is 9.9. This is less than the median of 10, which is not expected based on the box plot. Clearly a box plot doesn't always give a perfect prediction of skew. Note, however, that this type of thing is rare and a box plot usually predicts the skew, especially when the data is not so chopped up like this.

    • e) IQR=5, Sx=5.127

    • f) Mean = 16.9, median=17: they both went up by 7 because shifting a distribution with addition moves the center at the same rate every data point is moved.

    • g) IQR=5, Sx=5.127: they are the same as before because shifting a distribution with addition does not change the spread.
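A quick Python check of the cousins' ages, then and now. This sketch assumes the sample standard deviation (Sx); the variable names are made up.

```python
from statistics import mean, median, stdev

ages = [2, 2, 9, 9, 9, 10, 10, 10, 14, 14, 20]
ages_now = [a + 7 for a in ages]   # everyone is 7 years older

print(f"then: mean={mean(ages):.1f} median={median(ages)} sd={stdev(ages):.3f}")
print(f"now:  mean={mean(ages_now):.1f} median={median(ages_now)} sd={stdev(ages_now):.3f}")

# The surprise from part d: the mean sits just below the median.
print(mean(ages) < median(ages))  # True
```

Both centers move up by exactly 7, the standard deviation does not move at all, and the mean stays slightly below the median in both lists.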

Notes