2. Shape and Graphs

Task: Infographic survey results
      Use the skills from this section to write a confidence interval sentence, in +/- form, for each question you asked:
      • [2 pts] Use the proportion from your data to calculate the 95% confidence interval for every question
      • [2 pts] Convert each interval into +/- form and round to the nearest tenth of a percent
      • [2 pts] Copy, then modify each of the sentences from the survey task into confidence interval sentences using the 95% confidence level and the margin of error above
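The conversion to +/- form can be sketched in a few lines of Python. This is a minimal sketch, not a required method: it assumes you already have the interval's lower and upper bounds as proportions and that the interval is roughly symmetric about your sample proportion. The function name `to_plus_minus` and the example interval are made up for illustration.

```python
def to_plus_minus(lower, upper):
    """Convert an interval given as proportions into (center %, margin %)."""
    center = (lower + upper) / 2   # midpoint of the interval
    margin = (upper - lower) / 2   # half the width of the interval
    # Convert to percents and round to the nearest tenth of a percent
    return round(center * 100, 1), round(margin * 100, 1)

# Hypothetical interval, not taken from any real survey
center, margin = to_plus_minus(0.412, 0.588)
print(f"{center:.1f}% +/- {margin:.1f}%")
```

The margin of error is half the width of the interval, so an interval from 41.2% to 58.8% becomes 50.0% +/- 8.8%.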
      Mastery Quiz Prep

      Although I will not ask you to make a histogram, histograms will probably be easier to read if you learn how to make them yourself.  Also, AP students would benefit from this exposure.  Try this excellent interactive histogram tutorial: 

      When presenting data in context, sometimes it is necessary to change units.  Also, it does not make sense to over- or under-report the precision of your information.  Check out this great video on sig figs and units for more on this.  It is especially useful for chemistry and helpful for AP Stats.

      1. Interpreting stem plots: The stem plot below shows the number of minutes spent by different students to complete a math homework assignment.  5|5 = 55 minutes.
      • a) What were the lowest and highest amounts of time?
      • b) Is the distribution approximately symmetrical or skewed?   If skewed, how strong is the skew and in which direction is it skewed?
      • c) How many major clusters of data are there?  Would you consider this plot unimodal, bimodal, or something else?
      • d) Are there any major outliers, gaps, or unusual features in the plot?
      • e) In what row does the data reach its peak?  What are the lowest and highest amounts of time that students spend on homework in this group?
      • f) How many students were in the class?  Do you think a stem plot was a good way to visualize this distribution?  Why?
      • g) The person who made this stem plot decided to use the tens place as the stem and the ones place as the leaf, but did not decide to split the stem.  Was this a good choice?  Why?

      2. Peaks and skewness: Classify each of the curves below by its symmetry/skewness and its number of peaks:
      Skewness: skewed left, symmetrical, skewed right, constant (all values are the same)
      Peaks: unimodal, bimodal, trimodal, no peaks (constant)

      3. You are given the histogram below on the salary of a group of programmers.  The bins are defined from $40,001 to $50,000, $50,001 to $60,000, and so forth.
      • a) How many programmers in this group earn between $60k and $70k?
      • b) How many of these programmers earn more than $100k?
      • c) What percent of these programmers earn $70k or less?
      • d) What percent of these programmers earn between $60k and $90k?

      4. Practicing creating and shifting box plots
      3  6  9  10  10  11  11  11  12  13  16
              I recommend saving time while completing this problem by using the StatKey tool, but still copying the resulting box plots to paper so you can take in the full lesson of the problem.
      • a. Find the five number summary of the distribution.  Label each of the values (minimum, Q1, …).
      • b. Calculate the interquartile range (IQR).  Find (Q1 - 1.5*IQR) and (Q3 + 1.5*IQR).  Are there any outliers in this distribution?  If so, list them.
      • c. Draw a number line from 0 to 35 (with even spacing).  Leave enough room to draw 3 box plots above it by the end of this problem.
      • d. Create a box plot of this data (marking outliers with an “x”) above your number line.
      • e. Using the distribution above as a starting point, create a new distribution by adding +5 to each number.
      • f. Use your new distribution and find the five number summary.  Label each of the values.
      • g. Draw a second box plot above the same number line using this information.  Remember to use the 1.5*IQR rule to check for outliers.
      • h. What effect did adding a constant (+5) to each value have on the box plot?  How did it affect the center and spread?
      • i. Using the distribution above as a starting point, create a new distribution by doubling each number.
      • j. Use your new distribution and find the five number summary.  Label each of the values.
      • k. Draw a third box plot above the same number line using this information.  Remember to use the 1.5*IQR rule to check for outliers.
      • l. What effect did multiplying by a constant value (x2) have on the box plot?  How did it affect the center and spread?

      5. Below is a dot plot of how many teddy bears some people own.  (link to copy-paste list)
      • a. Find the five number summary of the distribution.  Label each of the values.
      • b. Looking at the dot plot above, do you think there are any outliers?  Now use the 1.5*IQR rule to determine if there are any outliers.
      • c. Create a box plot of this data, marking outliers with an “x”.
      • d. What did you think the skew of this data was before creating the box plot?  What about after making the box plot?
      • e. Imagine that every child was given a new teddy bear for participating in the study.  How would this change the median (center) you reported above?  How would this change the IQR (spread)?  You do not need to create a new box plot or calculate the entire new distribution to answer this question (use the question above for insight), but you can if it helps.
      • f. Now imagine instead that the researchers would triple every child’s collection.  How would this change the median (center) you reported above?  How would this change the IQR (spread)?

      6. Conceptual histograms.
      • a. Label each histogram as skewed left, skewed right, or symmetrical. 
      • b. Sketch in the mean and median on each of the graphs above.
      • c. On graphs that are symmetrical, where is the mean in relation to the median?
      • d. On graphs that are skewed left, where is the mean in relation to the median?
      • e. On graphs that are skewed right, where is the mean in relation to the median?
      • f. Rank the 5 graphs from the smallest standard deviation to the largest standard deviation.  (Hint: standard deviation is roughly the typical distance of values from the mean)
      • g. Rank the 5 graphs from the smallest variance to the largest variance.  Compare your results to the previous question.

      7. Patients in the Emergency Department.  (link to copy-paste list)
      73, 86, 133, 147, 117, 92, 88, 122, 168, 151, 104, 99, 106, 117, 151, 163, 109, 111, 156, 132, 96, 128, 114, 106, 85, 156, 131, 147, 171, 160, 98, 108, 106, 116, 130, 128, 85, 100, 109, 111, 117, 125, 118, 152, 148, 114, 152, 155, 117, 103, 110, 92, 110, 112, 105, 151, 108, 96, 109, 101, 153
      • a. Find the five number summary of the distribution (calculator or StatKey).  Label each of the values.
      • b. Check for outliers using the 1.5*IQR rule.
      • c. Create a box plot of this data.
      • d. Looking at the histogram, it is clear that this distribution is bimodal.  Can you determine how many peaks there are in a data set by looking at a box plot?
      • e. Looking at the histogram, there appears to be more data on the left side, making this distribution skewed right.  Does the box plot confirm this conclusion?  If so, how can you tell?

      8. Shifting distributions: here is a list of the ages of a group of cousins.
      2  2  9  9  9  10  10  10  14  14  20
      • a. The five number summary is min=2, Q1=9, med=10, Q3=14, max=20.  Create a box plot by hand.
      • b. Based on your box plot, how would you estimate the skew?  Based on this, do you think the mean is greater or less than the median?
      • c. Now create a dot plot of this data.  Is the data connected or very choppy?  Is the skew clear?
      • d. Use your calculator to find the mean.  Is it where you expected it to be?  What does this say about the ability of box plots to always predict the skew (remember skew is defined by the relationship between mean and median)?

      Free Response Prep
          • How does an edge, limit, or boundary in a situation affect the shape of its histogram/dot plot?
          • Dot plots and stem plots are a bit different than box plots and histograms.  Explain situations where each pair is more useful than the other pair.
          • There are an infinite number of different histograms you could make for a given quantitative distribution.  What is the goal of the histogram?  What are some rules of thumb to guide your creation of a useful histogram?
          Practice solutions
              1. Interpreting stem plots
              • a) lowest: 15 minutes, highest: 55 minutes
              • b) skewed right (the tail goes to the higher numbers)
              • c) unimodal -- one major peak
              • d) nothing unusual, no gaps, no outliers
              • e) in the 20's: 22-28 minutes
              • f) 18 students; yes, a stem plot was great for this number of data points spread out in this way because it formed a neat, easily-readable graph that made the shape, center, and spread fairly obvious
              • g) yes, numbers ranging from 15 to 55 give us 5 different stems, which, in a small data set like this, is enough to get a good picture of the shape of the distribution
              2. Peaks and skewness
              mostly symmetrical
              right skew
              right skew
              sort of symmetrical, sort of left skewed

              3. Programmer salaries
              • a) 4 programmers
              • b) 3 programmers
              • c) 7/23 = .304 = 30.4%
              • d) 14/23 = .609 = 60.9%
              4. Practicing box plots
              • a) min: 3; Q1: 9 (or 9.5, depending on the quartile method); median: 11; Q3: 12 (or 11.5, depending on the quartile method); max: 16
              • b) IQR = 12 - 9 = 3; (Q1 - 1.5*IQR) = 9 - 1.5*3 = 4.5; (Q3 + 1.5*IQR) = 12 + 1.5*3 = 16.5; yes, there is one value less than 4.5 (it is 3) so it is an outlier
              • c) diagram below (blue line)
              • d) diagram below (green box plot)
              • e) 8, 11, 14, 15, 15, 16, 16, 16, 17, 18, 21
              • f) min: 8; Q1: 14; median: 16; Q3: 17, max: 21
              • g) diagram below (red box plot)
              • h) It shifted the entire plot to the right 5 units.  It increased the median (center) by +5, but the spread did not change.
              • i) 6, 12, 18, 20, 20, 22, 22, 22, 24, 26, 32
              • j) min: 6; Q1: 18; median: 22; Q3: 24, max: 32
              • k) diagram below (purple box plot)
              • l) It stretched the plot to be twice as long.  It doubled the old median (center) and doubled the old IQR (spread).
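The numbers in this solution can be checked with a short Python sketch. It assumes the median-of-halves method for quartiles (the middle value is left out of both halves when the count is odd), which gives Q1 = 9 and Q3 = 12 here. Other quartile methods can give slightly different values. The function names are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves method.

    For an odd number of values, the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Values that fall outside the 1.5*IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

original = [3, 6, 9, 10, 10, 11, 11, 11, 12, 13, 16]
shifted = [v + 5 for v in original]   # part e: add 5 to each value
doubled = [v * 2 for v in original]   # part i: double each value

for name, data in [("original", original), ("shifted", shifted), ("doubled", doubled)]:
    print(name, five_number_summary(data), "outliers:", outliers(data))
```

Adding 5 moves every value in the summary up by 5 and leaves the IQR at 3. Doubling multiplies every value in the summary by 2, so the IQR becomes 6. In all three versions the smallest value falls below the lower fence.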
              5. Teddy bears
              • a) min: 0; Q1: 2; median: 5; Q3: 13, max: 18
              • b) It appears that 18 is a ways out, but using the 1.5*IQR rule, a number could be as large as 13 + 1.5*11 = 29.5 (Q3 + 1.5*IQR) before it would actually be considered an outlier.
              • c) 
              • d) From the dot plot, it looks like more data is on the left, so it appears skewed right.  The box plot confirms this since the median bar and Q1 are very close together compared to the median and Q3.
              • e) Since 1 was added to each value, the median goes up by 1 and the IQR does not change (since Q1 and Q3 go up by the same amount, the difference between them, the IQR, does not change).
              • f) Since everything is being multiplied by a constant, the median and IQR should both triple.
              6. Conceptual histograms
              • a) i: right skewed; ii: symmetrical; iii: left skewed; iv: symmetrical (constant); v: kind of symmetrical / kind of right skewed
              • b) (see image)
              • c) They are equal on symmetrical graphs.
              • d) When skewed left, the mean will be less (further left, towards the tail) than the median.
              • e) When skewed right, the mean will be greater (further right, towards the tail) than the median.
              • f) Smallest to largest (with the actual value of the standard deviation included): ii (2.23), iii (2.49), i (2.82), iv (3.21), v (3.89).
              • g) Smallest to largest (with the actual value of the variance included): ii (4.97), iii (6.20), i (7.95), iv (10.30), v (15.13).  Since the standard deviation is never negative and variance is the square of the standard deviation, the order will ALWAYS be the same.
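The link between parts f and g can be checked with a short Python sketch. It only uses the standard deviations quoted in part f and squares them. The variable names are made up for illustration.

```python
# Standard deviations quoted in the solution to part f
std_devs = {"ii": 2.23, "iii": 2.49, "i": 2.82, "iv": 3.21, "v": 3.89}

# Variance is the square of the standard deviation
variances = {name: round(sd ** 2, 2) for name, sd in std_devs.items()}

# Rank the graphs from smallest to largest by each measure
rank_by_sd = sorted(std_devs, key=std_devs.get)
rank_by_var = sorted(variances, key=variances.get)

print(variances)
print(rank_by_sd == rank_by_var)
```

Squaring never changes the order of non-negative numbers, so the two rankings match.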
              7. Patients
              • a) min: 73, Q1: 104.5, median: 114, Q3: 147, max: 171
              • b) 1.5*IQR = 1.5*42.5 = 63.75.  This means that all values between Q1 - 63.75 and Q3 + 63.75 are not outliers.  Thus, there are no outliers.
              • c) 
              • d) No -- box plots are horrible at identifying peaks.
              • e) Yes -- the median is shifted off to the left side of the box.
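To see the two peaks that the box plot hides, you can tally the values into bins, as in this Python sketch. The bin width of 10 is an assumption made for illustration, and the histogram in the problem may use different bins.

```python
from collections import Counter

patients = [
    73, 86, 133, 147, 117, 92, 88, 122, 168, 151,
    104, 99, 106, 117, 151, 163, 109, 111, 156, 132,
    96, 128, 114, 106, 85, 156, 131, 147, 171, 160,
    98, 108, 106, 116, 130, 128, 85, 100, 109, 111,
    117, 125, 118, 152, 148, 114, 152, 155, 117, 103,
    110, 92, 110, 112, 105, 151, 108, 96, 109, 101,
    153,
]

# Group each value by its tens bin: 73 -> 70, 104 -> 100, and so on
counts = Counter((v // 10) * 10 for v in patients)

# Print a sideways text histogram, one row per bin
for start in sorted(counts):
    print(f"{start:3d}-{start + 9:3d} | {'*' * counts[start]}")
```

The rows for 100-109 and 110-119 form one peak and the row for 150-159 forms a second, smaller one. The five number summary carries no trace of this.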

              8. Shifting distributions
              • a) Five number summary: min=2, Q1=9, med=10, Q3=14, max=20
              • b) It appears to be skewed right.  Therefore, the mean should be greater than (further to the right than) the median.
              • c)  The data comes in tight bunches and seems very chopped up.  The skew is not obvious, though it looks almost symmetrical. 
              • d) The mean is 9.9.  This is less than the median of 10, which is not expected based on the box plot.  Clearly a box plot doesn't always give a perfect prediction of skew.  Note, however, that this type of thing is rare and a box plot usually predicts the skew, especially when the data is not so chopped up like this.
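If you would rather check part d without a calculator, here is a minimal Python sketch using the standard `statistics` module.

```python
from statistics import mean, median

ages = [2, 2, 9, 9, 9, 10, 10, 10, 14, 14, 20]

print(round(mean(ages), 1))        # mean, rounded to one decimal place
print(median(ages))                # middle value of the sorted list
print(mean(ages) < median(ages))   # is the mean to the left of the median?
```

The ages add up to 109, so the mean is 109/11, which is about 9.9 and just below the median of 10.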

                  Bimodal- having two peaks 
                  Box Plot- a plot of the five number summary (min, Q1, median, Q3, max), with outliers marked separately
                  Histogram- plot that summarizes how data is distributed; this kind of graph puts data into groups and graphs them like a bar graph (ex: 0-9, 10-19, 20-29, etc) 
                  interquartile range- upper quartile minus the lower quartile 
                  mean- the average of a set of numbers: add all the numbers together and divide by how many numbers there are
                  median- middle number of a set of numbers once they are sorted in order
                  outlier- a data point far from the rest of the data (in this section, more than 1.5*IQR below Q1 or above Q3)
                  percentile- the percent of values at or below a given score 
                  quartile - splitting the data into top 25%, middle/upper 25%, middle/lower 25%, and lower 25%
                  stem plot- type of graph that separates the tens place (the "stem") from the ones place (the "leaf") in order to organize the data. Ex: 2 | 5 = 25
                  standard deviation- a measure of how spread out the values are; roughly, how far they typically are from the mean
                  symmetrical- exactly the same on both sides 
                  trimodal- having three peaks
                  unimodal- having one peak