Chapter 3

Section 3-1 Measures of the center

Mean- average of the numbers (the sum of the values divided by the number of values)

Median- middle number in the sorted data set

Mode- most frequent number in the data set

Midrange=(minimum + maximum)/2
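Here is a minimal Python sketch of the four measures of center above. The data set is hypothetical (made up just for illustration), and the standard-library statistics module is just one convenient way to do the arithmetic.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)            # sum of the values / number of values
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
midrange = (min(data) + max(data)) / 2  # (minimum + maximum) / 2

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("midrange:", midrange)
```

With an even number of values, the median is the average of the two middle values (here 3 and 5).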

Mean, Median, and Mode on the Normal and Skewed Distributions (video link explaining where these fall on the distribution) 


Section 3-2 Measures of Variation

Range=max- min

Standard Deviation- s - A measure of how spread out the data are around the mean (see the example below for details).

Variance- The square of the standard deviation (s^2).

Example of finding the range, standard deviation, and variance of a set of data:

Data set: 6, 7, 10, 13, 14
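As a rough sketch, here is the arithmetic for this data set written out step by step in Python. It assumes the sample formulas (dividing by n - 1), which is what the symbol s normally refers to; the variable names are my own.

```python
import math

data = [6, 7, 10, 13, 14]
n = len(data)

# Range = max - min
data_range = max(data) - min(data)

# Mean of the data
mean = sum(data) / n

# Squared distance of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1 (assumed sample formula)
variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```

The mean is 10, the squared deviations add up to 50, so the variance is 50/4 = 12.5 and the standard deviation is the square root of that, about 3.54.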

See the StatCrunch directions at the bottom of the page for how to do all of the above much more easily.  I highly recommend you use StatCrunch for this course, as it will save a tremendous amount of time.

What is standard deviation and how does it relate to the normal distribution? (This video is a quick explanation of the concept of standard deviation and what it is used for. This is a good video, watch it.)

Most values fall within 2 standard deviations of the mean.  This is an important concept that will be used throughout the rest of the course: it is usual for a value to fall within 2 standard deviations of the mean (between mean - 2s and mean + 2s) and unusual for a value to fall outside that range.
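The 2 standard deviation check can be sketched in a few lines of Python. The function name, the mean of 50, and the standard deviation of 10 are hypothetical, picked just for illustration, and treating a value exactly 2 standard deviations away as usual is my assumption.

```python
def is_unusual(x, mean, s):
    """Return True if x is more than 2 standard deviations from the mean."""
    return abs(x - mean) > 2 * s

# Hypothetical mean and standard deviation.
mean, s = 50, 10

# Usual values run from mean - 2s to mean + 2s.
usual_low, usual_high = mean - 2 * s, mean + 2 * s

print("usual range:", usual_low, "to", usual_high)
print("65 unusual?", is_unusual(65, mean, s))
print("75 unusual?", is_unusual(75, mean, s))
```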

Empirical Rule- When a data set is approximately normally distributed with a known mean and standard deviation, about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean, respectively.  Here's a short video explaining the concept: 

Empirical rule explanation video (a little dry, but it explains the concept just the same)

Example using the Empirical Rule:

Male heights are approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. 

A) What is the approximate percentage of males between 67 and 73 inches? 

Since 67 and 73 are 1 standard deviation below and above the mean (70 - 3 and 70 + 3), this would be approximately 68%.

B) What is the approximate percentage of males between 64 and 76 inches? 

Since 64 and 76 are 2 standard deviations below and above the mean (70 - 2*3 and 70 + 2*3), this would be approximately 95%.
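If you want to see where the Empirical Rule percentages come from, here is a small Python sketch that checks them against an exact normal distribution with the mean and standard deviation from this example. It uses NormalDist from Python's standard statistics module; the helper name fraction_within is my own.

```python
from statistics import NormalDist

# Male heights from the example: mean 70 inches, standard deviation 3 inches.
heights = NormalDist(mu=70, sigma=3)

def fraction_within(k, dist):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    pct = 100 * fraction_within(k, heights)
    print(f"within {k} sd ({low} to {high} inches): about {pct:.1f}%")
```

The exact values are slightly different from the rounded 68%, 95%, and 99.7%, which is why the rule is stated as approximate.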


Chebyshev's Theorem- a theorem which is valid for any distribution of data:
For any number k greater than 1, at least 1 - 1/k^2 of the data will fall within k standard deviations of the mean.
Useful rules:
-no information can be obtained on the fraction of values falling within 1 standard deviation of the mean
-at least 75% will fall within 2 standard deviations
-at least 8/9 (about 88.9%) will fall within 3 standard deviations

Ex: Suppose you know a data set has a mean of 50 and a standard deviation of 10 (does not have to be normally distributed).  Using Chebyshev's Theorem, what percentage of the data will fall between 30 and 70 (within 2 standard deviations)?

1-1/2^2 = 1- 1/4 = 3/4.
At least 3/4=75% of the data will fall within 2 standard deviations.
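The same calculation as a short Python sketch, in case you want to try other values of k. The function name is my own, not something from the textbook or StatCrunch.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem gives no information for k <= 1")
    return 1 - 1 / k**2

# Example from above: mean 50, standard deviation 10, k = 2.
mean, s, k = 50, 10, 2
interval = (mean - k * s, mean + k * s)

print("interval:", interval)
print("at least", chebyshev_lower_bound(k), "of the data")
print("k = 3: at least", round(chebyshev_lower_bound(3), 3))
```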


Section 3-3 Measures of Relative Standing and Boxplots

Z score- A measure of how many standard deviations above or below the mean a data point is.

 z-score=(x-mean)/s

Example of finding z-score:

Male heights are approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. 

Shaq has a height of 85 inches.  What would his z score be?

Z=(85-70)/3 = 15/3 = 5

So, that means that Shaq's height is 5 standard deviations above the mean.  Not surprisingly, his height is off the chart!
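Here is the z-score formula as a tiny Python sketch, using the numbers from the Shaq example. The function name z_score is just my choice.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s

# Male heights: mean 70 inches, standard deviation 3 inches; Shaq is 85 inches.
shaq_z = z_score(85, 70, 3)
print("Shaq's z score:", shaq_z)
```

A negative result would mean the value is below the mean; for instance, a 67 inch male has a z score of -1.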

Percentiles- partitions in the data representing cutoffs for different percentages.  For example, P35 stands for the 35th percentile: 35% of the data fall below this point and 65% are at or above it.

Finding percentiles:

Example:

Fish Length (in) (Rockfish, California Halibut, and Lingcod lengths from a mercury study that a few buddies and I are conducting in Northern California) 26.50, 29.50, 24.70, 29, 29.25, 22, 14, 14, 16, 16.25, 16.25, 17, 19, 20, 13.25, 15.50, 16, 16, 16.75, 17.75, 18, 28.75, 29, 33.50, 33.50, 33.50, 34, 35, 38.50, 40

Find the 75th percentile.

First we need to sort the data from smallest to largest.

13.25, 14, 14, 15.50, 16, 16, 16, 16.25, 16.25, 16.75, 17, 17.75, 18, 19, 20, 22, 24.70, 26.50, 28.75, 29, 29, 29.25, 29.50, 33.50, 33.50, 33.50, 34, 35, 38.50, 40

Next we need to find the locator L, where n is the number of values in the data set. L = (percentile in question/100)*n = (75/100)*30 = 22.5

We now have two options:

1) If L is a whole number, we average the Lth and (L+1)th values in the sorted data.

2) If L is not a whole number (our case here), round up to the next whole number and use the value in that position.

We have case 2, so we round up to the 23rd position.

P75 = 29.5 inches

So, this means that about 75% of the fish we caught for our survey are under 29.5 inches and about 25% are 29.5 inches or over.
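The steps above can be sketched in Python as follows. This follows the round-up rule from these notes (other software, StatCrunch included, may interpolate and give slightly different answers). The function name is my own, and I am assuming the percentile is a whole number from 1 to 99.

```python
def percentile_value(data, p):
    """Value at the p-th percentile (whole number p, 1 to 99), using L = (p/100)*n."""
    values = sorted(data)          # step 1: sort smallest to largest
    n = len(values)
    if (p * n) % 100 == 0:
        # L is a whole number: average the L-th and (L+1)-th values
        L = p * n // 100
        return (values[L - 1] + values[L]) / 2
    # L is not a whole number: round up to the next position
    L = p * n // 100 + 1
    return values[L - 1]

# Fish lengths (in) from the example above.
fish = [26.50, 29.50, 24.70, 29, 29.25, 22, 14, 14, 16, 16.25,
        16.25, 17, 19, 20, 13.25, 15.50, 16, 16, 16.75, 17.75,
        18, 28.75, 29, 33.50, 33.50, 33.50, 34, 35, 38.50, 40]

print("P75 =", percentile_value(fish, 75), "inches")
```

Positions in the notes start at 1 while Python lists start at 0, which is why the code subtracts 1 when looking up a position.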

The reverse can be done as well.  Say we wanted to know what percentile a 35 inch fish represents.  To find the percentile, we use the following:

Video on finding the percentile for value x

percentile for value x= (number of values less than x)/(total number of values) *100% 

If we want to find the percentile for a 35 inch fish:

percentile for value 35= (number of values less than 35)/(total number of values) *100% =(27/30)*100%= 90%

P90 = 35 inches
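Going the other direction is even shorter. Here is a minimal Python sketch of the formula above; the function name is my own, and it returns the unrounded percentage.

```python
def percentile_of_value(data, x):
    """Percentile for value x: (number of values less than x) / (total number of values) * 100."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)

# Fish lengths (in) from the example above.
fish = [26.50, 29.50, 24.70, 29, 29.25, 22, 14, 14, 16, 16.25,
        16.25, 17, 19, 20, 13.25, 15.50, 16, 16, 16.75, 17.75,
        18, 28.75, 29, 33.50, 33.50, 33.50, 34, 35, 38.50, 40]

print("percentile for a 35 inch fish:", percentile_of_value(fish, 35))
```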


5 number summary- a way to summarize the data using 5 numbers: Minimum, Q1, Q2, Q3, and Maximum.

Note: Q1 is the first quartile, or P25; Q2 = P50 = median; Q3 is the third quartile, or P75.

Finding the 5 number summary:

Example:

Fish Length (in), sorted (Rockfish, California Halibut, and Lingcod lengths from a mercury study that a few buddies and I are conducting in Northern California) 13.25, 14, 14, 15.50, 16, 16, 16, 16.25, 16.25, 16.75, 17, 17.75, 18, 19, 20, 22, 24.70, 26.50, 28.75, 29, 29, 29.25, 29.50, 33.50, 33.50, 33.50, 34, 35, 38.50, 40

Minimum=13.25

Q1 = 16.25 (L = (25/100)*30 = 7.5, not a whole number, so round up to the 8th position)

Q2 = 21 (L = (50/100)*30 = 15, a whole number, so average the 15th and 16th positions)

Q3 = 29.5 (L = (75/100)*30 = 22.5, not a whole number, so round up to the 23rd position)

Maximum=40
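Here is a rough Python sketch that puts the 5 number summary together using the same locator rule (L = (percentile/100)*n). The function names are my own, and software such as StatCrunch may use a slightly different quartile method, so its answers can differ a little.

```python
def percentile_value(values, p):
    """Value at the p-th percentile of already-sorted values (whole number p, 1 to 99)."""
    n = len(values)
    if (p * n) % 100 == 0:
        L = p * n // 100                      # whole number: average L-th and (L+1)-th
        return (values[L - 1] + values[L]) / 2
    L = p * n // 100 + 1                      # not whole: round up to the next position
    return values[L - 1]

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    values = sorted(data)
    return (values[0],
            percentile_value(values, 25),
            percentile_value(values, 50),
            percentile_value(values, 75),
            values[-1])

# Fish lengths (in) from the example above.
fish = [13.25, 14, 14, 15.50, 16, 16, 16, 16.25, 16.25, 16.75,
        17, 17.75, 18, 19, 20, 22, 24.70, 26.50, 28.75, 29,
        29, 29.25, 29.50, 33.50, 33.50, 33.50, 34, 35, 38.50, 40]

minimum, q1, q2, q3, maximum = five_number_summary(fish)
print("Min:", minimum, "Q1:", q1, "Q2:", q2, "Q3:", q3, "Max:", maximum)
```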

Boxplot- Graphical display of the 5 number summary.

From StatCrunch the boxplot will look like this:

[Figure: boxplot from StatCrunch]

Just about everything in this chapter can be found using StatCrunch.


Mean, median (Q2), standard deviation, variance, minimum, Q1, Q3, maximum: Click the little box to the right of the data set in any homework, quiz, or test question and select "Open in StatCrunch". Now that StatCrunch is open, click "Stat", "Summary Stats", then "Columns".  Select the variable.  Click "Compute".

Boxplot: Click the little box to the right of the data set and select "Open in StatCrunch". Now that StatCrunch is open, click "Graph" and then "Boxplot".  Select the variable. Check the box next to "Draw boxes horizontally". Click "Compute".