Standard Deviation

Standard Deviation (σ)

What is the Standard Deviation?

Well, just like with anything you are unfamiliar with, it is best to break the words down and think about how they might make sense in the context provided. So the first part of 'standard deviation' is 'standard'... Okay, so maybe this idea can be applied to a lot of data sets. Let's move on to the second word, 'deviation'. Well, if someone deviates from a certain path (like Anakin Skywalker deviating from the light side of the Force... oh sorry, spoilers I guess...), what does that mean? That means he strayed away from that path; he is no longer on that path. So, can data do the same thing? Well let's check it out:

[Figure: two fitted line plots. Output 1 = 44.53 + 2.024 Input; Output 2 = 44.86 + 2.134 Input]

Okay, I see some graphs... Let's break them down. Both are sets of data, and the variables don't really matter for the example. The scales on the axes are very similar, so we don't need to worry about that. Let's just look at the scatterplots. The graph on the left has a line of best fit (that's the red line). If you don't remember what that is, it is basically the line that is as close as possible to ALL of the data points (the blue dots). It fits the data the best it can. The graph on the right has the same thing, a line of best fit.

What's the difference between these two images, then? Well, you can see a lot more of the line in the image on the left. That's because the data points are way more spread out. That means that the data DEVIATES from that line of best fit a lot. So, the data on the left has a higher standard deviation than the data on the right. The data on the right is more clumped together around that line... each point, on average, deviates far less in that graph from the line of best fit than the data points on the left image.

Imagine you had to measure the distance between all the points and the red line. So you have to go from the red line to each and every data point. On average, you're going to measure much larger distances on the graph on the left. That is a visual for standard deviation.
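If you want to try the "measure every distance" idea yourself, here is a rough Python sketch. The two data sets are made up for illustration (they are not the data in the graphs above), and the helper name spread_around_fit is hypothetical. It fits a least-squares line to each set and reports the typical vertical distance of the points from that line, using a root-mean-square average as one reasonable way to summarize the distances.

```python
import math

def spread_around_fit(xs, ys):
    """Fit a least-squares line; return (intercept, slope, typical distance)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Vertical distance from the fitted line to each data point
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Root-mean-square of those distances
    spread = math.sqrt(sum(r ** 2 for r in residuals) / n)
    return intercept, slope, spread

# Made-up data: both sets scatter around the same line, y = 44 + 2x
xs = [1, 2, 3, 4, 5, 6]
clumped = [46.5, 47.0, 50.5, 52.5, 53.0, 56.5]      # points close to the line
spread_out = [49.0, 42.0, 53.0, 55.0, 48.0, 59.0]   # points far from the line

_, _, clumped_spread = spread_around_fit(xs, clumped)
_, _, spread_out_spread = spread_around_fit(xs, spread_out)

print(f"clumped data:    typical distance = {clumped_spread:.2f}")
print(f"spread-out data: typical distance = {spread_out_spread:.2f}")
```

Both sets get the same line of best fit, but the spread-out set sits much farther from it on average, which is the picture behind a higher standard deviation.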

Why do High and Low Standard Deviations Matter?

Well, if you have a higher standard deviation, you probably will have less confidence in your data. If the data seems to jump all over the place, I don't want to rely on the 'average' that is the line of best fit, because I might be very, very wrong. But if most of the data sticks close to that line, I'm less worried.

So How Do I Calculate Standard Deviation?

I'm glad you asked. There is an equation, and it is even provided to you on the AP exam and all of my exams. It is written as:

s = √[ ∑(xi - x̄)^2 / (n - 1) ]

s is the standard deviation
∑ means 'sum of', so you have to calculate (xi - x̄)^2 for each data point and then add them up
xi is each individual data point, so you have to do the calculation 5 times if there are 5 data points
x̄ is the average (mean) of the data
n is the sample size (how many data points are there?)

Wait, Why Does Sample Size Matter?

Well, the more data points I have, the more confident I can be in my estimates. More data is always better - I'd rather have data I can be more confident in, even if it doesn't support my initial hypothesis. Be careful with the math, though. Look at the equation above. A bigger n puts a larger number in the denominator, but every extra data point also adds another term to the sum in the numerator, so s does not simply shrink as n grows. Instead, s settles closer to the true spread of the population. What does shrink with more data is the uncertainty in your estimate of the mean. Never look at an equation and just plug and chug. Pay attention to what the variables mean. They are where they are for a reason!

Why is it n-1 for Standard Deviation?

So why do we subtract 1 when using these formulas? The simple answer: if you divide by n, the sample variance and sample standard deviation are biased (that's the statistics way of saying the error leans consistently in one direction). In this case they tend to come out too small. Bessel's correction (i.e. subtracting 1 from your sample size) removes this bias for the variance and reduces it for the standard deviation.

Remember, since we use a sample to estimate the population, you want to be more conservative with your estimate. Subtracting 1 from your sample size in the denominator gives a slightly larger calculated standard deviation. If you were to measure every individual in the population, not merely a representative sample, then you would divide by n itself, with no subtraction.
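To see the difference the denominator makes, here is a small Python sketch using the standard library's statistics module, where stdev divides by n - 1 and pstdev divides by n. The five values are made up for illustration.

```python
import statistics

# Hypothetical sample of five measurements (made up for illustration)
sample = [10.0, 12.0, 14.0, 16.0, 18.0]

s_sample = statistics.stdev(sample)       # divides by n - 1 (Bessel's correction)
s_population = statistics.pstdev(sample)  # divides by n

print(f"n - 1 in the denominator: {s_sample:.3f}")
print(f"n in the denominator:     {s_population:.3f}")
```

The n - 1 version always comes out a little larger, and the gap gets smaller as the sample gets bigger.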

How to Calculate the Standard Deviation:

1. Calculate the mean (x̅) of a set of data.
2. Subtract the mean from each point of data to determine (x-x̅). You'll do this for each data point, so you'll have multiple (x-x̅).
3. Square each of the resulting numbers to determine (x-x̅)^2. As in step 2, you'll do this for each data point, so you'll have multiple (x-x̅)^2.
4. Add the values from the previous step together to get ∑(x-x̅)^2. Now you should be working with a single value.
5. Calculate (n-1) by subtracting 1 from your sample size. Your sample size is the total number of data points you collected.
6. Divide the answer from step 4 by the answer from step 5.
7. Calculate the square root of your previous answer to determine the standard deviation.
8. Be sure your standard deviation has the same number of decimal places as your raw data, so you may need to round your answer.
9. The standard deviation should have the same unit as the raw data you collected. For example, SD = +/- 0.5 cm.
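The steps above can be followed one at a time in Python. This is just a sketch with made-up measurements (assumed to be in cm, recorded to one decimal place), with each line labeled by the step it carries out. The last line checks the hand calculation against Python's built-in statistics.stdev.

```python
import math
import statistics

# Hypothetical measurements in cm (made up for illustration)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)              # step 1: the mean
deviations = [x - mean for x in data]     # step 2: (x - mean) for each point
squared = [d ** 2 for d in deviations]    # step 3: square each deviation
total = sum(squared)                      # step 4: add them up
n_minus_1 = len(data) - 1                 # step 5: sample size minus 1
variance = total / n_minus_1              # step 6: divide step 4 by step 5
s = math.sqrt(variance)                   # step 7: take the square root

# Steps 8 and 9: match the precision and unit of the raw data
print(f"mean = {mean:.1f} cm, SD = +/- {s:.1f} cm")

# Sanity check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(data)))
```

For this data the mean is 5.0 cm, the sum of squared deviations is 32, and dividing by 7 before taking the square root gives a standard deviation of about 2.1 cm.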

Using the TI-Nspire for Standard Deviation

TI-84: Using a graphing calculator to find the mean and standard deviation

First you have to enter the data. Hit the STAT button and you will see the options EDIT, CALC and TESTS atop the screen. Use the left and right arrows (if necessary) to move the cursor to EDIT, then select 1: Edit...

Now you will see a table with the headings L1 and L2. Enter the values under L1 (if you want to clear pre-existing data first, move the cursor to the top of the column, hit CLEAR and then ENTER.)

Once all the data is entered, go back to the STAT menu, but this time move the cursor to CALC instead of EDIT.

Once you're in the CALC menu, select 1-Var Stats, then hit ENTER. The calculator will display the mean (x̄), a few sums, and then two standard deviations: Sx and σx. Sx is the sample standard deviation (it uses n-1), and that is the one we want.

Using Google Sheets

Use Sheets to calculate the mean of your data.
Click on the box in which you want the Standard Deviation to be placed
Click the "Formulas" tab at the top of the screen
Select the "Insert Function" button
Search to find the STDEV option, click OK
Highlight the data for which you want the SD calculated, then click OK. Be sure not to select the mean as one of your data points when calculating the standard deviation. This is a common mistake.

Once you have the mean and standard deviation, you need to set the values to the correct number of digits. Sheets will default to giving you too many digits after the decimal point. Your mean and standard deviation must have the same precision (number of digits after the decimal) as your data points. So, in the example, the standard deviation should be a whole number. To do this, click the box that is displaying the standard deviation, and on the "Home" tab click the decrease decimal button until you have the correct number of digits showing.
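The same precision rule can be sketched in Python. The data here are made-up whole numbers, so the mean and standard deviation are reported with zero digits after the decimal. The variable name decimals is just an illustrative choice.

```python
import statistics

# Hypothetical whole-number measurements (made up for illustration)
data = [12, 15, 11, 14, 18, 13]

# The raw data have 0 digits after the decimal point
decimals = 0

mean = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation (n - 1)

print(f"unrounded: mean = {mean}, SD = {s}")
print(f"reported:  mean = {mean:.{decimals}f}, SD = {s:.{decimals}f}")
```

If your raw data were recorded to one decimal place instead, you would set decimals to 1.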

Review of Basic Statistic Terms including Degrees of Freedom
