The standard deviation is an indicator of the variability of a set of data: it measures how well the average represents the actual data. It is represented by the lower-case letter "s" for a sample or the Greek letter sigma (σ) for a population. It is used in science to evaluate the reliability of the data: the smaller the standard deviation, the higher the reliability. A small standard deviation shows that there is little variability between data points, or, in other words, that most of the data points are relatively close to the average. To be more precise, 68.2% of the data fall within the range of the average +/- one standard deviation. Please read here.
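As a rough illustration, the sample standard deviation "s" and the average ± s range can be computed with Python's built-in statistics module (the data set below is made up for the example):

```python
import statistics

# Made-up data set: e.g. a measurement repeated eight times
data = [12, 15, 11, 14, 13, 16, 12, 15]

mean = statistics.mean(data)   # the average
s = statistics.stdev(data)     # sample standard deviation ("s")

# For normally distributed data, about 68.2% of the values
# fall within the range average - s to average + s
low, high = mean - s, mean + s
print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"~68% of the data between {low:.2f} and {high:.2f}")
```

A smaller s would give a narrower range, meaning the average represents the data better.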
In this video Ms Ruebe explains how to include standard deviation error bars using Microsoft Excel.
The results of an exam show a class average of 65% for 50 students. The standard deviation is 23. What does that mean? If the results follow a normal distribution, it means that 68.2% of the 50 students got a grade between 42% (65 − 23) and 88% (65 + 23). If you don't know your grade, and you are concerned about failing the exam, what would make you happier: an average of 65 with a standard deviation of 23, or with a standard deviation of 12? Why?
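The arithmetic of this example can be checked with a short sketch. The pass mark of 50% used below is an assumption for illustration only (it is not stated above):

```python
from statistics import NormalDist

# Exam results: class average 65%, comparing two standard deviations
for s in (23, 12):
    dist = NormalDist(mu=65, sigma=s)
    low, high = 65 - s, 65 + s
    # Fraction of students expected below a hypothetical pass mark of 50%
    p_fail = dist.cdf(50)
    print(f"s = {s}: ~68% of grades in [{low}%, {high}%], "
          f"P(grade < 50%) = {p_fail:.1%}")
```

With the smaller standard deviation, far fewer grades are expected to fall below 50%, which is why the lower s should make a worried student happier.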
The standard deviation is an indicator of variability and is often represented as error bars in graphic presentations of processed data. If the standard deviation of a set of data is high, the variability of the data is also high, and therefore the average does not represent the data very well. When averages of data sets are plotted in a graph, the standard deviation can be shown as error bars to evaluate the differences between the averages. Overlapping error bars indicate that the averages presented in the graph do not show a significant difference between the populations. If the error bars overlap, there is no evidence to support that the data sets are significantly different. Careful: overlapping error bars obtained from the standard deviation do not suggest that the data sets have no differences. There may or may not be differences, but the data presented do not demonstrate that the data sets are significantly different.
In this graph, the error bars represent the standard deviation of the different data sets. Each data set corresponds to a different distance from a light source. Is there a difference between the number of bubbles produced at 10 and at 15 cm? The answer is MAYBE. The data collected had a relatively large variability in the measurements for 10 and 15 cm from the light source. Therefore, the averages shown in the graph do not represent all the data collected very well. According to the data collected, there is no significant difference between the average number of bubbles produced in one minute at 10 cm and at 15 cm. However, the data show a significant difference between the number of bubbles produced at 5 cm and at 25 cm from the light source.
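Whether two error bars (average ± one standard deviation) overlap can also be checked numerically. A minimal sketch, using made-up bubble counts rather than the actual data from the graph:

```python
import statistics

# Made-up data sets: bubbles per minute at two hypothetical distances
a = [22, 25, 19, 24, 21]
b = [20, 23, 18, 22, 26]

def error_bar(data):
    """Return the (mean - s, mean + s) interval drawn as an error bar."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return m - s, m + s

lo_a, hi_a = error_bar(a)
lo_b, hi_b = error_bar(b)

# Two intervals overlap if each one starts before the other ends
overlap = lo_a <= hi_b and lo_b <= hi_a
print("Error bars overlap:", overlap)
```

If `overlap` is True, the data do not demonstrate a significant difference between the two sets; it does not prove the sets are the same.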
The standard deviation is used to assess the reliability of a set of data. The larger the standard deviation, the higher the variability and the lower the reliability. But, how can I know if my standard deviation is large or small?
A high standard deviation shows that the data are widely spread (less reliable) and a low standard deviation shows that the data are clustered closely around the mean (more reliable). But how do you know if your standard deviation is high or low? To make that judgement, you need to look at:
The minimum and the maximum values of your data set
The value of the average.
Consider the factor that you are measuring: the σ is always larger for an ecosystem than for an enzyme-controlled lab test.
Alternatively, the coefficient of variation can be calculated and used to evaluate the standard deviation.
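The coefficient of variation is simply the standard deviation divided by the mean, usually expressed as a percentage, which makes the spread comparable across data sets with different averages. A minimal sketch with made-up data:

```python
import statistics

# Made-up data set (e.g. repeated measurements of one quantity)
data = [4.1, 4.5, 3.9, 4.3, 4.2]

mean = statistics.mean(data)
s = statistics.stdev(data)

# Coefficient of variation: standard deviation relative to the mean
cv = (s / mean) * 100
print(f"CV = {cv:.1f}%")
```

A CV of a few percent indicates a small spread relative to the average, regardless of the units or magnitude of the measurements.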
In order to set different values for each error bar, you need to have your columns separated as different series. This may require organizing your data differently: make a new data table and try to switch columns and rows until you manage to have your data as different Series (see lower right side of the image below).
The range is a measure of the spread of data: the difference between the largest and the smallest observed values. If one data point is unusually large or unusually small, it has a big effect on the range. Such very large or very small data points are called outliers. If an experiment is measuring the change in height of plants (as an independent variable is modified) and one of the plants dies early and has an unusually low height, that data point would be considered an outlier. In a lab report, it is acceptable to exclude an outlier from data processing, but it is important to declare it and explain why it was excluded. The best option is to show the data processing with and without the outliers, explaining how the outliers were identified and why they were removed. Outliers may make a big difference in the graphic presentation of data, so a graph without outliers is strongly encouraged.
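A short sketch of the range calculation with and without an outlier. The data are made up, and the 1.5 × IQR rule used to flag the outlier is one common convention, not a rule prescribed above:

```python
import statistics

# Made-up plant heights (cm); 2 cm is the plant that died early
heights = [31, 29, 34, 2, 30, 33, 28, 32]

data_range = max(heights) - min(heights)
print("Range with all points:", data_range)

# One common (assumed) convention for flagging outliers: the 1.5 x IQR rule
q1, q2, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1
outliers = [h for h in heights if h < q1 - 1.5 * iqr or h > q3 + 1.5 * iqr]
print("Flagged outliers:", outliers)

kept = [h for h in heights if h not in outliers]
print("Range without outliers:", max(kept) - min(kept))
```

One dead plant stretches the range from a few centimetres to over thirty, which is exactly why the processing should be shown with and without the outlier, with the exclusion declared and justified.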
If the origin or source of the outliers can be justified, the data presentation should be done with and without them. The difference may be huge, as the example above shows.