Q4 Knowledge-Base

Research Methods -- CRJU 3601 – Knowledge-Base

Explaining Variability (The Sample Mean and Standard Deviation)

We have a data set consisting of a “random sample,” which we use to represent the population. Our first task is to describe it with numbers (“sample statistics,” which estimate population parameters) in the following way:

Data Set: 2, 6, 6, 8, 8, 8, 8, 9, 10, 10, 12

The sample size

There are 11 scores in the data-set -- Reduce to single number: (N = 11)

The Range

The minimum score is 2, maximum score is 12 -- Reduce to single number: (range: 12 - 2 = 10)

We identify the minimum and maximum scores to compute the range, which is reduced to a single number by subtracting the minimum score from the maximum score. The Range is not always reported, but the mean and standard deviation are always reported.
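The sample size and range described above can be checked with a few lines of Python. This is only a minimal sketch; the variable names (scores, n, score_range) are chosen here for illustration and are not part of the course material.

```python
# Data set from the handout
scores = [2, 6, 6, 8, 8, 8, 8, 9, 10, 10, 12]

# Sample size: how many scores are in the data set
n = len(scores)

# Range: maximum score minus minimum score
score_range = max(scores) - min(scores)

print("N =", n)                  # N = 11
print("range =", score_range)    # range = 10
```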

The Mean

The average score is 7.909 -- Reduce to single number: (mean: 87 / 11 = 7.909)

Because there are many scores in the data set, we need a single number to represent all of them. We use the mean to describe the “average” score.

The best predictor of any randomly drawn score in the data set is the Mean -- It is a measure of Centrality (central tendency)

The Mean is a Powerful tool. It is such a good predictor because it uses information from every single subject (data point) in the sample.
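The sketch below computes the mean of the data set and illustrates one common reading of “best predictor”: guessing the mean gives a smaller total squared error than guessing any other single value. That least-squares interpretation is an assumption made here, and the helper name sum_of_squared_errors is hypothetical.

```python
# Data set from the handout
scores = [2, 6, 6, 8, 8, 8, 8, 9, 10, 10, 12]

# Mean: sum of all scores divided by the number of scores
total = sum(scores)
mean = total / len(scores)
print(total, round(mean, 3))  # 87 7.909


def sum_of_squared_errors(guess):
    """Total squared distance between one guess and every score."""
    return sum((x - guess) ** 2 for x in scores)


# The mean produces a smaller total squared error than nearby guesses
error_at_mean = sum_of_squared_errors(mean)
error_at_8 = sum_of_squared_errors(8)
error_at_7 = sum_of_squared_errors(7)
print(error_at_mean < error_at_8, error_at_mean < error_at_7)  # True True
```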

We are ultimately interested in finding evidence to support a theory; for example, “Does the ‘treatment’ (gender) have an effect on achievement scores?” We first identify which scores belong to men and which belong to women, and then use descriptive statistics (mean, minimum, maximum, range, and standard deviation) to see how these two “treatment groups” differ.


When a researcher is interested in discovering characteristics about a population of people, she or he can do one of two things:

(1) Go into the population and collect data on every single person to obtain the true population parameters.

(2) Take a random sample of the population that is representative of the population, and use the sample statistics to estimate those parameters.

Because a sample is used to estimate the population, the Sum-of-Squares is divided by “n-1” rather than “n” to obtain the variance. The “n-1” adjustment is used because it produces a result that more closely approximates the variance in the population.

The scores vary from the mean on average by 2.625 (standard deviation: the square root of 68.909 / 10 = 2.625)
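As a rough check of the 2.625 figure, the sketch below computes the Sum-of-Squares, divides by n-1 to get the sample variance, and takes the square root. It also shows what dividing by n would give, for comparison. The variable names are illustrative only; Python's standard statistics.stdev function uses the same n-1 divisor.

```python
import math
import statistics

# Data set from the handout
scores = [2, 6, 6, 8, 8, 8, 8, 9, 10, 10, 12]
n = len(scores)
mean = sum(scores) / n

# Sum-of-Squares: total of the squared deviations from the mean
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Sample variance and standard deviation use the n-1 adjustment
sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)

# For comparison only: dividing by n gives a slightly smaller value
sd_dividing_by_n = math.sqrt(sum_of_squares / n)

print(round(sum_of_squares, 3))    # 68.909
print(round(sample_sd, 3))         # 2.625
print(round(sd_dividing_by_n, 3))  # 2.503

# The standard library agrees with the hand computation
library_sd = statistics.stdev(scores)
```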

The Standard Deviation is a powerful statistical tool because it contains two important pieces of information about every score:

(a) The location of every score in the distribution in relation to the mean

(b) The location of that score in relation to every other score in the distribution.

The mean of this entire data-set (using scores of both males and females) was 7.909. Males have a mean of 7.667, a range of 10, a maximum score of 12, and a minimum score of 2. Females have a mean of 8.200, range of 4, a maximum score of 10, and a minimum score of 6. The descriptive statistics show us that the scores of men and women are different in many ways. Ultimately we want to know if the scores are different enough that we can say the “treatment” (gender) is responsible for the difference.

We look at any given raw score in the distribution, and think about how many standard deviations that score is from the mean. We always talk in terms of “number of standard deviations,” which includes whole standard deviations or fractional parts of a standard deviation. For example, if the mean is 50, the standard deviation is 10, and our score is 60, we are “1” standard deviation above the mean. If our score is 65, we are “1.5” standard deviations above the mean. When we know the number of standard deviations a score is from the mean, we know where that score is in relation to all other scores in the population.
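The “number of standard deviations” idea above can be sketched as a small function: subtract the mean from the score and divide by the standard deviation. The function name standard_deviations_from_mean is made up for this illustration; the numbers are the ones used in the example above.

```python
def standard_deviations_from_mean(score, mean, sd):
    """How many standard deviations a raw score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# Example from the text: mean of 50, standard deviation of 10
first = standard_deviations_from_mean(60, 50, 10)
second = standard_deviations_from_mean(65, 50, 10)
below = standard_deviations_from_mean(40, 50, 10)

print(first)   # 1.0
print(second)  # 1.5
print(below)   # -1.0
```

A negative result simply means the score falls below the mean.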