Dispersion can be defined as the spread of observations in relation to a measure of central tendency. In other words, it is the degree of scatter of the individual observations about the mean.
It can be of two types:
When the variation of a series is expressed in the unit of measurement of the original observations, the measure is called a measure of "Absolute Dispersion". The absolute measures of dispersion can be compared with each other only if they belong to the same population.
These measures of absolute dispersion can be listed as: the range, the inter-quartile range and quartile deviation, the (absolute) mean deviation, the variance and the standard deviation.
If the measure of dispersion is in the form of percentage or ratio of the average, it is called a measure of "Relative Dispersion". A relative measure of dispersion is also called "Coefficient of Dispersion". This measure is suitable for comparison of two or more series expressed in different units of measurements.
These measures of relative dispersion can be listed as: the coefficient of range, the coefficient of quartile deviation, the coefficient of mean deviation and the coefficient of variation.
Absolute measures of dispersion
The range is the difference between the largest (L) and the smallest (S) values of the variable in a series.
Range = L - S
In case of grouped data, the range is calculated as the difference between the upper limit of the highest class and the lower limit of the lowest class.
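As a minimal sketch (not part of the original notes), the range of a small raw-data sample can be computed as follows; the data values are hypothetical.

```python
# Hypothetical data values, for illustration only.
data = [12, 7, 22, 15, 9, 30, 18]

largest = max(data)    # L
smallest = min(data)   # S
data_range = largest - smallest

print(f"Range = {largest} - {smallest} = {data_range}")  # Range = 30 - 7 = 23
```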
Merit of Range: The only merit of the range is that it is very easy to calculate and understand.
Demerits of Range
Uses of Range : The range has very limited usefulness. It is used in certain industrial works in the field of quality control to maintain the quality of manufactured goods.
Quartiles are those values that divide the arrayed observations in a series into four equal parts.
|---------------|---------------|---------------|---------------|
                Q1              Q2              Q3
Inter Quartile Range = (Q3 - Q1)
It is better than range as it is not affected by the values of extreme items.
In fact, 50 percent of the given observations lie between the two quartiles.
Quartile Deviation = (Q3 - Q1)/2
For a symmetrical distribution, the value of the quartile deviation, when added to Q1 or subtracted from Q3, is equal to the median, i.e.,
(Median - Q1) = (Q3 - Median) = Quartile Deviation,
and the interval median ± quartile deviation includes exactly 50 percent of the total observations.
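The quartiles, inter-quartile range and quartile deviation can be computed as in the following sketch, assuming NumPy is available; the data values are hypothetical. Note that np.percentile interpolates linearly by default, which may differ slightly from textbook hand methods for locating quartiles.

```python
import numpy as np

# Hypothetical data values, for illustration only.
data = np.array([4, 7, 8, 10, 12, 15, 18, 21, 25, 40])

q1, q2, q3 = np.percentile(data, [25, 50, 75])

iqr = q3 - q1                  # inter-quartile range
quartile_deviation = iqr / 2   # semi-inter-quartile range

print(f"Q1 = {q1}, Median = {q2}, Q3 = {q3}")
print(f"Inter-quartile range = {iqr}, Quartile deviation = {quartile_deviation}")
```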
The absolute mean deviation is defined as the arithmetic mean of the absolute deviations (ignoring signs) of various items from a measure of central tendency (either mean, median or mode).
The arithmetic mean is, however, more commonly used in calculating the value of the mean deviation. Therefore it is more frequently called the “mean deviation”.
It is also referred to as the “first moment of dispersion”.
Computation
For raw data:
AMD = Σ |X - X̄| / n

For grouped data (values X with frequencies f):
AMD = Σ f |X - X̄| / Σf

For grouped data with class midpoints m:
AMD = Σ f |m - X̄| / Σf

where X̄ is the arithmetic mean.
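A minimal sketch of the raw-data and grouped-data formulas in Python (NumPy assumed; all values hypothetical):

```python
import numpy as np

# Raw data case: AMD = sum(|X - mean|) / n
data = np.array([10, 12, 15, 18, 20])   # hypothetical values
amd_raw = np.mean(np.abs(data - data.mean()))

# Grouped data case: AMD = sum(f * |m - mean|) / sum(f),
# where m is the class midpoint and f the class frequency.
midpoints = np.array([5, 15, 25, 35])   # hypothetical class midpoints
freq      = np.array([3, 7, 6, 4])      # hypothetical frequencies
grouped_mean = np.sum(freq * midpoints) / freq.sum()
amd_grouped  = np.sum(freq * np.abs(midpoints - grouped_mean)) / freq.sum()

print(amd_raw, amd_grouped)
```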
Merits of Absolute Mean Deviation
Demerits of Absolute Mean Deviation
Variance is also known as the Mean Square Deviation or the Mean Square.
Variance is defined as the mean of squares of deviations of individual observations from their arithmetic mean.
The variance is the second moment of dispersion.
The variance is expressed in the squared unit of measurement, e.g., kg², cm², litre², etc.
It is represented by the symbol 's²' for a sample and 'σ²' for a population.
Methods of computation
1. Deviation square method
2. Variable square method
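Both methods can be illustrated with a short Python sketch (NumPy assumed; values hypothetical). The sum of squares is divided by n - 1, for the reason explained under degrees of freedom below, and the square root of the variance gives the standard deviation.

```python
import numpy as np

data = np.array([6.0, 8.0, 10.0, 12.0, 14.0])  # hypothetical sample
n = len(data)
mean = data.mean()

# 1. Deviation square method: s^2 = sum((x - mean)^2) / (n - 1)
s2_deviation = np.sum((data - mean) ** 2) / (n - 1)

# 2. Variable square method: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
s2_variable = (np.sum(data ** 2) - data.sum() ** 2 / n) / (n - 1)

s = np.sqrt(s2_deviation)   # sample standard deviation

print(s2_deviation, s2_variable, s)   # the two variances agree: 10.0 10.0 3.16...
```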
The term degree of freedom refers to the number of independent measurements that are available for estimation of a parameter.
It is defined as the number of items whose values can be determined at will.
It is also defined as the total number of items in a series minus the number of restrictions or constraints put in the data.
We divide the sum of squares of deviations by (n – 1) to get an "unbiased" estimate of population parameter from the sample estimate.
As the sum of the deviations of individual observations from their mean must be zero, the value of one of the n deviations is fixed.
Once (n - 1) deviations are known, the remaining deviation must automatically take whatever value (plus or minus) is required to make the sum of all n deviations zero.
This fixed deviation is not free to take any random value, and hence in the statistical sense we lose one degree of freedom. Thus the degrees of freedom are n - 1. For example, if n = 3 and the mean is 10, then once two deviations are known to be +2 and -5, the third must be +3.
The standard deviation is defined as the positive square root of the arithmetic mean of the squares of deviations of individual observations from their arithmetic mean.
We represent the standard deviation from a sample by s and that from a population by the Greek letter σ .
It is expressed in the same unit in which the original observations are measured.
Properties of Variance / Standard Deviation
1. If we add (or subtract) a constant to (from) each observation, the variance and the standard deviation remain unchanged.
(Unaffected by change of origin)
2. If we multiply (or divide) each observation by a constant, the variance will be multiplied (or divided) by the square of that constant and standard deviation will be multiplied (or divided) by that constant itself.
(Affected by change of scale).
3. The variance is always a non-negative value; it ranges from zero to infinity.
Variance can never be negative.
The variance of a variable in a data set is 0 if and only if all entries have the same value.
4. The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances.
Var(X + Y) = Var (X) + Var (Y)
5. The proportion of area under a normal distribution (symmetrical curve), in relation to σ is as follows:
μ ± σ includes 68.26 % area
μ ± 2σ includes 95.45 % area
μ ± 3σ includes 99.73 % area
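These properties can be verified numerically. The sketch below (NumPy and SciPy assumed; data hypothetical) checks the change-of-origin and change-of-scale properties, the additivity of variances for independent variables, and the normal-curve areas.

```python
import numpy as np
from scipy.stats import norm

x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical data
k = 5.0
var = np.var(x, ddof=1)

# Property 1: adding a constant leaves the variance unchanged (change of origin).
assert np.isclose(np.var(x + k, ddof=1), var)

# Property 2: multiplying by a constant multiplies the variance by k^2
# and the standard deviation by k (change of scale).
assert np.isclose(np.var(k * x, ddof=1), k ** 2 * var)
assert np.isclose(np.std(k * x, ddof=1), k * np.std(x, ddof=1))

# Property 4 (approximately, by simulation): for independent X and Y,
# Var(X + Y) ≈ Var(X) + Var(Y).
rng = np.random.default_rng(0)
X = rng.normal(0, 2, 100_000)
Y = rng.normal(0, 3, 100_000)
print(np.var(X + Y), np.var(X) + np.var(Y))   # both close to 4 + 9 = 13

# Property 5: area under the standard normal curve within 1, 2 and 3
# standard deviations of the mean.
for m in (1, 2, 3):
    area = norm.cdf(m) - norm.cdf(-m)
    print(f"mu ± {m}σ covers {area * 100:.2f} % of the area")
```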
Merits of Variance/Standard Deviation
Demerits of Variance/Standard Deviation
Uses of Variance/Standard Deviation
The standard deviation refers to the deviation of individual observations from the mean. Thus it applies to the observations.
The standard error conventionally implies the standard error of the mean, i.e., it applies to means.
Standard error is the standard deviation of all possible sample means from the population mean.
It refers to the standard deviation of sampling distribution of an estimate.
The standard error of the mean is generally computed from a single sample, as SE = s / √n.
Standard error is inversely proportional to the square root of the number of observations in the sample.
The standard error decreases as the sample size increases.
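A minimal sketch of the standard error computed from a single sample (NumPy assumed; values hypothetical):

```python
import numpy as np

# Hypothetical sample, for illustration only.
sample = np.array([12.1, 14.3, 13.8, 15.0, 12.9, 14.6, 13.2, 14.1])

s = np.std(sample, ddof=1)    # sample standard deviation (n - 1 divisor)
n = len(sample)
se = s / np.sqrt(n)           # standard error of the mean: SE = s / sqrt(n)

print(f"mean = {sample.mean():.2f}, s = {s:.3f}, SE = {se:.3f}")
# Quadrupling n would halve the SE, since SE falls with the square root of n.
```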
Uses of Standard Error
Relative Measures of Dispersion
1. Coefficient of range
2. Coefficient of quartile deviation
3. Coefficient of mean deviation
4. Coefficient of Variation
The relative measure of standard deviation is called the coefficient of variation or the coefficient of standard deviation. The coefficient of variation is defined as the ratio of the standard deviation to the mean, and it is generally expressed as a percentage.
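For example, the following sketch (NumPy assumed; values hypothetical) compares the relative variability of two series measured in different units:

```python
import numpy as np

# Two hypothetical series in different units, for illustration only.
weights = np.array([52.0, 55.0, 60.0, 58.0, 65.0])       # kg
heights = np.array([152.0, 158.0, 160.0, 165.0, 170.0])  # cm

def cv(x):
    """Coefficient of variation: (s / mean) * 100, in percent."""
    return np.std(x, ddof=1) / np.mean(x) * 100

print(f"CV of weights = {cv(weights):.1f} %")
print(f"CV of heights = {cv(heights):.1f} %")
# The series with the larger CV is relatively more variable,
# even though the two are measured in different units.
```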
Uses of Coefficient of Variation
Measures of relative position
These measures are used to describe the position of a specific data value in relation to the rest of the data arranged in order.
A Z-score (or standardised value) is found by converting a value to a standardised scale.
Thus, a z score is the number of standard deviations that a data value is away from the mean.
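A minimal sketch of the standardisation z = (x - mean) / s (NumPy assumed; values hypothetical):

```python
import numpy as np

data = np.array([50.0, 60.0, 70.0, 80.0, 90.0])  # hypothetical scores
mean, s = data.mean(), data.std(ddof=1)

# z = (x - mean) / s : how many standard deviations x lies from the mean.
z_scores = (data - mean) / s
print(z_scores)   # e.g. a score of 90 here lies about 1.26 s above the mean
```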
Percentiles are sometimes also called quantiles.
These are measures of location which divide a set of data into 100 equal parts or groups with about 1% of the values in each group.
Each set of data has 99 percentiles, P1 to P99.
The kth percentile Pk is a value such that at most k % of the data are smaller in value than Pk and at most (100 - k) % of the data are larger than it.
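A sketch of percentile computations (NumPy assumed; values hypothetical). Note that np.percentile interpolates by default, so results may differ slightly from hand methods:

```python
import numpy as np

# Hypothetical data set of 20 values, for illustration only.
data = np.arange(1, 21)

# P25, P50 and P90: values below which about 25 %, 50 % and 90 % of the data fall.
p25, p50, p90 = np.percentile(data, [25, 50, 90])
print(p25, p50, p90)

# Conversely, the approximate percentile rank of the value 15 in this data set:
rank = (data < 15).mean() * 100   # proportion of values below 15
print(f"15 is at about the {rank:.0f}th percentile")
```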
Measures of the shape of distribution
Skewness refers to the lack of symmetry in the curve or distribution. There is skewness in the curve if the mean, median and mode do not coincide, i.e., if the curve is not symmetrical.
Symmetrical distribution –
Curve is symmetrical
mean = median = mode
Positively skewed distribution -
A distribution in which the frequencies rise suddenly but fall slowly.
The tail of the curve is drawn out more towards the right side, hence it is also called a right-skewed curve.
In a positively skewed curve -
Mean > Median > Mode
Coefficient of skewness > 0 (Positive)
Negatively skewed distribution -
In a negatively skewed distribution the frequencies rise slowly and fall rapidly.
The tail of the curve is drawn out more towards the left side, hence it is also referred to as a curve skewed to the left.
In a negatively skewed curve -
Mean < Median < Mode
Coefficient of skewness < 0 (Negative)
Measures of Skewness
(1) Absolute measures
(i) Skewness = Mean – Median
(ii) Skewness = Mean – Mode
(iii) Skewness = (Q3 – Median) – (Median – Q1)
(2) Relative measures
Coefficient of skewness = (Mean – Mode) / Standard deviation
The limits of this measure are – 1 and + 1.
If mode is ill defined then we use:
Coefficient of skewness = 3 (Mean – Median) / Standard deviation
The limits of this measure are – 3 and + 3.
Both the above measures of skewness were given by Karl Pearson, hence also called Karl Pearson’s Coefficient of skewness.
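Pearson's second coefficient can be computed as in this sketch (NumPy assumed; the right-skewed sample is hypothetical):

```python
import numpy as np

# Hypothetical right-skewed sample, for illustration only.
data = np.array([2, 3, 3, 4, 4, 4, 5, 6, 8, 12], dtype=float)

mean   = data.mean()
median = np.median(data)
s      = data.std(ddof=1)

# Karl Pearson's second coefficient (the mode is ill-defined here,
# so the median form is used): 3 * (mean - median) / s
sk = 3 * (mean - median) / s
print(f"Coefficient of skewness = {sk:.2f}")   # > 0, so positively skewed
```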
Kurtosis is a Greek word meaning bulginess.
In statistics it refers to the degree of flatness or peakedness of a curve in the region about the mode.
The degree of kurtosis is measured relative to the peakedness of normal curve.
Mesokurtic Curve – A normal curve is said to be mesokurtic.
Leptokurtic Curve – A curve that is more peaked than the normal curve. It is slender and narrow.
The items are concentrated more near the mean and at the tails than in the intermediate region.
Coefficient of kurtosis > 3
Platykurtic Curve – A curve that is less peaked and more flat at the top than the normal curve. It is wide and flat.
There are fewer items at the mean and at the tails than in a normal curve, but more items in the intermediate region.
Coefficient of kurtosis < 3
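The moment coefficient of kurtosis, β2 = m4 / m2² (where m2 and m4 are the second and fourth central moments), can be computed as in this sketch (NumPy assumed; the samples are simulated):

```python
import numpy as np

def kurtosis_coefficient(x):
    """Moment coefficient of kurtosis: beta_2 = m4 / m2^2,
    where m2 and m4 are the 2nd and 4th central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2

rng = np.random.default_rng(1)
print(kurtosis_coefficient(rng.normal(size=100_000)))    # ≈ 3 (mesokurtic)
print(kurtosis_coefficient(rng.laplace(size=100_000)))   # > 3 (leptokurtic)
print(kurtosis_coefficient(rng.uniform(size=100_000)))   # < 3 (platykurtic)
```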