Dispersion can be defined as the spread of observations in relation to a measure of central tendency. In other words, it is the degree of scatter of the individual observations about the mean.
It can be of two types:
When the variation of a series is expressed in the unit of measurement of the original observations, the measure is called a measure of "Absolute Dispersion". The absolute measures of dispersion can be compared with each other only if they belong to the same population.
These measures of absolute dispersion can be listed as: the range, the inter-quartile range and quartile deviation, the (absolute) mean deviation, the variance and the standard deviation.
If the measure of dispersion is in the form of percentage or ratio of the average, it is called a measure of "Relative Dispersion". A relative measure of dispersion is also called "Coefficient of Dispersion". This measure is suitable for comparison of two or more series expressed in different units of measurements.
These measures of relative dispersion can be listed as: the coefficient of range, the coefficient of quartile deviation, the coefficient of mean deviation and the coefficient of variation.
Absolute measures of dispersion
The range is the difference between the largest (L) and the smallest (S) values of the variable in a series.
Range = L - S
In case of grouped data, the range is calculated as the difference between the upper limit of the highest class and the lower limit of the lowest class.
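As a minimal sketch (not part of the original notes), the range of a small raw-data sample can be computed as follows; the data values are hypothetical.

```python
# Hypothetical data values, for illustration only.
data = [12, 7, 22, 15, 9, 30, 18]

largest = max(data)    # L
smallest = min(data)   # S
data_range = largest - smallest

print(f"Range = {largest} - {smallest} = {data_range}")  # Range = 30 - 7 = 23
```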
Merit of Range: The only merit of the range is that it is very easy to calculate and understand.
Demerits of Range
Uses of Range : The range has very limited usefulness. It is used in certain industrial works in the field of quality control to maintain the quality of manufactured goods.
Quartiles are those values that divide the arrayed observations in a series into four equal parts.
|---------------|---------------|---------------|---------------|
                Q1              Q2              Q3
Inter Quartile Range = (Q3 - Q1)
It is better than range as it is not affected by the values of extreme items.
In fact, 50 percent of the given observations lie between the two quartiles.
Quartile Deviation = (Q3 - Q1)/2
For a symmetrical distribution, the value of the quartile deviation, when added to Q1 or subtracted from Q3, is equal to the median, i.e.,
(Median - Q1) = (Q3 - Median) = Quartile Deviation,
and the interval median ± quartile deviation includes exactly 50 percent of the total observations.
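The quartiles, inter-quartile range and quartile deviation can be computed as in the following sketch, assuming NumPy is available; the data values are hypothetical. Note that np.percentile interpolates linearly by default, which may differ slightly from textbook hand methods for locating quartiles.

```python
import numpy as np

# Hypothetical data values, for illustration only.
data = np.array([4, 7, 8, 10, 12, 15, 18, 21, 25, 40])

q1, q2, q3 = np.percentile(data, [25, 50, 75])

iqr = q3 - q1                  # inter-quartile range
quartile_deviation = iqr / 2   # semi-inter-quartile range

print(f"Q1 = {q1}, Median = {q2}, Q3 = {q3}")
print(f"Inter-quartile range = {iqr}, Quartile deviation = {quartile_deviation}")
```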
The absolute mean deviation is defined as the arithmetic mean of the absolute deviations (ignoring signs) of various items from a measure of central tendency (either mean, median or mode).
The arithmetic mean is, however, more commonly used in calculating the value of the mean deviation. Therefore it is more frequently called the “mean deviation”.
It is also referred to as the “first moment of dispersion”.
Computation
For raw data:
AMD = Σ |X - X̄| / n

For grouped data (values X with frequencies f):
AMD = Σ f |X - X̄| / Σf

For grouped data with class midpoints m:
AMD = Σ f |m - X̄| / Σf

where X̄ is the arithmetic mean.
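A minimal sketch of the raw-data and grouped-data formulas in Python (NumPy assumed; all values hypothetical):

```python
import numpy as np

# Raw data case: AMD = sum(|X - mean|) / n
data = np.array([10, 12, 15, 18, 20])   # hypothetical values
amd_raw = np.mean(np.abs(data - data.mean()))

# Grouped data case: AMD = sum(f * |m - mean|) / sum(f),
# where m is the class midpoint and f the class frequency.
midpoints = np.array([5, 15, 25, 35])   # hypothetical class midpoints
freq      = np.array([3, 7, 6, 4])      # hypothetical frequencies
grouped_mean = np.sum(freq * midpoints) / freq.sum()
amd_grouped  = np.sum(freq * np.abs(midpoints - grouped_mean)) / freq.sum()

print(amd_raw, amd_grouped)
```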
Merits of Absolute Mean Deviation
Demerits of Absolute Mean Deviation
Variance is also known as the Mean Square Deviation or the Mean Square.
Variance is defined as the mean of squares of deviations of individual observations from their arithmetic mean.
The variance is the second moment of dispersion.
The variance is expressed in the squared unit of measurement, e.g., kg², cm², litre², etc.
It is represented by the symbol 's²' for a sample and 'σ²' for a population.
Methods of computation
1. Deviation square method
2. Variable square method
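Both methods can be illustrated with a short Python sketch (NumPy assumed; values hypothetical). The sum of squares is divided by n - 1, for the reason explained under degrees of freedom below, and the square root of the variance gives the standard deviation.

```python
import numpy as np

data = np.array([6.0, 8.0, 10.0, 12.0, 14.0])  # hypothetical sample
n = len(data)
mean = data.mean()

# 1. Deviation square method: s^2 = sum((x - mean)^2) / (n - 1)
s2_deviation = np.sum((data - mean) ** 2) / (n - 1)

# 2. Variable square method: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
s2_variable = (np.sum(data ** 2) - data.sum() ** 2 / n) / (n - 1)

s = np.sqrt(s2_deviation)   # sample standard deviation

print(s2_deviation, s2_variable, s)   # the two variances agree: 10.0 10.0 3.16...
```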
The term degree of freedom refers to the number of independent measurements that are available for estimation of a parameter.
It is defined as the number of items whose values can be determined at will.
It is also defined as the total number of items in a series minus the number of restrictions or constraints put in the data.
We divide the sum of squares of deviations by (n – 1) to get an "unbiased" estimate of population parameter from the sample estimate.
As the sum of the deviations of individual observations from their mean must be zero, the value of one of the n deviations is fixed.
Once (n - 1) deviations are known, the remaining deviation must automatically take whatever value (plus or minus) is required to make the sum of all n deviations zero.
This fixed deviation is not free to take any random value, and hence in the statistical sense we lose one degree of freedom. Thus the degrees of freedom are n - 1. For example, if n = 3 and the mean is 10, then once two deviations are known to be +2 and -5, the third must be +3.
The standard deviation is defined as the positive square root of the arithmetic mean of the squares of deviations of individual observations from their arithmetic mean.
We represent the standard deviation from a sample by s and that from a population by the Greek letter σ .
It is expressed in the same unit in which the original observations are measured.
Properties of Variance / Standard Deviation
1. If we add (or subtract) a constant to (from) each observation, the variance and the standard deviation remain unchanged.
(Unaffected by change of origin)
2. If we multiply (or divide) each observation by a constant, the variance will be multiplied (or divided) by the square of that constant and standard deviation will be multiplied (or divided) by that constant itself.
(Affected by change of scale).
3. The variance is always a non-negative value; it ranges from zero to infinity.
Variance can never be negative.
The variance of a variable in a data set is 0 if and only if all entries have the same value.
4. The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances.
Var(X + Y) = Var (X) + Var (Y)
5. The proportion of area under a normal distribution (symmetrical curve), in relation to σ is as follows:
μ ± σ includes 68.26 % area
μ ± 2σ includes 95.45 % area
μ ± 3σ includes 99.73 % area
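These properties can be verified numerically. The sketch below (NumPy and SciPy assumed; data hypothetical) checks the change-of-origin and change-of-scale properties, the additivity of variances for independent variables, and the normal-curve areas.

```python
import numpy as np
from scipy.stats import norm

x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical data
k = 5.0
var = np.var(x, ddof=1)

# Property 1: adding a constant leaves the variance unchanged (change of origin).
assert np.isclose(np.var(x + k, ddof=1), var)

# Property 2: multiplying by a constant multiplies the variance by k^2
# and the standard deviation by k (change of scale).
assert np.isclose(np.var(k * x, ddof=1), k ** 2 * var)
assert np.isclose(np.std(k * x, ddof=1), k * np.std(x, ddof=1))

# Property 4 (approximately, by simulation): for independent X and Y,
# Var(X + Y) ≈ Var(X) + Var(Y).
rng = np.random.default_rng(0)
X = rng.normal(0, 2, 100_000)
Y = rng.normal(0, 3, 100_000)
print(np.var(X + Y), np.var(X) + np.var(Y))   # both close to 4 + 9 = 13

# Property 5: area under the standard normal curve within 1, 2 and 3
# standard deviations of the mean.
for m in (1, 2, 3):
    area = norm.cdf(m) - norm.cdf(-m)
    print(f"mu ± {m}σ covers {area * 100:.2f} % of the area")
```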
Merits of Variance/Standard Deviation
Demerits of Variance/Standard Deviation
Uses of Variance/Standard Deviation
The standard deviation refers to the deviation of individual observations from the mean. Thus it applies to the observations.
The standard error conventionally implies the standard error of the mean, i.e., it applies to means.
Standard error is the standard deviation of all possible sample means from the population mean.
It refers to the standard deviation of sampling distribution of an estimate.
The standard error of the mean is generally computed from a single sample, as SE = s / √n.
Standard error is inversely proportional to the square root of the number of observations in the sample.
The standard error decreases as the sample size increases.
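A minimal sketch of the standard error computed from a single sample (NumPy assumed; values hypothetical):

```python
import numpy as np

# Hypothetical sample, for illustration only.
sample = np.array([12.1, 14.3, 13.8, 15.0, 12.9, 14.6, 13.2, 14.1])

s = np.std(sample, ddof=1)    # sample standard deviation (n - 1 divisor)
n = len(sample)
se = s / np.sqrt(n)           # standard error of the mean: SE = s / sqrt(n)

print(f"mean = {sample.mean():.2f}, s = {s:.3f}, SE = {se:.3f}")
# Quadrupling n would halve the SE, since SE falls with the square root of n.
```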
Uses of Standard Error
Relative Measures of Dispersion
1. Coefficient of range
2. Coefficient of quartile deviation
3. Coefficient of mean deviation
4. Coefficient of Variation
The relative measure of standard deviation is called the coefficient of variation or the coefficient of standard deviation. The coefficient of variation is defined as the ratio of the standard deviation to the mean, and it is generally expressed as a percentage.
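For example, the following sketch (NumPy assumed; values hypothetical) compares the relative variability of two series measured in different units:

```python
import numpy as np

# Two hypothetical series in different units, for illustration only.
weights = np.array([52.0, 55.0, 60.0, 58.0, 65.0])       # kg
heights = np.array([152.0, 158.0, 160.0, 165.0, 170.0])  # cm

def cv(x):
    """Coefficient of variation: (s / mean) * 100, in percent."""
    return np.std(x, ddof=1) / np.mean(x) * 100

print(f"CV of weights = {cv(weights):.1f} %")
print(f"CV of heights = {cv(heights):.1f} %")
# The series with the larger CV is relatively more variable,
# even though the two are measured in different units.
```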
Uses of Coefficient of Variation
Measures of relative position
These measures are used to describe the position of a specific data value in relation to the rest of the data arranged in order.
A Z-score (or standardised value) is found by converting a value to a standardised scale.
Thus, a z score is the number of standard deviations that a data value is away from the mean.
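A minimal sketch of the standardisation z = (x - mean) / s (NumPy assumed; values hypothetical):

```python
import numpy as np

data = np.array([50.0, 60.0, 70.0, 80.0, 90.0])  # hypothetical scores
mean, s = data.mean(), data.std(ddof=1)

# z = (x - mean) / s : how many standard deviations x lies from the mean.
z_scores = (data - mean) / s
print(z_scores)   # e.g. a score of 90 here lies about 1.26 s above the mean
```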
Percentiles are sometimes also called quantiles.
These are measures of location which divide a set of data into 100 equal parts or groups with about 1% of the values in each group.
Each set of data has 99 percentiles, P1 to P99.
The kth percentile Pk is a value such that at most k % of the data are smaller in value than Pk and at most (100 - k) % of the data are larger than it.
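A sketch of percentile computations (NumPy assumed; values hypothetical). Note that np.percentile interpolates by default, so results may differ slightly from hand methods:

```python
import numpy as np

# Hypothetical data set of 20 values, for illustration only.
data = np.arange(1, 21)

# P25, P50 and P90: values below which about 25 %, 50 % and 90 % of the data fall.
p25, p50, p90 = np.percentile(data, [25, 50, 90])
print(p25, p50, p90)

# Conversely, the approximate percentile rank of the value 15 in this data set:
rank = (data < 15).mean() * 100   # proportion of values below 15
print(f"15 is at about the {rank:.0f}th percentile")
```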
Measures of the shape of distribution
Skewness refers to the lack of symmetry in the curve or distribution. There is skewness in the curve if the mean, median and mode do not coincide, i.e., if the curve is not symmetrical.
Symmetrical distribution –
Curve is symmetrical
mean = median = mode
Positively skewed distribution -
A distribution in which the frequencies rise suddenly but fall slowly.
The tail of the curve is drawn out more towards the right side, hence it is also called a right-skewed curve.
In a positively skewed curve -
Mean > Median > Mode
Coefficient of skewness > 0 (Positive)
Negatively skewed distribution -
In a negatively skewed distribution the frequencies rise slowly and fall rapidly.
The tail of the curve is drawn out more towards the left side, hence it is also referred to as a curve skewed to the left.
In a negatively skewed curve -
Mean < Median < Mode
Coefficient of skewness < 0 (Negative)
Measures of Skewness
(1) Absolute measures
(i) Skewness = Mean – Median
(ii) Skewness = Mean – Mode
(iii) Skewness = (Q3 – Median) – (Median – Q1)
(2) Relative measures
Coefficient of skewness = (Mean – Mode) / Standard deviation
The limits of this measure are – 1 and + 1.
If mode is ill defined then we use:
Coefficient of skewness = 3 (Mean – Median) / Standard deviation
The limits of this measure are – 3 and + 3.
Both the above measures of skewness were given by Karl Pearson, hence also called Karl Pearson’s Coefficient of skewness.
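Pearson's second coefficient can be computed as in this sketch (NumPy assumed; the right-skewed sample is hypothetical):

```python
import numpy as np

# Hypothetical right-skewed sample, for illustration only.
data = np.array([2, 3, 3, 4, 4, 4, 5, 6, 8, 12], dtype=float)

mean   = data.mean()
median = np.median(data)
s      = data.std(ddof=1)

# Karl Pearson's second coefficient (the mode is ill-defined here,
# so the median form is used): 3 * (mean - median) / s
sk = 3 * (mean - median) / s
print(f"Coefficient of skewness = {sk:.2f}")   # > 0, so positively skewed
```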
Kurtosis is a Greek word meaning bulginess.
In statistics it refers to the degree of flatness or peakedness of a curve in the region about the mode.
The degree of kurtosis is measured relative to the peakedness of normal curve.
Mesokurtic Curve – A normal curve is said to be mesokurtic.
Leptokurtic Curve – A curve that is more peaked than the normal curve. It is slender and narrow.
The items are concentrated more near the mean and at the tails than in the intermediate region.
Coefficient of kurtosis > 3
Platykurtic Curve – A curve that is less peaked and more flat at the top than the normal curve. It is wide and flat.
There are fewer items at the mean and at the tails than in a normal curve, but more items in the intermediate region.
Coefficient of kurtosis < 3
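The moment coefficient of kurtosis, β2 = m4 / m2² (where m2 and m4 are the second and fourth central moments), can be computed as in this sketch (NumPy assumed; the samples are simulated):

```python
import numpy as np

def kurtosis_coefficient(x):
    """Moment coefficient of kurtosis: beta_2 = m4 / m2^2,
    where m2 and m4 are the 2nd and 4th central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2

rng = np.random.default_rng(1)
print(kurtosis_coefficient(rng.normal(size=100_000)))    # ≈ 3 (mesokurtic)
print(kurtosis_coefficient(rng.laplace(size=100_000)))   # > 3 (leptokurtic)
print(kurtosis_coefficient(rng.uniform(size=100_000)))   # < 3 (platykurtic)
```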