The number of occurrences of a given type of event in each class is called the frequency of that class.
The frequency can also be defined as the number of members of a population or sample falling into a specific class.
The process of distributing observations into different classes is called frequency distribution and the table showing this distribution is called a frequency distribution table.
The first step in summarising data is to construct a frequency distribution.
It involves defining two or more equivalence classes and counting the number of observations in each class.
Tabulating the values of a variable side by side with their respective frequencies gives the ‘frequency distribution’ of those data.
Croxton and Cowden defined a frequency distribution as “a statistical table which shows the sets of all distinct values of the variable arranged in order of magnitude, either individually or in groups, with their corresponding frequencies side by side”.
The distribution of a variable tells us what values it takes and how often it takes these values.
If the number of observations in the given data is very large, it is difficult to draw any inference directly from the raw data.
We therefore need to reduce the data by classifying the observations into a certain number of groups.
The purpose of reducing the data is to bring out and highlight some of its inherent properties.
It reduces the mass of raw data into a more manageable form and provides a basis for its graphical presentation.
Statistics such as the mean, variance and standard deviation can be calculated from it.
For qualitative data the construction is simple, because no decisions are needed about the size and number of class intervals.
The equivalence classes of the variable become the class intervals, for example, blood group.
The cumulative frequency is not shown in this case because it is not meaningful for a qualitative classification: we cannot consider any blood group as more important than another, so the groups cannot be arranged in order.
The frequency distribution of a quantitative character necessitates an arbitrary or artificial classification of the data under study and grouping of the variate values into the most appropriate number of classes is a matter of judgement.
A frequency table is the table of categories and their respective number of observations in the sample.
In a frequency table, each category is said to be a class and the number of observations corresponding to a class is said to be its frequency.
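As a minimal sketch of such a table, the snippet below counts how often each category occurs. The blood-group observations are invented for illustration and are not taken from any data set in these notes.

```python
from collections import Counter

# Hypothetical blood-group observations (illustrative only).
blood_groups = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A"]

# Each distinct category is a class; its count is the frequency of that class.
frequency_table = Counter(blood_groups)

# Print the classes side by side with their frequencies.
for group, frequency in sorted(frequency_table.items()):
    print(f"{group:<3} {frequency}")
```

No cumulative column is produced here, in line with the remark that it is not meaningful for a qualitative classification.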
Ungrouped frequency distribution, example: Aptitude scores of the students in biostatistics
• Ungrouped frequency distribution shows the frequency (f) of occurrence of each score
• The scores should be arranged from smallest to largest
• Each number between the smallest and the largest scores is recorded in the distribution so that every possible score can be checked and the gaps between scores easily detected.
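The points above can be sketched in a few lines of Python. The scores are hypothetical; the only assumption is that scores are whole numbers, so that "every possible score" between the smallest and the largest can be listed.

```python
from collections import Counter

# Hypothetical aptitude scores (illustrative only).
scores = [7, 9, 7, 5, 9, 10, 7, 5]

counts = Counter(scores)

# List every possible score from smallest to largest, including scores
# that nobody obtained (frequency 0), so that gaps are easy to see.
ungrouped = [(score, counts.get(score, 0))
             for score in range(min(scores), max(scores) + 1)]

for score, f in ungrouped:
    print(f"{score:>3} {f}")
```

Scores 6 and 8 appear with frequency 0, which makes the gaps in the distribution visible.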
In an ungrouped frequency distribution we lose some information, e.g. who obtained the highest or lowest score.
A grouped frequency distribution is a tabulation of raw data obtained by dividing the whole range of observations into a number of classes and indicating the corresponding class frequencies against the class intervals.
1. Find out the range of the data
Range = maximum value – minimum value
2. Taking into account the magnitude of range, the class interval (or class length) is determined such that the total number of classes is not less than 5 and not greater than 15.
If the number of observations in the data is ‘n’, we find the smallest positive integer ‘k’ such that 2^k ≥ n. For example, if n = 200, the integer k satisfying 2^k ≥ 200 is k = 8 (2^8 = 256). An alternative method of determining the number of classes is Sturges’ rule, given by:
k = 1 + 3.3 log10(n)
where k = number of classes and
n = total number of observations.
e.g. For n = 100, k = 1 + 3.3(2) = 7.6 ≈ 8.
But by this rule we get too many classes when ‘n’ is small and it gives too few classes when ‘n’ is large.
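Both ways of choosing the number of classes can be checked with a short sketch. The function names are hypothetical, and rounding the Sturges value to the nearest whole number is an assumption based on the example in the text (7.6 ≈ 8).

```python
import math


def classes_by_power_of_two(n):
    """Smallest positive integer k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def classes_by_sturges(n):
    """k = 1 + 3.3 * log10(n), rounded to the nearest whole number (assumed)."""
    return round(1 + 3.3 * math.log10(n))


print(classes_by_power_of_two(200))  # n = 200, as in the text
print(classes_by_sturges(100))       # n = 100, as in the text
```

Either result is only a starting point; the final number of classes remains a matter of judgement, within the suggested limits of 5 to 15 classes.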
3. After deciding the number of classes, we select the class interval. It is desirable that the class-length or class interval for each class is the same (but it is not essential). If the class intervals of all the classes are equal, it is easy to compute certain descriptive measures while analysing the data.
1. Classes should not be overlapping (mutually exclusive).
2. There should be no gaps between classes.
3. Classes should preferably be of the same size.
4. Open-ended classes should be avoided (like < 10 or > 100, etc.).
5. Class limits should be such that there is no ambiguity as to which class a particular item of data belongs to.
To obtain equal class intervals we divide the range by the number of classes:
Class Interval = Range / (Number of Classes)
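A small sketch of this calculation follows. The helper name and the choice of 8 classes are assumptions made for illustration; the extreme values 1 and 79 match the range used in the worked example of these notes.

```python
import math


def class_width(values, number_of_classes):
    """Class interval = range / number of classes."""
    data_range = max(values) - min(values)
    return data_range / number_of_classes


# Only the minimum and maximum matter for the range.
marks = [45, 64, 1, 79]

exact_width = class_width(marks, 8)         # 78 / 8
convenient_width = math.ceil(exact_width)   # round up to a whole number

print(exact_width, convenient_width)
```

In practice the exact quotient is usually rounded up to a convenient value so that the classes cover the whole range.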
The mid-value or central value of each class of a frequency distribution is defined as the average of the class boundaries of that class.
The difference between the upper boundary point and lower boundary point of each class is called the class-interval or class-length.
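These two definitions translate directly into code. The sketch below uses hypothetical function names and the class boundaries 150 and 155 as an illustrative class.

```python
def mid_value(lower_boundary, upper_boundary):
    """Average of the two class boundaries."""
    return (lower_boundary + upper_boundary) / 2


def class_length(lower_boundary, upper_boundary):
    """Difference between the upper and lower class boundaries."""
    return upper_boundary - lower_boundary


print(mid_value(150, 155))     # mid-value of the class 150 - 155
print(class_length(150, 155))  # class length of the class 150 - 155
```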
Range = 79 – 1 = 78
Number of classes
The construction of a frequency table is accomplished by the following steps:
1. Construct a table with three columns. The first column shows what is being arranged in ascending order. It is the column of classes.
2. Go through the list of marks. The first mark in the list is 45, so put a tally mark against the class 40 - 50 in the second column. The second mark is 64, so put a tally mark against the class 60 - 70. The third mark in the list is 1, so put a tally mark against the class 00 - 10. Continue in this way, putting a tally mark against the class in which each subsequent mark is located. When the fifth tally in a class is reached, draw it as an oblique line through the first four tally marks. Go on doing so until tally marks have been put for all the marks.
3. Count the number of tally marks for each class and write it in the third column. The finished frequency distribution will take the form shown in the table below.
The table is constructed by arranging the collected data values in ascending order of magnitude with their corresponding frequencies.
Prepare a table with three columns:
Column of classes
Column of tally and
Frequency column
If the data is discrete (without classes), we will simply put values in an ascending order in the first column and will complete the table as described above.
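The tallying procedure can be sketched as follows. Only the first three marks (45, 64, 1) come from the text; the remaining marks, the class width of 10 and the way a bundle of five tallies is drawn are assumptions made for illustration. A mark equal to a class limit is placed in the higher class.

```python
# First three marks are from the text; the rest are hypothetical.
marks = [45, 64, 1, 33, 79, 58, 45, 12, 67, 50]
width = 10

# Column 1: classes 00 - 10, 10 - 20, ..., 70 - 80, keyed by lower limit.
frequency = {low: 0 for low in range(0, 80, width)}

# Column 2: go through the marks and tally each one against its class.
for mark in marks:
    low = (mark // width) * width
    frequency[low] += 1


def tally(count):
    """Draw tallies in bundles of five: four strokes plus an oblique line."""
    bundles = ["||||/"] * (count // 5)
    rest = ["|" * (count % 5)] if count % 5 else []
    return " ".join(bundles + rest)


# Column 3: the frequency is the number of tally marks in each class.
for low, count in frequency.items():
    print(f"{low:02d} - {low + width:02d} | {tally(count):<8} | {count}")
```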
1. Inclusive method: In the inclusive method of classification, the upper limit of one class (154, 159, 164, …) is not the lower limit of the next higher class. Therefore, observations equal to either the lower or the upper limit are included in that class; e.g., observations 150 and 154 are both included in the class 150 – 154. This is true for all the classes. This method of forming class intervals is known as the inclusive method, and it is applicable to discrete data.
2. Exclusive method: In this method, the upper limit of a particular class is the same as the lower limit of the next higher class. For example,
150 – 155, 155 – 160, 160 – 165 …
The variable value equal to the upper limit of a class is grouped in the next higher class. Thus 155 is to be grouped in the class 155 – 160 and so on.
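The difference between the two methods can be illustrated with a short sketch. The function names, the starting limit of 150 and the width of 5 are assumptions chosen to match the classes quoted above.

```python
def inclusive_class(value, start=150, width=5):
    """Classes 150 - 154, 155 - 159, ...: both limits belong to the class.

    Intended for discrete (whole-number) data.
    """
    low = start + ((value - start) // width) * width
    return (low, low + width - 1)


def exclusive_class(value, start=150, width=5):
    """Classes 150 - 155, 155 - 160, ...: the upper limit belongs to the
    next higher class."""
    low = start + ((value - start) // width) * width
    return (low, low + width)


print(inclusive_class(150), inclusive_class(154))  # same class
print(exclusive_class(155))                        # grouped in 155 - 160
```

With the exclusive method a value such as 154.9 still falls in 150 – 155, which is why that method also suits continuous data.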
The cumulative frequency is a running total of frequencies: the sum of all frequencies up to and including the current class. The cumulative frequency for the first class is the same as its frequency.
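A running total is easy to compute with the standard library. The class frequencies below are hypothetical and listed in ascending order of class.

```python
from itertools import accumulate

# Hypothetical class frequencies, in ascending order of class.
frequencies = [1, 1, 0, 1, 2, 2, 2, 1]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

for f, cf in zip(frequencies, cumulative):
    print(f"{f:>3} {cf:>3}")
```

The last cumulative frequency always equals the total number of observations.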
A relative frequency distribution shows the proportion or the percentage of observations in each class interval.
Relative frequency of a category is obtained by dividing the frequency of that category by the sum of all frequencies.
The percentage for a class is obtained by multiplying the relative frequency of that class by 100.
Relative frequencies indicate whether a frequency is “relatively large” rather than whether it is “absolutely large”.
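The two calculations can be sketched as follows. The frequencies are hypothetical and were chosen so that the results are simple decimals.

```python
# Hypothetical class frequencies (illustrative only).
frequencies = {"A": 2, "B": 1, "AB": 1, "O": 4}

total = sum(frequencies.values())

# Relative frequency = class frequency / sum of all frequencies.
relative = {cls: f / total for cls, f in frequencies.items()}

# Percentage = relative frequency * 100.
percentage = {cls: r * 100 for cls, r in relative.items()}

for cls in frequencies:
    print(f"{cls:<3} {frequencies[cls]:>3} {relative[cls]:>6.3f} {percentage[cls]:>6.1f}%")
```

The relative frequencies always add up to 1 and the percentages to 100, apart from rounding.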