21st Math'fia

Managing Data.

Tabular & Graphical Representations of Data.

Use of Tables

Frequency tells you how often something happened. The frequency of an observation tells you the number of times the observation occurs in the data. For example, in the following list of numbers, the frequency of the number 9 is 5 (because it occurs 5 times):

1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9.
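The count above can be checked with a few lines of Python. This is only an illustrative sketch that uses the standard library's collections.Counter; the variable names are assumptions made for the example.

```python
from collections import Counter

# The list of numbers from the example above.
data = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(data)

print(frequency[9])  # frequency of the number 9 -> 5
```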

A Frequency Distribution Table helps you organize a large set of quantitative data into smaller intervals, or classes, and shows how many data values fall into each class. In particular, a frequency distribution table answers two questions:

What values of the variables have been measured?
How often did each value occur?
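A grouped frequency table might be built as in the sketch below. The class width of 5 and the helper name frequency_table are assumptions chosen only for illustration; the data is the same list used in the frequency example.

```python
# Same list of numbers as in the frequency example.
data = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]

CLASS_WIDTH = 5  # assumed class width, chosen only for illustration


def frequency_table(values, width):
    """Return {(lower, upper): count} for classes of equal width."""
    table = {}
    for value in values:
        lower = (value // width) * width  # lower limit of the class
        upper = lower + width - 1         # upper limit of the class
        table[(lower, upper)] = table.get((lower, upper), 0) + 1
    return dict(sorted(table.items()))


table = frequency_table(data, CLASS_WIDTH)
for (lower, upper), count in table.items():
    print(f"{lower}-{upper}: {count}")
# 0-4: 7
# 5-9: 9
```

The classes answer the first question (which values were measured) and the counts answer the second (how often they occurred).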

Types of Graphs: Definitions, uses, and applications.

There are many types of graphs and charts. Some of the most common, together with their uses, are the following:

Bar Graph simply displays a bar (rectangle) for each category, with the length of each bar representing the frequency of that category. Bar graphs are used to compare different categories of data visually and to show trends over time. A Pareto Chart is a bar graph whose bars are ordered from highest to lowest frequency.

Pie Chart is a circle cut into wedges of varying sizes, each representing the relative frequency of a category. Pie charts are best used to compare parts of a whole, especially when there are not too many categories.

Line Graphs are used to track changes over short and long periods of time. They can also be used to compare changes over the same period of time for more than one group.

Time Series Graphs can be constructed to represent data collected over a period of time. This type of graph is purposely used to show trends.

Secular Trends are usually drawn over a long period of time. Cyclical Trends refer to the business cycle where a business opportunity generates new companies or products that reap good profits. Seasonal Trends are graphs that show the value of a commodity for a short period of the year, say quarters in a year.

Descriptive Measures

The word statistics originated from the Latin word “status” meaning “state”. Statistics is the science that deals with the collection, classification, analysis, and interpretation of numerical data, in such a way that valid conclusions can be drawn from them.

Descriptive statistics involves the collection and classification of data. The analysis and interpretation of data constitute inferential statistics.

Measures of Central Tendency

The mean, median, and mode describe the “average” or “center” and collectively, they are called measures of central tendency.

Mean, Median, and Mode: Definitions and Sample Problems with Solutions

Mean is the most popular measure of central tendency. This is normally called “average”, but statisticians like to call it the “arithmetic mean.”

The mean of a set of measurements is the sum of the measurements divided by the number of data points. So, if there are n data points in a data set, and the data points are represented as x1, x2, ..., xn, then the mean is computed as:

Mean = (sum of x) / n

If the data are collected from a sample, the mean is represented as x̄ (read as “x-bar”). If the data are collected from the population, the mean is represented as μ, the lowercase Greek letter mu.

Example:

The mean of five measurements 2.5, 3.4, 4.2, 5.1, and 4.8 is

Mean = (2.5 + 3.4 + 4.2 + 5.1 + 4.8) / 5 = 4.0
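The same computation can be written as a short Python sketch. The variable names are assumptions, and the result is rounded for display because decimal values are stored only approximately as floating-point numbers.

```python
# The five measurements from the example above.
measurements = [2.5, 3.4, 4.2, 5.1, 4.8]

# Mean = (sum of x) / n
mean = sum(measurements) / len(measurements)

print(round(mean, 1))  # 4.0
```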

Median of a set of data arranged according to size (ascending or descending) is the value of the middle data point if the number of data points is odd, and the mean of the two middle data points if the number of data points is even. It is the value of the ((n + 1) / 2)th data point.

Example:

In an examination of 100 items, the raw scores of 9 students are 55, 46, 75, 90, 53, 46, 82, 74, and 69. What is the median score?

Solution: The median is not 53, the 5th score in the set because the data points must first be arranged according to size (lowest to highest). Thus, you will get

46 46 53 55 69 74 75 82 90

And it can be seen that the median is 69.
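The sort-then-pick-the-middle procedure can be sketched in Python as follows. The function name median is an assumption for this example; Python's standard statistics module offers an equivalent statistics.median.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)  # the data must be arranged by size first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Raw scores in the order they were given, not yet sorted.
scores = [55, 46, 75, 90, 53, 46, 82, 74, 69]

print(median(scores))  # 69
```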

Mode is simply the value in a data set that occurs with the highest frequency and more than once. It is possible that in a data set, there is no mode, or more than one mode.

Example:

For the scores given above, 46 46 53 55 69 74 75 82 90, the mode is 46 since 46 appears twice and all other scores only appear once.
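A minimal sketch of this definition in Python is shown below. The function name modes is an assumption; it returns an empty list when no value repeats and several values when there is a tie, matching the "no mode" and "more than one mode" cases described above.

```python
from collections import Counter


def modes(values):
    """Values that occur most often, provided they occur more than once."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest < 2:  # every value appears only once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


scores = [46, 46, 53, 55, 69, 74, 75, 82, 90]

print(modes(scores))           # [46]
print(modes([1, 2, 3]))        # []
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```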

Measures of Dispersion

A measure of Dispersion indicates to what degree the individual observations are dispersed or spread out around their mean.

Range, Interquartile Range, Variance, and Standard Deviation: Definitions and Sample Problems with Solutions

Range is simply the difference between the highest and lowest observations in a set of data.

Interquartile Range (IQR) is the difference between the third and first quartiles. One half of the distribution lies within this range. It consists of the middle 50% of the observations in that it cuts off the lower 25% and the upper 25% of the data points.

Example:

Find the IQR of the scores of 9 students in an examination:

46 46 53 55 69 74 75 82 90

Find the first quartile: the position of Q1 is ¼(9 + 1) = 2.5, halfway between the 2nd and 3rd data points.

Q1 = 46 + 0.5(53 - 46) = 46 + 3.5 = 49.5

Find the third quartile: the position of Q3 is ¾(9 + 1) = 7.5, halfway between the 7th and 8th data points.

Q3 = 75 + 0.5(82 - 75) = 75 + 3.5 = 78.5

Therefore, IQR = Q3 - Q1 = 78.5 - 49.5 = 29
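The quartiles can be checked in Python. The sketch below relies on statistics.quantiles from the standard library with method="exclusive", which places the quartiles at positions ¼(n + 1), ½(n + 1), and ¾(n + 1) and interpolates between neighboring data points. Since the 7th and 8th scores are 75 and 82, the interpolation gives Q3 = 75 + 0.5(82 - 75) = 78.5.

```python
import statistics

# The nine scores, already arranged from lowest to highest.
scores = [46, 46, 53, 55, 69, 74, 75, 82, 90]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 49.5 78.5 29.0
```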

Variance is the mean of the squared deviations from the mean. To compute it, find the amount by which each observation deviates from the mean, square those deviations, and find the average of the squared deviations. For a sample, the sum of the squared deviations is divided by n - 1 rather than n.

Given a sample of data points, the variance of the data, denoted by s^2, is:

s^2 = ∑(x - mean)^2 / (n - 1)

Standard Deviation is simply the square root of the variance.
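As an illustration, the sample variance and standard deviation of the five measurements from the mean example (2.5, 3.4, 4.2, 5.1, 4.8) might be computed as below. Reusing that data set here is an assumption made for the sketch; the squared deviations sum to 4.5, so the variance is 4.5 / 4 = 1.125.

```python
import math

# The five measurements used in the mean example.
measurements = [2.5, 3.4, 4.2, 5.1, 4.8]

n = len(measurements)
mean = sum(measurements) / n

# Square each deviation from the mean, then divide the total by n - 1.
squared_deviations = [(x - mean) ** 2 for x in measurements]
variance = sum(squared_deviations) / (n - 1)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 3), round(std_dev, 3))  # 1.125 1.061
```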

Measures of Relative Position

The measure of position is used to describe the position of a single data point relative to the others. Hence, the measures of relative standing can be used to compare values from different data sets, or to compare values within the data set.

Percentile, Quartile, Outliers, and Z-score: Definitions and Sample Problems with Solutions

Z-score is a measure of relative standing. The mean and the standard deviation of a set of data can be used to calculate the z-score. It is defined by the formula:

z = (x value - mean) / s

Where s is the standard deviation.

Example 1:

Consider the following heights of 10 basketball players in an inter-barangay summer tournament. What is the relative position of a basketball player whose height is 74 inches?

Solution: The mean is 71.4 inches and the standard deviation is 2.46 in. rounded off to 2 decimal places.

Calculating for the z-scores: z = (74 - 71.4)/ 2.46 = 1.06.

A basketball player who is 74 inches tall is 1.06 standard deviations above the average height of the basketball players.
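The z-score formula translates directly into code. In the sketch below the function name z_score is an assumption, and the mean and standard deviation are the values stated in the solution above.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Values taken from the solution: mean 71.4 in., standard deviation 2.46 in.
z = z_score(74, 71.4, 2.46)

print(round(z, 2))  # 1.06
```

A negative result would indicate a height below the average.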

Percentiles and Quartiles. A percentile is a measure used in statistics indicating the value below which a given percentage of the observations in a group of observations falls.

The 25th percentile is also known as the first quartile (Q1).

The 50th percentile is also known as the second quartile (Q2).

The 75th percentile is also known as the third quartile (Q3).

Percentiles divide ordered data into hundredths.

To calculate percentiles and quartiles by hand, the following formulas are used:

For lower quartile, Q1: It is the value of x in position 0.25(n+1)
For the middle quartile, Q2: It is the value of x in position 0.50(n+1)
For upper quartile, Q3: It is the value of x in position 0.75(n+1)
Q1 and P25 are the same value, as are Q2 and P50, and Q3 and P75.

Example:

The following are the raw scores of 39 students in a 100-point examination in Introductory Statistics, arranged from lowest to highest. Find the scores corresponding to the 25th, 40th, 65th, and 95th percentiles.

45 45 46 48 50 50 51 52 52 53 56 56 57 58 59 60 60 62 62 63 63 63 64 64 65 65 66 68 70 70 70 70 71 72 72 75 77 78 100

Solution: The position of the kth percentile is (k/100)(n + 1), where n = 39. Find each position, then read off the score at that position.

Position of P25 = 0.25(39 + 1) = 10. The score in the 10th position is 53, so P25 = 53.
Position of P40 = 0.40(39 + 1) = 16. The score in the 16th position is 60, so P40 = 60.
Position of P65 = 0.65(39 + 1) = 26. The score in the 26th position is 65, so P65 = 65.
Position of P95 = 0.95(39 + 1) = 38. The score in the 38th position is 78, so P95 = 78.
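The position rule can be sketched in Python as follows. The function name percentile is an assumption, as are two design choices: positions that fall between two scores are interpolated linearly (as in the IQR example), and positions beyond either end of the list are clamped to the lowest or highest score. Fractions are used so that the positions are computed exactly.

```python
from fractions import Fraction

scores = [45, 45, 46, 48, 50, 50, 51, 52, 52, 53, 56, 56, 57, 58, 59,
          60, 60, 62, 62, 63, 63, 63, 64, 64, 65, 65, 66, 68, 70, 70,
          70, 70, 71, 72, 72, 75, 77, 78, 100]


def percentile(ordered, k):
    """Score at position (k/100)(n + 1) of an ordered list, counting from 1."""
    n = len(ordered)
    position = Fraction(k * (n + 1), 100)  # exact position, may be fractional
    whole = position.numerator // position.denominator
    fraction = position - whole
    if whole < 1:    # position before the first score (assumed clamp)
        return ordered[0]
    if whole >= n:   # position at or past the last score (assumed clamp)
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    if fraction == 0:
        return lower
    # Interpolate between the two neighboring scores.
    return lower + float(fraction) * (upper - lower)


for k in (25, 40, 65, 95):
    print(k, percentile(scores, k))
# 25 53
# 40 60
# 65 65
# 95 78
```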

Outlier is a data point in the data set that appears very big or very small compared to the rest of the data points.

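Using the IQR method, any data point that falls below the lower boundary or above the upper boundary is considered an outlier. The boundaries are found with the following formulas: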

Lower Boundary: Q1 - 1.5(IQR)

Upper Boundary: Q3 + 1.5(IQR)
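As a sketch, the boundaries can be applied to the 39 examination scores from the percentile example. The quartiles are taken from statistics.quantiles with method="exclusive", which uses the same (n + 1) positions as the formulas in this section: Q1 is the 10th score (53) and Q3 is the 30th score (70). The variable names are assumptions.

```python
import statistics

scores = [45, 45, 46, 48, 50, 50, 51, 52, 52, 53, 56, 56, 57, 58, 59,
          60, 60, 62, 62, 63, 63, 63, 64, 64, 65, 65, 66, 68, 70, 70,
          70, 70, 71, 72, 72, 75, 77, 78, 100]

q1, _, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr

# Any score outside the two boundaries is flagged as an outlier.
outliers = [x for x in scores if x < lower_boundary or x > upper_boundary]

print(lower_boundary, upper_boundary, outliers)  # 27.5 95.5 [100]
```

Only the score of 100 lies outside the boundaries, so it is the only outlier in this data set.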

Measures of Shape: Definition and Applications

The distribution of the data within a data set is described by measures of shape. The distribution can either be symmetrical or asymmetrical.

Symmetrical Distribution: Two sides of the distribution are identical to one another.

Asymmetrical Distribution: The two sides of a distribution do not mirror one another.

Symmetrical and Normal Distribution, and Skewness

Normal Distribution: A true symmetrical distribution of observed values.

Skewness: The tendency of values to be more frequent around the high or low end of the horizontal axis.

Remember:

Positively skewed (or skewed to the right) means that most values tend to cluster on the left side of the horizontal axis in a distribution. In contrast, when the majority of values tend to cluster toward the right, a distribution is said to be negatively skewed (or skewed to the left).

In the event that a distribution is symmetric, the skewness value is 0. The data is positively skewed if skewness is positive. The data is negatively skewed if skewness is negative.
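The sign rule can be illustrated with a short sketch. The text above does not give a formula for the skewness value, so the code assumes one common definition, the moment coefficient of skewness: the average cubed deviation divided by the average squared deviation raised to the power 3/2. The three small data sets are hypothetical and chosen only to show each sign.

```python
def skewness(values):
    """Moment coefficient of skewness (an assumed definition): m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5  # assumes the values are not all identical


# Hypothetical data sets for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]  # most values on the left, long tail to the right
left_skewed = [1, 7, 8, 9, 10]   # most values on the right, long tail to the left

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```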

Measures of Correlation

A measure of correlation is a single number that describes the relationship between two variables.

Correlation:

Linear Correlation is an investigation into the degree of the relationship between two variables.

There are two types of linear correlation:

Positive or direct correlation: In general, this happens if y increases as x increases.

Negative or indirect correlation: In general, this happens if y decreases as x increases.

Coefficient of Correlation, denoted by r, is a numerical measure of the linear relationship between two variables. Correlation coefficients can lie between -1 and 1 inclusive. That is, -1 ≤ r ≤ 1.

By assuming a linear relationship between the variables x and y, the famous British statistician Karl Pearson derived a formula for finding the correlation coefficient, r. In his honor, the Pearson Product Moment Correlation is formulated as follows:

r = [n∑xy - (∑x)(∑y)] / √{[n∑x^2 - (∑x)^2][n∑y^2 - (∑y)^2]}
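A minimal sketch of the computation is given below. It assumes the common computational form of Pearson's r, written out in the code comments, and the function name pearson_r and both data sets are hypothetical, chosen so that one shows a perfect positive correlation and the other a perfect negative correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson r in its computational form (assumed here):
    r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator  # assumes neither variable is constant


# Hypothetical paired data for illustration only.
x = [1, 2, 3, 4, 5]
y_increasing = [2, 4, 6, 8, 10]  # y increases as x increases
y_decreasing = [10, 8, 6, 4, 2]  # y decreases as x increases

print(pearson_r(x, y_increasing))  # 1.0
print(pearson_r(x, y_decreasing))  # -1.0
```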
