Statistics
Average and mean are closely related terms that are often used interchangeably. In everyday math, "average" usually means the arithmetic mean: the sum of all the numbers divided by the total number of values in the set. In statistics, "mean" is the more precise term. There are other kinds of means, such as the geometric and harmonic means, and "average" is sometimes used loosely for any central value, including the median. The mean of a sample is used to describe, or estimate, the center of the population the sample came from.
By definition, an average (arithmetic mean) is the sum of all the values in a given set divided by the total number of values. In other words, it is the ratio of the sum of the values to the number of values. It summarizes a large amount of data with a single numeric value. It is most representative when the values are fairly close to one another, because a few very large or very small values can pull it away from where most of the data lie.
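The definition above translates directly into code. The following is a minimal Python sketch. The function name arithmetic_mean and the sample scores are hypothetical and used only for illustration.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        # The mean of an empty set would divide by zero, so refuse it.
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical data set of five scores.
scores = [4, 8, 6, 5, 7]
print(arithmetic_mean(scores))  # → 6.0  (30 / 5)
```

Python's standard library also provides statistics.mean, which computes the same quantity.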
Probability Sampling Methods
Probability sampling methods use random sampling of individuals from a population. The goal of probability sampling is to reduce bias in the sample. There are several types of probability sampling methods:
Simple Random Sampling (SRS): This is the most basic probability sampling method. Every group of individuals has the same chance of being selected as every other group of the same size. To get a simple random sample, every individual in the population is first identified (listed). Then a random sample of a given size is selected from that list.
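A simple random sample can be sketched with Python's random module. Random.sample draws k distinct items, and every subset of size k is equally likely, which matches the SRS definition. The population list of person IDs is hypothetical. The seed only makes the draw repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: every individual in the population, identified by ID.
population = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```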
Cluster sampling: Entire groups of individuals are randomly selected, and all members of each selected group participate in the study. For instance, all of the 8th grade classes in a county school district are identified. Then a random sample of entire 8th grade classrooms is selected. When an 8th grade classroom is selected, all students in that classroom are part of the sample.
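The sketch below shows cluster sampling under stated assumptions. Whole clusters are chosen at random, and then every member of each chosen cluster is kept. The classroom roster (12 hypothetical classrooms of 25 students each) and the function name cluster_sample are illustrative only.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names, not people
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical data: 8th grade classrooms mapped to their students.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 26)]
    for r in range(1, 13)
}
selected = cluster_sample(classrooms, 3, seed=7)
students = [s for roster in selected.values() for s in roster]
print(len(selected), len(students))  # → 3 75
```

Note that randomness enters only at the cluster level. Within a selected classroom, no student is left out.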
Stratified sampling: The population is first divided into groups called strata, and study participants are then selected at random from within each stratum. This method is commonly used in political polling. For instance, voters in a given district are divided into age groups: 18-21, 22-25, 26-30, and so on. Then a sample is created by randomly selecting voters from each age group.
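Here is a minimal sketch of stratified sampling. It groups the population by a stratum label and then draws a simple random sample inside each stratum. The voter roll, the "31+" catch-all band, and the choice of taking the same number of voters from each stratum are assumptions for illustration. Taking a number proportional to each stratum's size is another common choice.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Split the population into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Take per_stratum members, or the whole stratum if it is smaller than that.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

def age_group(voter):
    """Hypothetical age bands following the 18-21, 22-25, 26-30, ... example."""
    age = voter["age"]
    if age <= 21:
        return "18-21"
    if age <= 25:
        return "22-25"
    if age <= 30:
        return "26-30"
    return "31+"

# Hypothetical voter roll: 500 voters aged 18 to 60.
voters = [{"id": i, "age": 18 + i % 43} for i in range(500)]
sample = stratified_sample(voters, age_group, per_stratum=5, seed=1)
print(len(sample))  # → 20 (5 voters from each of the 4 age groups)
```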
Systematic sampling: All members of a population are placed in a list, and a starting point is randomly selected. Then each observation at a given interval (such as every 5th observation) is selected. For instance, suppose a factory manager wants to do quality control on the factory's products. The manager would randomly select a starting point on the assembly line and then select an item at a fixed interval (such as every 100th item) for quality control assessment.
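Systematic sampling can be sketched as a random start followed by a fixed step through the ordered list. Python slicing expresses this directly. The production log of 1,000 items is hypothetical. The start is drawn from the first interval so that every item has the same chance of being chosen.

```python
import random

def systematic_sample(items, interval, seed=None):
    """Pick a random start in [0, interval), then take every interval-th item after it."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    start = random.Random(seed).randrange(interval)
    return items[start::interval]

# Hypothetical production log: 1,000 items off the assembly line, in order.
production = [f"item_{i}" for i in range(1, 1001)]
inspected = systematic_sample(production, 100, seed=3)
print(len(inspected))  # → 10
```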
Non-Probability Sampling Methods
Non-probability sampling methods do not use random sampling. As a result, the sample may not represent the population of interest, or it may be biased. There are several types of non-probability sampling methods:
Volunteer Sample: This type of sample occurs when individuals select themselves for the study. The volunteer method often results in individuals who differ in an important way from those who did not volunteer. An example of a volunteer sample is the group of individuals who write product reviews online. People who choose to write reviews may differ from those who do not, for example because they strongly liked or disliked the product.
Convenience Sample: This type of sample consists of individuals who are easy for the researcher to reach, that is, who happened to be in the right place at the right time. Such a sample may differ from the general population in a subtle but important way. However, for certain variables of interest, a convenience sample may still be fairly representative.
An outlier is a data value that is much smaller or larger than most of the other values.
It is a value that is "unusual in size" and is very different from most of the other values.
Choosing the best measure of center to describe data:
Data set with an outlier: median (an outlier pulls the mean toward it but barely moves the median)
Data set without an outlier: mean
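A small example shows why the median is preferred when there is an outlier. It uses Python's statistics module on a hypothetical data set, with and without one extreme value.

```python
from statistics import mean, median

# Hypothetical data: five similar values, then the same values plus one outlier.
without_outlier = [52, 55, 57, 58, 60]
with_outlier = without_outlier + [250]

print(mean(without_outlier), median(without_outlier))  # mean 56.4, median 57
print(mean(with_outlier), median(with_outlier))        # mean about 88.67, median 57.5
```

The single outlier moves the mean by more than 32, while the median moves by only 0.5.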
The range is a measure of how much the data is spread out.
range = greatest number - least number
A small range indicates low variability in the data.
A large range indicates high variability in the data.
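The range formula is a one-liner. The function name data_range and the data sets below are hypothetical. The second call also shows that a single outlier can make the range much larger, because the range depends only on the two most extreme values.

```python
def data_range(values):
    """Range = greatest value - least value."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)

print(data_range([52, 55, 57, 58, 60]))       # → 8
print(data_range([52, 55, 57, 58, 60, 250]))  # → 198
```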