Inferential Statistics

What is inferential statistics?

In research we seldom care only about the sample. Most often we want to know about the population as well. In inferential statistics, we analyze the sample data, but then we try to draw conclusions about the whole population.

When we select a sample from the population, the individuals who are not selected contribute no data, so we have no direct information about them. However, if the sample is random and large enough to be considered representative of the population, then we can make educated guesses about the rest of the population. In this sense, we are trying to "infer" something about the population from the sample, hence the name "inferential statistics".

Since we have not considered all the individuals in the population, inferential statistics is bound to produce errors, but we try to estimate or control the error in our conclusions.

Two types of inferential statistics

In inferential statistics we try to ask questions about the population. There are two common ways to ask such questions, as illustrated in the following examples:

Example 1: The sample mean is 10. What is the population mean?

One may think that the population mean would be 10 as well. But since we don't know everything about the population, we could only make a guess. In making a guess, it is advisable to guess a range of numbers (interval estimate) rather than just a single number (point estimate), so as to increase the probability of being correct. This probability is called the level of confidence, or just confidence for short. In practice, we set a certain value for this probability and estimate the range of numbers from there. The most popular value for the level of confidence is 95%.

As a result, the answer to the above question would look like the following:

"The population mean is between 8 and 12, and intervals calculated in this way contain the population mean 95% of the time."

In this case, the interval 8 to 12 is called the 95% confidence interval of the population mean.

Note: The numbers 8 and 12 here are for illustration only. They could be other numbers calculated from the statistical procedures. For the usual confidence interval of a mean, the only requirement is that they are equidistant from the sample mean, which is 10 in this case.
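As a rough illustration of how such an interval might be calculated, the following Python sketch uses SciPy and a made-up sample of six values whose mean is 10. The data and variable names are assumptions for illustration only. The calculation is the usual t-based confidence interval for a mean, which may differ in detail from what a given SPSS or Jamovi procedure reports.

```python
import math
import statistics

from scipy import stats

# Hypothetical sample (invented for illustration); its mean is exactly 10.
sample = [7, 9, 10, 10, 11, 13]

n = len(sample)
sample_mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation divided by sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(n)

# t-based 95% confidence interval with n - 1 degrees of freedom.
lower, upper = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=standard_error)

print(f"sample mean = {sample_mean}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

For this made-up sample the interval comes out at roughly 7.9 to 12.1, and its two limits are the same distance from the sample mean.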

Other values of confidence could be used, depending on need. The lower the confidence needed, the smaller the range of the confidence interval. The answer will be less certain, but more precise. In contrast, the higher the confidence needed, the larger the range of the confidence interval. The answer will be more certain, but less precise. While we want to make the answer as precise as possible, we also want to control this level of confidence so that it is not lower than a certain value. Some common values used are: 80%, 85%, 90%, 95%, 99%.
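The trade-off between certainty and precision can be seen by calculating intervals at several confidence levels for the same data. This is only a sketch: the sample is invented, the helper name mean_confidence_interval is hypothetical, and the t-based interval for a mean is assumed.

```python
import math
import statistics

from scipy import stats


def mean_confidence_interval(sample, confidence):
    """Return the t-based confidence interval (lower, upper) for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return stats.t.interval(confidence, n - 1, loc=mean, scale=standard_error)


# Hypothetical sample (invented for illustration).
sample = [7, 9, 10, 10, 11, 13]

widths = []
for confidence in (0.80, 0.85, 0.90, 0.95, 0.99):
    lower, upper = mean_confidence_interval(sample, confidence)
    widths.append(upper - lower)
    print(f"{confidence:.0%} interval: {lower:.2f} to {upper:.2f} (width {upper - lower:.2f})")
```

Each step up in confidence should print a wider interval than the one before it.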

Note that we never attempt to find the 100% confidence interval. Since inferential statistics is bound to produce errors, if we want to be 100% certain about the answer, then the interval must include every possible value from negative infinity to positive infinity (or from zero to positive infinity if only non-negative values are allowed). Such an answer is of course not helpful at all!

The confidence interval can be found as additional statistics in the corresponding hypothesis test output in SPSS and Jamovi. (See below to learn about hypothesis tests.)

Example 2: The sample mean is 10. Is the population mean equal to 10 as well?

In this case, instead of asking for the possible value(s) of the population mean, we suggest a possible value and ask whether this value is correct or not. This is a yes/no question and has only two possible answers: yes or no. We write them down as two hypotheses:

H0: The population mean is equal to 10. (This is called the null hypothesis and it represents the "Yes" answer to the yes/no question. It is denoted symbolically as H0.)

H1: The population mean is not equal to 10. (This is called the alternative hypothesis and it represents the "No" answer to the yes/no question. It is denoted symbolically as H1 or Ha.)

To see which answer is correct, we conduct hypothesis tests to test the hypotheses.

The most important result of the hypothesis tests is the p-value. Based on the p-value, we can conclude the answer to the yes/no question:

If p is not smaller than 0.05 (or another small threshold), then there is not enough evidence to reject H0. In that case, we retain H0, i.e., we answer "Yes" to the yes/no question. Note that this does not prove that H0 is true; it only means that the data are compatible with it.
If p is smaller than 0.05 (or another small threshold), then there is enough evidence to reject H0. In that case, we accept H1, i.e., we answer "No" to the yes/no question.
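The decision rule above can be sketched in Python. The example assumes that a two-sided one-sample t-test is the appropriate test for these hypotheses, uses SciPy's ttest_1samp, and runs on an invented sample. The function name decide and the second hypothesized value of 13 are made up for illustration.

```python
from scipy import stats

ALPHA = 0.05  # threshold for the p-value


def decide(sample, hypothesized_mean, alpha=ALPHA):
    """Run a two-sided one-sample t-test and apply the decision rule."""
    result = stats.ttest_1samp(sample, popmean=hypothesized_mean)
    if result.pvalue < alpha:
        decision = "reject H0"
    else:
        decision = "do not reject H0"
    return result.pvalue, decision


# Hypothetical sample (invented for illustration); its mean is 10.
sample = [7, 9, 10, 10, 11, 13]

# H0: the population mean is 10.
p_for_10, decision_for_10 = decide(sample, 10)
# H0: the population mean is 13 (a second, made-up hypothesis for contrast).
p_for_13, decision_for_13 = decide(sample, 13)

print(f"H0: mean = 10 -> p = {p_for_10:.3f}, {decision_for_10}")
print(f"H0: mean = 13 -> p = {p_for_13:.3f}, {decision_for_13}")
```

With this sample, the hypothesized value of 10 is not rejected, while the hypothesized value of 13 is rejected because its p-value falls below 0.05.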

You can use the diagram on the right for a summary to help you make the decision.

The p-value is the probability of obtaining sample results at least as extreme as the ones observed if H0 were true. A small p-value means that the data would be unlikely under H0, so we reject H0 only when p is small enough. Rejecting H0 when it is in fact true is a mistake called a type I error. The threshold of the p-value under which we would reject H0 is called the level of significance, denoted by the Greek letter α (alpha). A common value for the level of significance is 0.05, which means that if we reject H0 whenever p<0.05, then the probability of making a type I error when H0 is true is at most 5%.
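A small simulation can illustrate what the 0.05 threshold means in practice. The sketch below assumes a normally distributed population whose mean really is 10 (so H0 is true), repeatedly draws samples from it, and counts how often a one-sample t-test rejects H0 at the 0.05 level. The population parameters, sample size, number of simulations, and seed are all arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05            # level of significance
N_SIMULATIONS = 10_000  # number of simulated studies
SAMPLE_SIZE = 30        # observations per study

rng = np.random.default_rng(seed=1)

# Each row is one simulated sample from a population with mean 10 and
# standard deviation 2, so H0 (population mean = 10) is true in every study.
samples = rng.normal(loc=10, scale=2, size=(N_SIMULATIONS, SAMPLE_SIZE))

# One t-test per row; every rejection here is a mistake because H0 is true.
p_values = stats.ttest_1samp(samples, popmean=10, axis=1).pvalue
false_rejection_rate = np.mean(p_values < ALPHA)

print(f"Proportion of studies that wrongly rejected H0: {false_rejection_rate:.3f}")
```

The printed proportion should be close to 0.05: when H0 is true, rejecting only when p<0.05 leads to a wrong rejection in about 5% of studies.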

There are many different types of hypotheses that we could test, and what I give above is only an example. The hypotheses to write depend on our research questions, as well as the levels of measurement of the variables involved. The hypothesis test is then picked based on the hypotheses. Just as we don't start with a method of analysis in descriptive statistics and plug variables into the procedures in descriptive statistics, in inferential statistics we never start with a hypothesis test and plug variables into it. Instead, we start with the research questions, write down hypotheses that would help to answer the research questions, and then pick the correct hypothesis tests for that purpose.

You can consult your statistics textbook for the correct hypotheses and their tests, or you can see my descriptions and examples under the corresponding procedures on this website to help you determine what hypotheses and levels of measurement each hypothesis test is suitable for.

If you need more help, you can use the method selection tool at StatKat (https://statkat.com/statistical-technique-selection/tool-for-selecting-a-statistical-technique.php). If you use Jamovi you can also install the StatKat module, which will suggest the method based on the variables you select.

FAQs

1. Why can’t I use Ordinal Level in Non-Parametric Tests in SPSS?

Although Likert scale data are at the ordinal level, some non-parametric tests in SPSS will not work if you set the variables to ordinal level. This is due to programming considerations that are not detailed here. In short, if you need to conduct these non-parametric tests on your Likert scale data, you need to set the variables to scale level even though they are ordinal. You can change them back to ordinal after you finish the test.

Jamovi does not have this issue.

2. Why can’t I use ANOVA when the grouping variable is a string in SPSS?

The ANOVA procedure in SPSS requires a numeric grouping variable. You can recode your data using the suggestions given above.

Jamovi does not have this issue.
