Type I and Type II Errors

What Can Go Wrong: Two Types of Errors

Statistical investigations involve making decisions in the face of uncertainty, so there is always some chance of making a wrong decision. When you perform a hypothesis test, there are four possible outcomes, depending on whether the null hypothesis H0 is actually true or false and on whether you decide to reject it or not: two types of correct decisions and two types of errors.

If the null hypothesis is true, but we reject it, the error is a type I error.

If the null hypothesis is false, but we fail to reject it, the error is a type II error.

The following table summarizes Type I and Type II errors.

                        H0 is actually true       H0 is actually false
  Do not reject H0      Correct decision          Type II error
  Reject H0             Type I error              Correct decision (Power)


Comment

Type I and type II errors are not caused by mistakes. These errors are the result of random chance. The data provide evidence for a conclusion that is false. It’s no one’s fault!

The four possible outcomes in the table are:

  • Do not reject H0 when H0 is true: correct decision.

  • Reject H0 when H0 is true: incorrect decision, known as a Type I error.

  • Do not reject H0 when H0 is false: incorrect decision, known as a Type II error.

  • Reject H0 when H0 is false: correct decision, whose probability is called the Power of the Test.
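The four outcomes above can be sketched as a small helper function. This is a hypothetical illustration, not part of the original text; the function name and labels are made up.

```python
# Hypothetical helper (not from the text) that labels each cell of the
# four-outcome table for a hypothesis test.
def classify_outcome(null_is_true, reject_null):
    """Return the label for one combination of truth and decision."""
    if null_is_true and reject_null:
        return "Type I error"
    if not null_is_true and not reject_null:
        return "Type II error"
    return "correct decision"

# Walk through all four cells of the table.
for truth in (True, False):
    for decision in (True, False):
        print("H0 true:", truth, "| reject:", decision, "->",
              classify_outcome(truth, decision))
```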

Each of the errors occurs with a particular probability, represented by the Greek letters α and β.

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

Because α and β are probabilities of errors, each should be as small as possible. They are rarely zero.

The Power of the Test is 1 − β. Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test.
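The effect of sample size on power can be illustrated with a short simulation. The population values below (the mean under H0, the true mean, and the standard deviation) are made-up numbers chosen only for illustration.

```python
import numpy as np

# Simulation sketch: how power grows with sample size.
# mu_null, mu_true, and sigma are illustrative values, not from the text.
rng = np.random.default_rng(0)

def estimated_power(n, mu_null=62.0, mu_true=75.0, sigma=30.0, trials=5000):
    """Fraction of samples, drawn when H0 is false, that a one-sided
    z-test at the 5% level correctly rejects (an estimate of 1 - beta)."""
    se = sigma / np.sqrt(n)
    sample_means = rng.normal(mu_true, se, size=trials)
    z = (sample_means - mu_null) / se
    return np.mean(z > 1.645)  # critical value for a one-sided 5% test

for n in (5, 20, 80):
    print(n, round(float(estimated_power(n)), 2))
```

Larger samples shrink the standard error, so the same true difference becomes easier to detect and the estimated power climbs toward one.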

__________________________________________

EXAMPLE 1

Suppose the null hypothesis, H0, is: Frank’s rock climbing equipment is safe.

  • Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.

  • Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

__________________________________________

EXAMPLE 2

Suppose the null hypothesis, H0, is: the blood cultures contain no traces of pathogen X. The following are the Type I and Type II errors.

  • Type I error: The researcher thinks the blood cultures do contain traces of pathogen X, when in fact, they do not.

  • Type II error: The researcher thinks the blood cultures do not contain traces of pathogen X, when in fact, they do.

__________________________________________

EXAMPLE 3

Suppose the null hypothesis, H0, is: The victim of an automobile accident is alive when he arrives at the emergency room of a hospital.

  • Type I error: The emergency crew thinks that the victim is dead when, in fact, the victim is alive.

  • Type II error: The emergency crew thinks that the victim is alive when, in fact, the victim is dead.

α = probability that the emergency crew thinks the victim is dead when, in fact, he is really alive = P(Type I error). β = probability that the emergency crew thinks the victim is alive when, in fact, the victim is dead = P(Type II error).

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)

__________________________________________

PRACTICE 1

Suppose the null hypothesis, H0, is: a patient is not sick. Define the Type I and Type II errors in this context, and state a consequence of each.

__________________________________________

PRACTICE 2

“Red tide” is a bloom of poison-producing algae, a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries (DMF) monitors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kg of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state a consequence of each.

__________________________________________

PRACTICE 3

A certain experimental drug claims a cure rate of at least 75% for males with prostate cancer. Describe both the Type I and Type II errors in context. Which error is the more serious?

__________________________________________

PRACTICE 4

Assume a null hypothesis, H0, that states the percentage of adults with jobs is at least 88%.

Identify the Type I and Type II errors from these four statements.

a) Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%.

b) Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.

c) Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.

d) Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%.

__________________________________________

EXAMPLE 4

Researchers investigated the claim that the mean data usage for all teens is greater than 62 MBs. The sample mean was 75 MBs. The P-value was approximately 0.023. In this situation, the P-value is the probability that we will get a sample mean of 75 MBs or higher if the true mean is 62 MBs.

Notice that the result (75 MBs) isn’t impossible, only very unusual. The result is rare enough that we question whether the null hypothesis is true. This is why we reject the null hypothesis. But it is possible that the null hypothesis is true and the researcher happened to get a very unusual sample mean. In this case, the result is just due to chance, and the data have led to a type I error: rejecting the null hypothesis when it is actually true.

__________________________________________

EXAMPLE 5

Researchers conducted a hypothesis test using poll results to determine if white male support for Obama in 2012 will be less than 40%. Our poll of white males showed 35% planning to vote for Obama in 2012. Based on the sampling distribution, we estimated the P-value as 0.078. In this situation, the P-value is the probability that we will get a sample proportion of 0.35 or less if 0.40 of the population of white males support Obama.

At the 5% level, the poll did not give strong enough evidence for us to conclude that less than 40% of white males will vote for Obama in 2012, so we did not reject the null hypothesis.
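The reported P-value can be roughly reproduced with a normal approximation. The excerpt does not state the poll's sample size, so n = 200 below is a made-up value chosen only for illustration.

```python
import math

# Normal-approximation sketch of the poll's P-value.
# n = 200 is an assumption for illustration; the excerpt gives no sample size.
p0, p_hat, n = 0.40, 0.35, 200
se = math.sqrt(p0 * (1 - p0) / n)                  # SE of p-hat under H0
z = (p_hat - p0) / se                              # negative: p_hat below p0
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # left-tail P(Z <= z)
print(round(p_value, 3))  # in the neighborhood of the reported 0.078
```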

Which type of error is possible in this situation? If, in fact, it is true that less than 40% of this population support Obama, then the data led to a type II error: failing to reject a null hypothesis that is false. In other words, we failed to support an alternative hypothesis that is true.

We definitely did not make a type I error here because a type I error requires that we reject the null hypothesis!

__________________________________________

EXAMPLE 6

The following is an excerpt from a 1999 New York Times article titled “Cell phones: questions but no answers,” as referenced by David S. Moore in Basic Practice of Statistics (4th ed., New York: W. H. Freeman, 2007):

  • A hospital study that compared brain cancer patients and a similar group without brain cancer found no statistically significant association between cell phone use and a group of brain cancers known as gliomas. But when 20 types of glioma were considered separately, an association was found between cell phone use and one rare form. Puzzlingly, however, this risk appeared to decrease rather than increase with greater mobile phone use.

This is an example of a probable type I error. Suppose we conducted 20 hypothesis tests of the null hypothesis “Cell phone use is not associated with cancer” at the 5% level. We expect 1 in 20 (5%) to give a significant result by chance alone even when there is no association between cell phone use and cancer. So the conclusion that this one type of cancer is related to cell phone use is probably just a result of random chance, not an indication of a real association.
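This multiple-testing effect is easy to reproduce by simulation. The sketch below runs 20 two-sample tests on data where the null hypothesis is true by construction, so any rejection is a Type I error; all numbers are illustrative.

```python
import numpy as np

# Run 20 tests where H0 ("no association") is TRUE by construction: both
# samples come from the same distribution, so every rejection is a false alarm.
rng = np.random.default_rng(1)
alpha, n_tests, n = 0.05, 20, 50
false_alarms = 0
for _ in range(n_tests):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    z = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1)/n + b.var(ddof=1)/n)
    false_alarms += abs(z) > 1.96     # two-sided test at the 5% level
print(false_alarms)  # typically about 1 of the 20 tests
```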


__________________________________________

Concept Review

In every hypothesis test, the conclusion depends on the data, and the data are subject to chance. Even a correctly performed test can therefore reach a wrong conclusion. A Type I error occurs when a true null hypothesis is rejected. A Type II error occurs when a false null hypothesis is not rejected.

The probabilities of these errors are denoted by the Greek letters α and β, for a Type I and a Type II error respectively. The power of the test, 1 − β, is the probability that the test correctly rejects a false null hypothesis, that is, the probability of supporting a true alternative hypothesis. A high power is desirable.

Formula Review

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

What Is the Probability That We Will Make a Type I Error?

If the significance level is 5% (α = 0.05), then 5% of the time we will reject the null hypothesis (when it is true!). Of course we will not know if the null is true. But if it is, the natural variability that we expect in random samples will produce rare results 5% of the time.

Similarly, if the significance level is 1%, then 1% of the time sample results will be rare enough for us to reject the null hypothesis. So if the null hypothesis is actually true, then by chance alone, 1% of the time we will reject a true null hypothesis. The probability of a type I error is therefore 1%.

In general, the probability of a type I error is α.
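A quick simulation illustrates this: if we repeatedly test a true null hypothesis at the 5% level, we reject about 5% of the time. The population values below are made up for illustration.

```python
import numpy as np

# If H0 is TRUE (mu really is 100; the numbers are illustrative), a test at
# the 5% level still rejects about 5% of the time -- by chance alone.
rng = np.random.default_rng(42)
mu0, sigma, n = 100, 15, 30
se = sigma / np.sqrt(n)
sample_means = rng.normal(mu0, se, size=10_000)   # H0 true in every trial
z = (sample_means - mu0) / se
rejection_rate = np.mean(np.abs(z) > 1.96)        # two-sided 5% test
print(rejection_rate)  # close to alpha = 0.05
```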

What Is the Probability That We Will Make a Type II Error?

The probability of a type I error, if the null hypothesis is true, is equal to the significance level. The probability of a type II error is much more complicated to calculate. We can reduce the risk of a type I error by using a lower significance level. The best way to reduce the risk of a type II error is by increasing the sample size. In theory, we could also increase the significance level, but doing so would increase the likelihood of a type I error at the same time. We discuss these ideas further in a later module.
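The trade-off between the two error probabilities can also be seen in a short simulation: with the sample size fixed and a false null hypothesis, lowering α raises β. All population values below are illustrative.

```python
import numpy as np

# Fixed sample size, and H0 (mu = 100) is FALSE: the true mean is 105.
# Every failure to reject is then a Type II error. All values illustrative.
rng = np.random.default_rng(7)
mu0, mu_true, sigma, n = 100, 105, 15, 30
se = sigma / np.sqrt(n)
sample_means = rng.normal(mu_true, se, size=10_000)
z = (sample_means - mu0) / se
for alpha, z_crit in [(0.05, 1.645), (0.01, 2.326)]:
    beta = np.mean(z <= z_crit)   # one-sided test fails to reject
    print(f"alpha={alpha}: beta is roughly {beta:.2f}")
```

Tightening α from 5% to 1% pushes the rejection cutoff further out, so more samples fail to reject the false null and β grows.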

Comment

In general, if the null hypothesis is true, the significance level gives the probability of making a type I error. If we conduct a large number of hypothesis tests using the same null hypothesis, then, a type I error will occur in a predictable percentage (α) of the hypothesis tests. This is a problem! If we run one hypothesis test and the data is significant at the 5% level, we have reasonably good evidence that the alternative hypothesis is true. If we run 20 hypothesis tests and the data in one of the tests is significant at the 5% level, it doesn’t tell us anything! We expect 5% of the tests (1 in 20) to show significant results just due to chance.

We close our introduction to hypothesis testing with a helpful analogy.

A Courtroom Analogy for Hypothesis Tests

When a defendant stands trial for a crime, he or she is innocent until proven guilty. It is the job of the prosecution to present evidence showing that the defendant is guilty beyond a reasonable doubt. It is the job of the defense to challenge this evidence to establish a reasonable doubt. The jury weighs the evidence and makes a decision.

When a jury makes a decision, it has only two possible verdicts:

  • Guilty: The jury concludes that there is enough evidence to convict the defendant. The evidence is so strong that there is not a reasonable doubt that the defendant is guilty.

  • Not Guilty: The jury concludes that there is not enough evidence to conclude beyond a reasonable doubt that the person is guilty. Notice that they do not conclude that the person is innocent. This verdict says only that there is not enough evidence to return a guilty verdict.

How is this example like a hypothesis test?

The null hypothesis is “The person is innocent.” The alternative hypothesis is “The person is guilty.” The evidence is the data. In a courtroom, the person is assumed innocent until proven guilty. In a hypothesis test, we assume the null hypothesis is true until the data provide strong evidence otherwise.

The two possible verdicts are similar to the two conclusions that are possible in a hypothesis test.

Reject the null hypothesis: When we reject a null hypothesis, we accept the alternative hypothesis. This is like a guilty verdict. The evidence is strong enough for the jury to reject the assumption of innocence. In a hypothesis test, the data is strong enough for us to reject the assumption that the null hypothesis is true.

Fail to reject the null hypothesis: When we fail to reject the null hypothesis, we are delivering a “not guilty” verdict. The jury concludes that the evidence is not strong enough to reject the assumption of innocence, so the evidence is too weak to support a guilty verdict. We conclude the data is not strong enough to reject the null hypothesis, so the data is too weak to accept the alternative hypothesis.

How does the courtroom analogy relate to type I and type II errors?

Type I error: The jury convicts an innocent person. By analogy, we reject a true null hypothesis and accept a false alternative hypothesis.

Type II error: The jury says a person is not guilty when he or she really is. By analogy, we fail to reject a null hypothesis that is false. In other words, we do not accept an alternative hypothesis when it is really true.

Let’s Summarize

In this section, we introduced the five-step process of hypothesis testing:

Step 1: Determine the hypotheses.

  • The hypotheses are claims about the population(s) that are contradictory, mutually exclusive, and exhaustive.

  • The null hypothesis is a hypothesis that the parameter equals a specific value (=, ≤, or ≥).

  • The alternative hypothesis is the competing claim that the parameter is less than (<), greater than (>), or not equal to (≠) the parameter value in the null. The claim that drives the statistical investigation is usually found in the alternative hypothesis.

Step 2: State your decision criterion (α).

Because the hypothesis test is based on probability, we need to state the level of acceptable Type I error. This is usually set to 5% by tradition. This stated decision criterion is what we compare our P-value to in order to decide whether to reject or fail to reject the null hypothesis (step 5).

Step 3: Collect the data.

Because the hypothesis test is based on probability, random selection or assignment is essential in data production. We must ethically collect our data for analysis (step 4) and interpretation (step 5).

Step 4: Assess the evidence by computing statistics.

  • Use the data to find a P-value.

  • The P-value is a probability statement about how unlikely the data are if the null hypothesis is true: P(data | null hypothesis).

  • More specifically, the P-value gives the probability of sample results at least as extreme as the data if the null hypothesis is true.

Step 5: Give the conclusion.

  • A small P-value says the data is unlikely to occur if the null hypothesis is true. We therefore conclude that the null hypothesis is probably not true and that the alternative hypothesis is true instead.

  • Precisely, if the p-value is less than the stated decision criterion (α) then we will reject the null hypothesis and conclude that we find support for the alternative hypothesis.

  • If the P-value is greater than the stated decision criterion (α), then we say we fail to reject the null hypothesis. We never say that we “accept” the null hypothesis. We just say that we don’t have enough evidence to reject it. This is equivalent to saying we don’t have enough evidence to support the alternative hypothesis.

  • Our conclusion will respond to the research question, so we often state the conclusion in terms of the alternative hypothesis.
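The Step 5 decision rule can be summarized in a few lines of code. This is a minimal sketch; the function name and the wording of the returned messages are illustrative, not from the text.

```python
# A minimal sketch of the Step 5 decision rule; the function name and the
# returned messages are illustrative, not from the text.
def decide(p_value, alpha=0.05):
    """Compare the P-value to the stated decision criterion alpha."""
    if p_value < alpha:
        return "reject H0: the data support the alternative hypothesis"
    return "fail to reject H0: not enough evidence for the alternative"

print(decide(0.023))  # Example 4's P-value: reject at the 5% level
print(decide(0.078))  # Example 5's P-value: fail to reject at the 5% level
```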

Inference is based on probability, so there is always uncertainty. Although we may have strong evidence against it, the null hypothesis may still be true. If this is the case, we have a type I error. Similarly, even if we fail to reject the null hypothesis, it does not mean the alternative hypothesis is false. In this case, we have a type II error. These errors are not the result of a mistake in conducting the hypothesis test. They occur because of random chance.


References:

  1. https://courses.lumenlearning.com/introstats1/chapter/outcomes-and-the-type-i-and-type-ii-errors

  2. https://courses.lumenlearning.com/wmopen-concepts-statistics/chapter/introduction-to-hypothesis-testing-5-of-5/

CC LICENSED CONTENT, SHARED PREVIOUSLY

ALL RIGHTS RESERVED CONTENT

  • Type 1 errors | Inferential statistics | Probability and Statistics | Khan Academy. Authored by: Khan Academy. Located at: https://youtu.be/EowIec7Y8HM. License: All Rights Reserved. License Terms: Standard YouTube License

Answers

Practice 1.

A Type I error would occur if the patient is thought to be sick when, in fact, they are well. The consequence: they would receive unnecessary, and possibly harmful, treatment.

A Type II error would occur if the patient is thought to be well when, in fact, they are sick. The consequence: they would not receive the treatment they need.


Practice 2.

In this scenario, an appropriate null hypothesis would be H0: the mean level of toxin is at most 800 μg per kg of clam meat; that is, H0: μ ≤ 800 μg.

  • Type I error: The DMF believes that toxin levels are still too high when, in fact, toxin levels are at most 800 μg. The DMF continues the harvesting ban, resulting in lost wages and income.

  • Type II error: The DMF believes that toxin levels are within acceptable limits (at most 800 μg) when, in fact, toxin levels are still too high (more than 800 μg). The DMF lifts the harvesting ban. This error has the more serious consequence: if the ban is lifted and clams are still toxic, consumers could eat tainted food.

Practice 3.

  • Type I: A cancer patient believes the cure rate for the drug is less than 75% when it actually is at least 75%.

  • Type II: A cancer patient believes the experimental drug has at least a 75% cure rate when it has a cure rate that is less than 75%.

In this scenario, these errors have serious consequences. If a patient believes the drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

Practice 4.

Type I error: c

Type II error: a