# MODUS TOLLENS

### The syllogism modus tollens can be used to reject hypotheses.

*Modus tollens* is a valid deductive syllogism that takes the form:

__PREMISE__: If A then B.

__PREMISE__: B is NOT true.

__CONCLUSION__: Therefore, A is NOT true.

How can we use *modus tollens* to test hypotheses?

For our first premise, we could imagine making a specific prediction based on a General Hypothesis. For example, we could predict:

__PREMISE 1__: IF non-repetitive study results in more learning than blocked study of mathematics skills,

THEN serial study (one type of non-repetitive study) will result in significantly higher scores on algebra, geometry, and word problem tests than blocked study during retention tests.

The first part of the premise is very general, implying that __all types__ of non-repetitive study result in more learning than blocked study of mathematics skills. Therefore, the first part of the premise could be thought of as a General Hypothesis. The second part of the premise is a specific prediction (out of MANY possible). The second part of the premise is therefore one possible Measurable Hypothesis.

We could then perform an experiment, collect data, perform statistical tests, and find:

__PREMISE 2__: Retention test scores on algebra, geometry, and word problem tests were NOT higher than blocked study during retention tests (t-tests; P > 0.05).

Using *modus tollens*, we could come to the conclusion:

__CONCLUSION__: Serial study does NOT result in more learning than blocked study of mathematics skills. Non-repetitive study does not result in more learning than blocked study of mathematics skills.

Is the argument a __valid__ deductive argument and a form of *modus tollens*?

Logically, the argument is __valid__ because it __does__ have the form of *modus tollens*. If our second premise is also true and leads to the conclusion that serial study does not __always__ result in more learning of mathematics skills than blocked study, then *modus tollens* provides the opportunity to __reject__ general hypotheses even based on a __single __experiment.

Is the argument a __sound__ deductive argument?

The argument will be sound if the second premise is true. You might argue: "how could we question its truthfulness without actually seeing the data?" You would have a legitimate point. HOWEVER, there is __one problem__ with Premise 2 that doesn't depend on the data.

The problem with Premise 2 is that in common practice, statistical tests (like t-tests) are __asymmetrical__. Statistical tests CAN test for __differences__ among groups to a specified level of confidence (e.g. 95%). However, if a statistical test * fails* to find significant differences among groups, then the statistical test has simply

__failed__. A failed statistical test is NOT strong evidence of the

__absence__of differences among groups (additional analysis such as interval identification or power analysis can determine the probability of Type II error; Giere, 2006. Completely different statistical frameworks such as Bayesian statistics can provide less categorical statistical comparisons (HĂ¶fler et al., 2018). However, a more extensive or nuanced approach to statistics is outside the scope of the current module).

A failed statistical test is commonly interpreted as: we __still don't know__ if there is a significant difference between groups or not.

For example, solely because our t-test failed to find a significant difference between the serial study and blocked study groups (Premise 2), we __cannot__ conclude (within our agreed-upon 95% confidence) that serial study and blocked study are NOT different. All we can conclude is that our t-test __failed__ to find a significant difference between groups: __we still do not know__ if there is a difference between serial and blocked practice or not! Therefore, Premise 2 is a *non-sequitur*. The failure of a statistical test does NOT reasonably lead to the conclusion that serial study results in the *same* amount of learning than blocked study of mathematics skills.

Why can we NOT conclude that two groups __the same__ if a statistical test __fails__ to find a significant difference?

The reason that we cannot come to firm conclusions based solely on the __absence__ of significant differences is because __there are many ways for statistical tests to fail__. A true lack of statistical differences between groups is * only one* potential reason that a statistical test can fail. Other common reasons for a "false negative" are:

* Sample sizes too small to detect differences between groups (lack of statistical "power").

* Violating one of the assumptions of parametric statistical tests (e.g. non-normal distribution).

* Outliers in the dataset that substantially increase the variance of one or more groups.

* Co-variation among variables that increase variance.

Strong study design and data analysis can mitigate some of the problems that affect statistical tests. However, for the parametric statistical tests commonly used in educational settings, failing to reject a hypothesis does not provide sufficient evidence to "accept" the hypothesis.

**"Null" Hypotheses allow us to reject hypotheses based on the statistical finding of significant differences.**

If a statistical test fails to find a statistically significant difference between groups, without additional analysis we cannot be confident that there is __no__ actual difference between groups. However, if statistical tests are performed correctly, we __can__ be confident (to a specified confidence level) that finding a statistically significant difference between groups indicates that an actual difference exists between groups. The level of our confidence is related to the "P value," which indicates the potential for a "false positive." A false positive means that even though there is NO difference between two groups, our statistical test *finds* one. P < 0.05 means that there is less than a 5% chance that our statistical test found a difference between groups that wasn't actually there.

Therefore, to use statistics with *modus tollens*, we must select a reasoning structure that allows us to use __significant differences__ to * reject* hypotheses. So-called "Null" hypotheses allow us to use

*modus tollens*to reject hypotheses.

__DEFINITION__: A "null" hypothesis is a prediction of NO differences between or among groups.

Null Hypotheses may seem awkward because we are predicting the __absence__ of differences instead of the presence of differences between groups (even though the presence of differences is commonly why we create the hypotheses in the first place). However, null hypotheses can help to clarify arguments. For example, we could frame our first premise as a null hypothesis:

__PREMISE 1__: IF non-repetitive study does NOT result in more learning than blocked study of mathematics skills,

THEN serial study will result in scores that are NOT significantly higher than blocked study on algebra, geometry, and word problems during retention tests.

If we conduct an experiment and find:

__PREMISE 2__: Retention test scores on algebra, geometry, and word problem tests __were__ significantly higher than blocked study during retention tests (t-tests; P < 0.05),

we can use *modus tollens* to come to the conclusion:

__CONCLUSION__: We __reject__ our null hypothesis. Serial study results in __more learning__ than blocked study of mathematics skills.

Our conclusion is both __valid__ and __sound__ because we use the valid syllogism *modus tollens* to __reject__ a hypothesis based on a statistically significant __difference__.

A reasonable question might be: what if we performed our experiment and __still didn't find__ a significant difference between groups? In the case of the lack of a significant difference, the argument becomes:

__PREMISE 1__: IF non-repetitive study does NOT result in more learning than blocked study of mathematics skills,

THEN serial study will result in scores that are NOT significantly higher than blocked study on algebra, geometry, and word problems during retention tests.

__PREMISE 2__: Retention test scores on algebra, geometry, and word problem tests after serial study were NOT higher than scores after blocked study during retention tests (t-tests; P > 0.05).

__CONCLUSION__: We support our null hypothesis that serial study does NOT result in more learning than blocked study of mathematics skills.

Is there a problem with the final argument?

The problem with the final argument is that it is in the form of a logical __fallacy__: affirming the consequent. We do not even need to think about the limitations of statistical tests to know that the argument is invalid and *cannot* be sound. Therefore, null hypotheses can help to clarify reasoning.