STRONG INFERENCE

"Strong inference" can increase confidence in hypotheses by rejecting alternative hypotheses.


Rejecting null hypotheses using modus tollens seems like quite a negative project. How can scientists "build models of the universe and its inhabitants" if all scientists can reasonably do is reject hypotheses?


One process that uses deductive reasoning for scientific discovery has been called "Strong Inference" (Platt, 1964). Strong inference can be thought of as a method to generate knowledge (similar to inductive reasoning) using logically sound deductive reasoning.

Strong Inference repeats a single framework with only three steps:

    1) Devise alternative hypotheses.

    2) Design one or more experiments with at least two feasible outcomes, such that every feasible outcome rejects one or more of the hypotheses.

    3) Carry out the experiment. Use the data to reject all hypotheses that can reasonably be rejected (e.g., using the deductive syllogism modus tollens).
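The modus tollens step in 3) can be checked mechanically. The following Python sketch (the language is chosen here only for illustration, and all function names are hypothetical) enumerates every combination of truth values for a hypothesis H and its prediction P. It confirms that "if H then P" together with "not P" always forces "not H", whereas "if H then P" together with "P" does not force "H".

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: 'if a then b' is false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if no truth assignment to (H, P)
    makes every premise true while the conclusion is false."""
    for h, p in product([True, False], repeat=2):
        if all(premise(h, p) for premise in premises) and not conclusion(h, p):
            return False  # found a counterexample
    return True


# Modus tollens: "if H then P" and "not P" together entail "not H".
modus_tollens = is_valid(
    premises=[lambda h, p: implies(h, p), lambda h, p: not p],
    conclusion=lambda h, p: not h,
)

# Observing P does NOT entail H, so a hypothesis that survives a test
# is "supported" rather than "proven".
observing_p_proves_h = is_valid(
    premises=[lambda h, p: implies(h, p), lambda h, p: p],
    conclusion=lambda h, p: h,
)

print(modus_tollens)         # True
print(observing_p_proves_h)  # False
```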

It would be useful to define some terminology before discussing Strong Inference in more detail. 

1) Devise alternative hypotheses. 

Often, people think that "alternative" hypotheses must be the opposite of the general or measurable hypothesis that is the focus of a study. It is true that statistical null hypotheses are often the negation of the measurable hypothesis that an experiment seeks to test.

However, alternative research hypotheses are strongest if they are NOT simply the negation of a hypothesis, but another plausible explanation of a phenomenon or outcome of an experiment. Often, alternative hypotheses are the result of reasoning from a different set of assumptions than the main hypotheses.

For example, we could consider alternative predictions for study strategy based on how repetitive and predictable the study is. Repetitive study involves doing the same types of problems over and over again, whereas non-repetitive study involves switching among different types of problems. In predictable study, a person knows the order of problem types, whereas in unpredictable study, a person cannot predict which problem they will work on next.

Blocked practice is both repetitive and predictable. Serial practice is non-repetitive but IS predictable. Random practice is neither repetitive nor predictable. We could ask the question: "Do repetitiveness, predictability, both, or neither affect learning?"
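The two properties can be made concrete by generating example practice orders. In the Python sketch below, the problem-type labels A, B, and C, the three problems per type, and the function names are hypothetical choices for illustration. The `repeat_rate` helper scores repetitiveness as the fraction of consecutive problems that share a type.

```python
import random


def blocked(types, reps):
    """Repetitive and predictable: finish every problem of one type before the next."""
    return [t for t in types for _ in range(reps)]


def serial(types, reps):
    """Non-repetitive but predictable: cycle through the types in a fixed order."""
    return [t for _ in range(reps) for t in types]


def random_order(types, reps, seed=None):
    """Unpredictable: shuffle the same problems. A plain shuffle can still place
    two problems of the same type in a row; a real study might forbid that."""
    order = blocked(types, reps)
    random.Random(seed).shuffle(order)
    return order


def repeat_rate(order):
    """Fraction of adjacent problem pairs that share a type (1.0 = fully repetitive)."""
    pairs = list(zip(order, order[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


types = ["A", "B", "C"]
print("".join(blocked(types, 3)), repeat_rate(blocked(types, 3)))  # AAABBBCCC 0.75
print("".join(serial(types, 3)), repeat_rate(serial(types, 3)))    # ABCABCABC 0.0
print("".join(random_order(types, 3, seed=1)))
```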

Three alternative general hypotheses, each reflecting different assumptions about which type of study contributes most to learning, might be:


GH1: Blocked study results in higher performance during practice and more learning than non-repetitive or unpredictable study of mathematics skills.

GH2: Non-repetitive study results in lower performance during practice, but more learning, than blocked study of mathematics skills.

GH3: Unpredictable study results in lower performance during practice, but more learning, than blocked study of mathematics skills.


Other general hypotheses are also possible based on other assumptions (e.g., non-repetitive and unpredictable study might improve learning in combination, but not on their own).

Developing and testing viable alternative hypotheses is important for at least two reasons:

A) Strong alternative hypotheses can provide an important "hedge," or safeguard, in case the data do not turn out as predicted. The safest experiments involve alternative hypotheses that ensure interesting conclusions no matter what the data are. The effort necessary to carefully design experiments and create alternative hypotheses can substantially reduce the time and stress involved in analyzing data and drawing conclusions, and can increase the probability of success.

B) Strong alternative hypotheses can prevent emotional "attachment" to hypotheses (Platt, 1964). Scientists are human, and cannot be purely objective observers and decision-makers. Scientists who have invested considerable time and effort in a single hypothesis may have difficulty rejecting it, and may analyze and interpret the data not in the most reasonable way, but in the way most favorable to their hypothesis. Having alternative hypotheses increases the probability that some hypotheses will not be rejected, making it easier to reject others as necessary.

APPLICATION: Creating substantive alternative hypotheses has both practical and scientific value. Practically, alternative hypotheses can reduce the risk of inconclusive experiments. Scientifically, alternative hypotheses can improve reasoning and objectivity.

2) Design one or more experiments with at least two feasible outcomes.

The first step of experimental design is to develop Measurable Hypotheses. Based on each general hypothesis, we could create several Measurable Hypotheses.

For example, Measurable Hypotheses corresponding to GH1 could be:

MH1a: Blocked study [both repetitive and predictable] will result in significantly higher performance during practice than both serial [not repetitive but predictable] and random [neither repetitive nor predictable] study of mathematics skills.

MH1b: Blocked study will result in significantly higher performance during retention and transfer tests than both serial and random study of mathematics skills. 


Measurable hypotheses corresponding to the alternative general hypothesis GH2 could be:

MH2a: Serial study will result in significantly lower performance during practice than blocked study of mathematics skills. However, serial study will not result in performance during practice that is significantly different from random study.

MH2b: Serial study will result in significantly higher performance during retention and transfer tests than blocked study of mathematics skills. However, serial study will not result in performance during retention and transfer tests that is significantly different from random study.


Measurable hypotheses corresponding to the alternative general hypothesis GH3 could be:

MH3a: Random study will result in significantly lower performance during practice than both serial and blocked study of mathematics skills. 

MH3b: Random study will result in significantly higher performance during retention and transfer tests than both serial and blocked study of mathematics skills. 


A Graphical Framework can help to visualize our hypotheses. A useful framework for Strong Inference is a tree structure (Platt, 1964).

Starting from an overall question at the "trunk" of the tree, we can imagine that different branches of the tree represent different possibilities (general hypotheses). Each general hypothesis sprouts at least one measurable hypothesis. 

If the tree seems somewhat complicated, then perhaps we should trim it! 

Trimming the tree involves designing an experiment that allows us to cut away one or more branches. For example, we could measure mathematics performance during practice, and also measure performance on retention and transfer tests to assess learning. We could compare three separate groups of students: students who engaged in blocked study, students who engaged in serial study, and students who engaged in random study. Significant differences in performance among groups could lead us to reject some hypotheses.
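The tree can also be held in a simple nested data structure, and trimming then means deleting a branch. In the Python sketch below, the question and hypothesis labels come from the text above; the dictionary layout and the `print_tree` and `trim` helpers are hypothetical illustrations, not a standard notation.

```python
QUESTION = "Do repetitiveness, predictability, both, or neither affect learning?"

# Trunk (question) -> branches (general hypotheses) -> twigs (measurable hypotheses).
tree = {
    QUESTION: {
        "GH1": ["MH1a", "MH1b"],
        "GH2": ["MH2a", "MH2b"],
        "GH3": ["MH3a", "MH3b"],
    }
}


def print_tree(tree):
    """Print each level of the tree with increasing indentation."""
    for question, branches in tree.items():
        print(question)
        for general, measurables in branches.items():
            print("  " + general)
            for measurable in measurables:
                print("    " + measurable)


def trim(tree, rejected):
    """Return a new tree without the general hypotheses in `rejected`;
    their measurable hypotheses are cut away with them."""
    return {
        question: {gh: mhs for gh, mhs in branches.items() if gh not in rejected}
        for question, branches in tree.items()
    }


print_tree(tree)
print_tree(trim(tree, {"GH1"}))  # hypothetical: as if an experiment had rejected GH1
```

Because `trim` builds a new dictionary, the original tree is left intact, which makes it easy to compare the tree before and after an experiment.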

APPLICATION: To use Strong Inference and deductive reasoning, design experiments that are capable of rejecting one or more alternative hypotheses.
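One way to apply this before collecting data is to list every feasible outcome of a planned comparison and record which hypotheses each outcome would reject. The Python sketch below does this for the serial-versus-random comparison on retention and transfer tests, using only the predictions stated in MH2b and MH3b; the outcome labels and helper names are hypothetical. `is_discriminating` encodes the step 2 criterion: at least two feasible outcomes, each of which rejects at least one hypothesis.

```python
# Predicted outcome of the serial-vs-random comparison on retention and
# transfer tests, as stated in MH2b and MH3b. MH1b makes no prediction
# about this particular comparison, so it is left out.
PREDICTIONS = {
    "MH2b": {"no difference"},
    "MH3b": {"random > serial"},
}

FEASIBLE_OUTCOMES = ["serial > random", "random > serial", "no difference"]


def rejected_by(outcome, predictions):
    """Modus tollens: a hypothesis is rejected when the observed outcome
    is not among the outcomes it predicts."""
    return {h for h, allowed in predictions.items() if outcome not in allowed}


def is_discriminating(outcomes, predictions):
    """Step 2 criterion: at least two feasible outcomes, and every
    feasible outcome rejects one or more hypotheses."""
    return len(outcomes) >= 2 and all(rejected_by(o, predictions) for o in outcomes)


for outcome in FEASIBLE_OUTCOMES:
    print(outcome, "->", sorted(rejected_by(outcome, PREDICTIONS)))
print(is_discriminating(FEASIBLE_OUTCOMES, PREDICTIONS))  # True
```

If every hypothesis predicted the same outcome for a comparison, `is_discriminating` would return `False`, because no result of that comparison could trim the tree.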

3) Carry out the experiment. Use the data to reject all hypotheses that can reasonably be rejected.

With a strong framework of general and measurable hypotheses, carrying out an experiment can be relatively straightforward (although experiments are often more complicated than predicted). Imagine that we performed the experiment with the results shown in the following table. In the table, only statistically significant comparisons are indicated, with a ">" sign; non-significant comparisons are not listed.

Based on the results, can we reject any of our general hypotheses?

Yes. Based on our results, we can reject both GH1 and GH3. Blocked practice was less effective than non-repetitive practice for learning math skills, so GH1 is rejected. However, unpredictability did not improve learning outcomes relative to non-repetitive (serial) practice alone, so GH3 is rejected. Therefore, we can "trim" two branches from our tree:
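The same rejections can be written as a modus tollens check of each learning prediction (MH1b, MH2b, MH3b) against a set of significant comparisons. Because the results table is not reproduced here, the Python sketch below uses a hypothetical set of comparisons chosen only to be consistent with the verbal summary above (serial and random study each significantly better than blocked study on retention and transfer, with no significant difference between serial and random). It is not the study's data, and the helper names are illustrative.

```python
# HYPOTHETICAL significant comparisons on retention and transfer tests,
# written as (higher group, lower group). Chosen only to match the verbal
# summary in the text; these are NOT the original table's results.
SIGNIFICANT = {("serial", "blocked"), ("random", "blocked")}


def higher(sig, a, b):
    """True if group a scored significantly higher than group b."""
    return (a, b) in sig


def no_difference(sig, a, b):
    """True if neither group scored significantly higher than the other."""
    return not higher(sig, a, b) and not higher(sig, b, a)


# Learning predictions from MH1b, MH2b, and MH3b. The practice-performance
# predictions (MH1a, MH2a, MH3a) are left out of this sketch.
LEARNING_PREDICTIONS = {
    "GH1": lambda s: higher(s, "blocked", "serial") and higher(s, "blocked", "random"),
    "GH2": lambda s: higher(s, "serial", "blocked") and no_difference(s, "serial", "random"),
    "GH3": lambda s: higher(s, "random", "serial") and higher(s, "random", "blocked"),
}


def surviving(sig):
    """Keep each general hypothesis whose prediction held."""
    return [gh for gh, predicted in LEARNING_PREDICTIONS.items() if predicted(sig)]


def rejected(sig):
    """Modus tollens: a general hypothesis is rejected when its prediction fails."""
    kept = surviving(sig)
    return [gh for gh in LEARNING_PREDICTIONS if gh not in kept]


print("Rejected:", rejected(SIGNIFICANT))    # Rejected: ['GH1', 'GH3']
print("Surviving:", surviving(SIGNIFICANT))  # Surviving: ['GH2']
```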

Once we have removed the branches of the tree that we have rejected, we can continue with a new set of questions based on GH2:

We can then repeat the procedure (starting from step 1). Each question (branch) of the tree can give rise to several alternative general hypotheses, measurable hypotheses, and experiments. With every experiment, we use deductive reasoning to reject as many hypotheses as possible. Hypotheses that have not been rejected after many experiments have tried (and failed) to reject them can be considered "supported." However, even hypotheses that have survived many tests are still hypotheses. There is no point at which we can stop and declare that a hypothesis has been "proven" true, because it is always possible that untested alternative hypotheses exist.
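The repeated cycle can be sketched as a loop that trims rejected hypotheses and counts how many experiments each remaining hypothesis has survived. In this Python sketch, each experiment is reduced to the hypotheses it was designed to test and the ones its outcome rejected; the labels and rounds are hypothetical placeholders, and the function name is illustrative.

```python
def run_strong_inference(rounds):
    """Tally repeated cycles of Strong Inference.

    rounds: list of (tested, rejected) pairs, one per experiment. `tested`
        lists the alternative hypotheses the experiment was designed to
        discriminate among (steps 1 and 2); `rejected` holds those its
        outcome rejected (step 3).
    Returns {hypothesis: experiments survived} for every hypothesis that has
    been tested and never rejected. These are "supported," not "proven."
    """
    survived = {}
    eliminated = set()
    for tested, rejected in rounds:
        for h in tested:
            if h in eliminated:
                continue  # this branch was already trimmed
            if h in rejected:
                eliminated.add(h)
                survived.pop(h, None)
            else:
                survived[h] = survived.get(h, 0) + 1
    return survived


# Hypothetical labels: a first round among H1-H3, then new sub-hypotheses
# (H2a, H2b, H2c) devised under the surviving branch H2.
rounds = [
    (["H1", "H2", "H3"], {"H1", "H3"}),
    (["H2a", "H2b"], {"H2b"}),
    (["H2a", "H2c"], {"H2c"}),
]
for h, count in run_strong_inference(rounds).items():
    print(f"{h}: survived {count} experiment(s); supported, not proven")
```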

APPLICATION: Use experimental evidence to reject as many hypotheses as possible. Hypotheses that survive an experiment without being rejected can be thought of as "supported."