Remember that different parts of the CPU are designed to handle different stages of execution. Early processors executed instructions strictly one after another: an instruction could not be processed until the previous one was complete. This was inefficient because not all parts of the CPU were in use at all times.
To solve this, the CPU fetches the first instruction and, once it has been fetched, begins decoding it. At the same time, the CPU starts fetching the second instruction. This is possible because separate parts of the CPU handle fetching and decoding. This technique is called pipelining.
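To make the overlap concrete, here is a minimal sketch in Python. It assumes a simple three-stage pipeline (fetch, decode, execute) in which every stage takes exactly one clock cycle; the stage list and the function names are illustrative assumptions, not a description of any particular processor.

```python
# Assumed three-stage pipeline; each stage is taken to last one clock cycle.
STAGES = ["fetch", "decode", "execute"]


def sequential_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when each instruction must finish before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes
    # one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


def pipeline_schedule(n_instructions):
    """For each clock cycle, map each busy stage to the instruction number in it."""
    schedule = []
    for cycle in range(pipelined_cycles(n_instructions)):
        busy = {}
        for offset, stage in enumerate(STAGES):
            instr = cycle - offset  # instruction i reaches stage s at cycle i + s
            if 0 <= instr < n_instructions:
                busy[stage] = instr + 1  # report 1-based instruction numbers
        schedule.append(busy)
    return schedule


if __name__ == "__main__":
    print(sequential_cycles(3), pipelined_cycles(3))
    for cycle, busy in enumerate(pipeline_schedule(3), start=1):
        print(cycle, busy)
```

Under these assumptions, the second cycle of the schedule shows instruction 2 being fetched while instruction 1 is being decoded, which is the overlap described above.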
There are, however, a few problems with pipelining:
Consider the following instructions:
Input A
B = A + 2
Input C
Since instruction 2 depends on the result of instruction 1, instruction 2 cannot begin until instruction 1 is complete.
To keep the CPU busy in this case, processors may begin processing instruction 3 before instruction 2, since instruction 3 does not depend on either of the earlier instructions. This is known as out-of-order execution.
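The reordering of the three example instructions can be sketched with a toy scheduler in Python. This is a deliberately simplified model: it only checks whether an instruction reads a value written by an earlier one, and it assumes a result becomes available one step after its instruction is issued. The names Instr, depends_on and issue_order are hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    text: str
    writes: FrozenSet[str]  # variables this instruction produces
    reads: FrozenSet[str]   # variables this instruction needs


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads a value that `earlier` writes."""
    return bool(later.reads & earlier.writes)


def issue_order(program: List[Instr]) -> List[int]:
    """Return 1-based instruction numbers in the order this toy model issues them."""
    pending = list(range(len(program)))
    order: List[int] = []
    while pending:
        last = order[-1] if order else None
        choice = pending[0]  # fallback: wait, then issue the oldest instruction
        for idx in pending:
            waits_on_pending = any(
                depends_on(program[idx], program[p]) for p in pending if p < idx
            )
            waits_on_last = last is not None and depends_on(
                program[idx], program[last]
            )
            if not waits_on_pending and not waits_on_last:
                choice = idx  # oldest instruction whose inputs are ready
                break
        pending.remove(choice)
        order.append(choice)
    return [i + 1 for i in order]


PROGRAM = [
    Instr("Input A", frozenset({"A"}), frozenset()),
    Instr("B = A + 2", frozenset({"B"}), frozenset({"A"})),
    Instr("Input C", frozenset({"C"}), frozenset()),
]

if __name__ == "__main__":
    print(issue_order(PROGRAM))  # [1, 3, 2]
```

Real processors track more kinds of dependency than this and do the work in hardware, so the sketch is only meant to show why instruction 3 can safely move ahead of instruction 2.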
If the processor has to process an IF instruction, it cannot continue with the following instructions until the IF is complete, since which instruction comes next depends on the outcome of the IF.
Modern CPUs guess the outcome of the IF and begin processing the likely next instruction. This is known as branch prediction. If the guess turns out to be wrong, the work done on the wrong instructions is discarded and the correct instruction is fetched instead.
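One simple way to make such a guess is to remember how the same IF turned out recently. The Python sketch below uses a two-bit saturating counter, a common textbook scheme; it is an assumption chosen for illustration, and the notes above do not say which method any particular CPU uses. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Counter values 0-1 predict 'not taken'; values 2-3 predict 'taken'."""

    def __init__(self):
        self.counter = 2  # assumed starting state: weakly 'taken'

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes) -> int:
    """Run the predictor over a sequence of real outcomes and count wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # the CPU would have to discard the speculative work
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # An IF guarding a loop: true nine times, then false once on exit.
    loop_branch = [True] * 9 + [False]
    print(count_mispredictions(loop_branch))  # 1
```

With this scheme the only wrong guess in the loop example is the final one, when the loop exits; an IF whose outcome alternates every time would be guessed wrong far more often.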