In summary, Chain-of-Thought prompting means asking the model to “show its work.” This often improves accuracy on complex tasks and yields interpretable solutions.
Why and When Does CoT Prompting Work?
From a research perspective, CoT prompting works because it leverages the way large language models have learned from text. LLMs are typically trained on vast amounts of internet data, books, and other sources, which likely include examples of people reasoning through problems (think of forums where math problems are solved stepwise, or Q&A sites with explanations). By prompting the model to produce a reasoning chain, we are activating those learned patterns of stepwise explanation. Essentially, we nudge the model to use its latent knowledge in a more structured way, which can reduce errors from jumping straight to a conclusion.
Another way to understand CoT’s effectiveness is to consider the cognitive load of a question. A complex question might require combining several facts, performing a calculation, or considering multiple aspects. If we force the model to answer in one step, it has to handle all of these sub-tasks implicitly in a single forward pass of text generation. With CoT prompting, the model’s generation is broken into parts, and it can allocate more computation (more internal “thought,” or, in practical terms, more reasoning tokens) to each part of the problem. In essence, CoT acts like dynamic time allocation: more complex problems get more steps.
CoT prompting is most useful in scenarios where reasoning or multi-step analysis is needed. According to the original CoT research, it shines on tasks like multi-step math problems, logical inference, and commonsense reasoning. In the context of finance and accounting, many tasks fit this description: analyzing a financial report involves reasoning over multiple sections of text, determining the implications of a policy change requires a chain of logical deductions, and diagnosing why a certain metric changed involves piecing together several data points.
However, CoT is not a silver bullet for all tasks. If a question is purely factual recall (e.g., “What is the capital of Japan?”), a chain of thought might be unnecessary. The model either knows the fact or not. In some straightforward classification tasks, CoT might even introduce confusion if the reasoning is trivial. CoT can also be counterproductive if the model is not capable enough to stick to logical steps (a small model might produce incoherent “reasoning”). Thus, CoT is particularly valuable when the task is complex, ambiguous, or requires combining multiple pieces of information. In finance/accounting research, such tasks abound.
To ground this, consider a practical example: financial ratio analysis. If you ask an LLM directly, “The company’s revenue grew 5% but its net income fell 10%. What might explain this discrepancy?”, the model might give a superficial answer or make something up. Prompted with a chain of thought, by contrast, it could reason: “Revenue up 5% could be offset by higher costs. Perhaps expenses or one-time charges grew significantly. Let’s consider: if costs grew more than revenue, net income could drop. A 10% profit drop alongside a 5% revenue rise suggests margin contraction…” and then conclude with a plausible explanation. The chain of thought ensures the model works through the components of the problem (revenue versus expense changes) and thus provides a more grounded answer.
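For readers who prefer to see this in code, the following minimal sketch sends the same question with and without a chain-of-thought instruction using the OpenAI Python SDK. The model name and prompt wording here are illustrative assumptions, not recommendations.

```python
# Minimal sketch: the same question asked directly vs. with a CoT instruction.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY in
# the environment; the model name and prompt wording are illustrative choices.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # substitute whichever model you have access to

question = (
    "The company's revenue grew 5% but its net income fell 10%. "
    "What might explain this discrepancy?"
)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return resp.choices[0].message.content

# Direct prompt: the model jumps straight to a conclusion.
direct_answer = ask(question + " Answer in one sentence.")

# CoT prompt: the model is asked to show its work before concluding.
cot_answer = ask(
    question
    + " Think step by step: consider how costs, margins, and one-time items"
    " could move revenue and net income in different directions, then state"
    " the most plausible explanation."
)

print("Direct:\n", direct_answer)
print("\nChain-of-thought:\n", cot_answer)
```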
In short, CoT prompting works because it aligns the model’s output format with human-like analytical reasoning, and it is most useful for tasks where such analysis is needed, which includes many scenarios in accounting and finance research.
Model Uncertainty, Calibration, and Limitations
Although CoT prompting enhances the reasoning capabilities of LLMs, it does not make the models infallible. It is important for researchers to understand the limitations and potential pitfalls:
Overconfidence and calibration: LLMs, by default, often sound very confident even when they are incorrect. This is a well-documented issue: the probability or “confidence” a model assigns to an answer does not always correlate well with actual correctness (poor calibration). CoT prompting alone doesn’t solve this, as a model can produce a very convincing chain-of-thought that leads to a wrong conclusion. In fact, a detailed but flawed explanation can be more misleading than a terse “I think the answer is X.” Researchers should remain critical of model outputs.
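One rough, practical way to approximate confidence is self-consistency-style sampling: draw several independent chains of thought and treat agreement among their final answers as a crude confidence signal. The sketch below is illustrative only; it assumes the prompt asks the model to end with a line beginning “Final answer:”, and the answer parsing is deliberately naive.

```python
# Rough confidence proxy via self-consistency: sample several chains of
# thought at a higher temperature and measure how often their final answers
# agree. Assumes the OpenAI Python SDK and a prompt that instructs the model
# to end with a line "Final answer: <label>"; the parsing is deliberately naive.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def sampled_answers(prompt: str, n: int = 10, model: str = "gpt-4o") -> list[str]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # higher temperature so the sampled chains differ
        n=n,              # n independent completions in one call
    )
    answers = []
    marker = "Final answer:"
    for choice in resp.choices:
        text = (choice.message.content or "").strip()
        ans = text.rsplit(marker, 1)[-1] if marker in text else text
        answers.append(ans.strip().lower().rstrip("."))
    return answers

prompt = (
    "Did this firm's operating margin improve or deteriorate? Reason step by "
    "step, then end with the line 'Final answer: improved' or "
    "'Final answer: deteriorated'.\n\n<paste the relevant excerpt here>"
)
answers = sampled_answers(prompt)
top_answer, votes = Counter(answers).most_common(1)[0]
print(f"Majority answer: {top_answer} (agreement {votes}/{len(answers)})")
# Low agreement flags a case for manual review; high agreement is not proof
# of correctness.
```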
When CoT may not help: If a task is primarily about factual recall or straightforward language understanding, CoT might be unnecessary. For example, asking “What year was the Sarbanes-Oxley Act passed?” doesn’t benefit from a chain-of-thought, as the model either knows it (2002) or not. CoT could even introduce errors if the model tries to “derive” a fact from flawed memory. Similarly, if the question is extremely simple (“Calculate 2+2”), CoT is overkill. In some cases, CoT can degrade performance on trivial tasks by introducing verbosity or chances for the model to go off-track. There’s also evidence that for models below a certain size, forcing CoT yields gibberish reasoning (they mimic the format without actual understanding).
Hallucinations and logical errors: CoT can mitigate some hallucinations (especially factual ones, when combined with ReAct-style tool use or with self-consistency checks in which the model effectively double-checks itself), but it can also produce lengthy hallucinated justifications. A model might invent an entire sequence of financial analysis that sounds plausible but is entirely fictional with respect to the input data. Always ensure the chain of thought stays grounded in verifiable information. One best practice is to restrict CoT to the provided context rather than the model’s open-ended knowledge, if possible, for example by prefixing the prompt with: “Base your reasoning only on the report above.”
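A minimal sketch of such a “grounded” CoT prompt, assuming the OpenAI Python SDK and one possible wording of the instructions, might look like this:

```python
# Sketch of a "grounded" CoT prompt: reasoning is restricted to a provided
# excerpt rather than the model's open-ended knowledge. Assumes the OpenAI
# Python SDK; the instruction wording is one possible formulation.
from openai import OpenAI

client = OpenAI()

report_excerpt = """<paste the MD&A or footnote text here>"""

prompt = f"""Report excerpt:
\"\"\"{report_excerpt}\"\"\"

Question: Why did gross margin decline this year?

Instructions:
- Base your reasoning only on the report excerpt above.
- Reason step by step, quoting the sentence you rely on at each step.
- If the excerpt does not contain enough information, say so instead of guessing.
"""

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(resp.choices[0].message.content)
```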
Bias in reasoning: The chain-of-thought can reveal biases in the model’s thinking. This is double-edged: on one hand, it is useful that such biases become visible (transparency); on the other, the model might articulate problematic reasoning. For instance, a model might (incorrectly) reason that a CEO is “greedy” because of certain language, reflecting a stereotype rather than fact. CoT makes such bias easier to spot, but users must be vigilant. In sensitive applications (like deciding whether a statement is fraudulent or whether an executive is behaving unethically), the model’s reasoning may include unsound jumps, and intervention might be needed to correct or guide it.
Scaling and cost: CoT answers are longer. If you are using an API like OpenAI’s, that means more tokens and higher cost, and it also means slower responses. In a research pipeline where hundreds of thousands of documents are analyzed, the token overhead can be significant, so the improved accuracy has to be weighed against the cost. Sometimes a hybrid approach works well: use a quick non-CoT classification to narrow candidates, then apply CoT to the borderline or most complex cases, as sketched below.
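To make the hybrid idea concrete, here is a sketch in which a cheap, terse first pass labels the easy cases and a CoT pass is reserved for those flagged as unclear. The model names and the label set are illustrative assumptions.

```python
# Sketch of a cost-aware hybrid pipeline: a short, cheap prompt handles easy
# cases and a longer CoT prompt is reserved for cases the first pass flags as
# unclear. Model names and the label set are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
CHEAP_MODEL = "gpt-4o-mini"  # fast, inexpensive first pass
STRONG_MODEL = "gpt-4o"      # CoT pass on the harder cases

def classify_cheap(text: str) -> str:
    resp = client.chat.completions.create(
        model=CHEAP_MODEL,
        messages=[{"role": "user", "content":
            "Label the tone of this disclosure with exactly one word: "
            "positive, negative, or unclear.\n\n" + text}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().rstrip(".")

def classify_cot(text: str) -> str:
    resp = client.chat.completions.create(
        model=STRONG_MODEL,
        messages=[{"role": "user", "content":
            "Reason step by step about the tone of this disclosure, weighing "
            "hedging language, quantitative statements, and forward-looking "
            "remarks, then end with 'Label: positive' or 'Label: negative'.\n\n"
            + text}],
        temperature=0,
    )
    return resp.choices[0].message.content

def label_disclosure(text: str) -> str:
    first_pass = classify_cheap(text)
    if first_pass in {"positive", "negative"}:
        return first_pass        # cheap answer accepted as-is
    return classify_cot(text)    # spend CoT tokens only where needed
```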
Model dependency: The effectiveness of CoT prompting is model-dependent. GPT-4, for example, is generally better at CoT reasoning (and more likely to follow the steps through correctly) than GPT-3.5. If you are using open-source models, the differences can be stark. Some newer models (such as Anthropic’s Claude) are trained with greater emphasis on reasoning and respond well to CoT prompts, whereas older, GPT-2-level models will not. Always test on a small scale to confirm that the model you use actually benefits from CoT. Wei et al. (2022) found that at smaller model scales, CoT did not help and sometimes hurt; the benefit only emerged at larger scales. In 2025, most cutting-edge models are large, but if you use a smaller model for privacy or offline reasons, be aware of this.
Human in the loop: Especially in finance and accounting, expert oversight is needed. The outputs of a CoT-empowered model can be very convincing. It’s easy to get seduced by the logical flow and assume it must be correct. But as any teacher knows, a student can have a very logical-looking solution that arrives at the wrong answer due to one assumption being off. The same is true for LLMs. Treat the chain-of-thought as you would a student’s explanation: check the premises, check the math, verify the factual claims. A positive development is that some research shows models can identify their own mistakes if prompted to reflect (e.g., a technique called “reflective prompting” or using the model to critique its earlier answer). But this is not foolproof and can double the work (the model might need to be run again to check itself).
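One way to operationalize such a self-check is a two-pass prompt: the model first answers with a chain of thought, and a second call asks it to audit that chain. The sketch below uses one possible wording and, as noted above, roughly doubles the cost.

```python
# Sketch of a two-pass "answer, then critique" check: the model first answers
# with a chain of thought, and a second call asks it to audit that chain.
# This is not foolproof (the critic can miss its own mistakes) and roughly
# doubles the token cost. Wording and model name are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

question = ("Given the excerpt below, did the firm's leverage increase year "
            "over year?\n\n<paste the excerpt here>")

draft = ask(question + "\n\nReason step by step before giving your answer.")

critique = ask(
    "Here is a question and a step-by-step answer.\n\n"
    f"Question:\n{question}\n\nAnswer:\n{draft}\n\n"
    "Check each step: are the premises supported by the excerpt, is the "
    "arithmetic correct, and does the conclusion follow? List any errors, "
    "then state whether the original answer should be revised."
)

print(critique)
```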
Uncertainty estimation: As touched on above, there are prompting strategies that encourage models to express uncertainty (“I’m not entirely sure, but I think…”). Interestingly, Zhou et al. (2023) found that injecting phrases of uncertainty led to increased accuracy, possibly because it allows the model to consider alternatives rather than forcing a single answer. However, be careful: just because the model says “I’m not sure” does not guarantee it actually knows when it is wrong; models sometimes claim high confidence incorrectly or express undue uncertainty. That said, combining uncertainty prompting with CoT can produce more calibrated responses. For example, if you want an assessment of risk from a text, you might prompt: “Provide a step-by-step rationale and, if confidence is low, say you are unsure.” The result can be a nuanced answer with an explicit note of uncertainty where appropriate.
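A sketch of such an uncertainty-aware CoT prompt is shown below; the wording is one possible formulation, not a standard, and the model’s self-reported confidence should be read as a heuristic rather than a calibrated probability.

```python
# Sketch of combining CoT with an explicit uncertainty instruction.
# The prompt wording is one possible formulation, not a standard; treat the
# self-reported confidence as a heuristic, not a calibrated probability.
from openai import OpenAI

client = OpenAI()

filing_text = """<paste the risk-factor or MD&A excerpt here>"""

prompt = f"""Assess the liquidity risk described in the text below.

Text:
\"\"\"{filing_text}\"\"\"

Provide a step-by-step rationale. If your confidence in any step is low, say
explicitly that you are unsure and note what additional information would
resolve the uncertainty. End with:
"Overall assessment: <low/moderate/high risk>" and "Confidence: <low/medium/high>".
"""

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(resp.choices[0].message.content)
```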
When CoT fails or is overkill: CoT may not help if the primary difficulty is not reasoning but knowledge. For instance, “What is the current GDP of Panama?” The model either knows from training data or not. Reasoning won’t invent the correct number (hopefully). In fact, CoT might lead it to guess a number via some flawed reasoning. Similarly, if the task is to parse a straightforward structured document (e.g., extract a value from a specific field), a simple regex-like approach might outperform an LLM with CoT, which could hallucinate or misinterpret the task. Use CoT where reasoning is the bottleneck, not where precision or retrieval is the main issue.
Finally, despite these limitations, it is worth highlighting that CoT often helps more than one might expect. Wei et al. (2022) reported that simply prompting the model to explain or reason can uncover correct answers to problems it initially gets wrong. CoT also makes the model’s errors more detectable. So it is a trade-off: you get more insight into the model’s thinking, which helps you spot errors, but you also get a lot more text to sift through.
The safest approach is to treat LLM outputs as a draft analysis: helpful, time-saving, but needing verification. CoT makes that draft more useful for verification. In critical research, you might use the model to get 90% of the way (with CoT), then have a research assistant or co-author verify key points, akin to how you would verify a colleague’s work.
Conclusion
Chain-of-Thought prompting has opened a new frontier in how we interact with AI models, moving from terse question-answering to a more dialogue-like, explanation-rich process. For accounting and finance researchers and educators, this is a promising development. We have seen what CoT prompting is and why it works: it leverages the latent reasoning abilities of large language models by simply asking them to articulate intermediate steps.
As of 2025, tools like GPT-4 or even GPT-5 have made CoT prompting accessible without needing to fine-tune models or write custom code. You simply ask the model to “walk through the reasoning.” Looking ahead, I expect CoT prompting and its offshoots to become standard in analytical AI applications. Models might become better calibrated, or have built-in mechanisms to check their own work (we see early signs of this in research on self-reflection and verification steps). For the research community, an exciting possibility is combining human and machine reasoning, e.g., a researcher and an AI both provide chains-of-thought on a problem and then reconcile differences. Such “hybrid reasoning” could lead to more robust conclusions.