Chain of thought, now including "reasoning", is basically a workaround for the simplistic nature of the Transformer neural network architecture that all LLMs are based on.
The two main limitations of the Transformer that it helps with are:
1) A Transformer is just a fixed-size stack of layers, with a one-way flow of data through the layers from input to output. That fixed number of layers caps how many "thought" steps the LLM can put into generating each word of output, but good responses to harder questions may require many more steps and iterative thinking.
The idea of "think step by step", aka chain of thought, is to have the model break its response down into a sequence of steps, each building on what came before, so that the scope of each step stays within what the Transformer's fixed number of layers can handle (see the first sketch after this list).
2) A Transformer has extremely limited internal memory from one generated word to the next, so telling the model to go one step at a time, with its own output fed back in as input, in effect turns the model's output into a kind of working memory that makes up for this (see the second sketch below).
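To make the first point concrete, here is a minimal sketch in Python of a forward pass. The names and sizes (`transformer_layer`, `forward`, `N_LAYERS`, `D_MODEL`) are toy placeholders, not any real library's API; the point is only that the loop over layers runs a fixed number of times per generated word, regardless of how hard the question is.

```python
import numpy as np

N_LAYERS = 12  # fixed when the model is built; the same for every token
D_MODEL = 64   # toy hidden size

rng = np.random.default_rng(0)
layer_weights = [rng.normal(scale=0.1, size=(D_MODEL, D_MODEL))
                 for _ in range(N_LAYERS)]

def transformer_layer(hidden, weights):
    """Stand-in for one attention + feed-forward block."""
    return np.tanh(hidden @ weights)

def forward(token_embedding):
    """One forward pass: data flows one way through exactly N_LAYERS blocks,
    so each generated word gets a fixed budget of 'thought' steps."""
    hidden = token_embedding
    for weights in layer_weights:  # fixed-size stack, input -> output only
        hidden = transformer_layer(hidden, weights)
    return hidden  # in a real model: projected into next-word probabilities
```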
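And a toy decoding loop illustrating the second point. `model_forward` here is a hypothetical stand-in for a full forward pass that returns the next token; notice that the only state carried from one step to the next is the growing `context` list, i.e. the model's own prior output.

```python
def generate(model_forward, prompt_tokens, n_new_tokens):
    """Toy decoding loop. `model_forward` is a hypothetical stand-in for a
    full forward pass that returns the next token given the context."""
    context = list(prompt_tokens)
    for _ in range(n_new_tokens):
        next_token = model_forward(context)  # sees the prompt plus everything said so far
        context.append(next_token)           # the output is written back in as input
    return context[len(prompt_tokens):]      # just the newly generated tokens
```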
So, chain of thought prompting ultimately gives the model more thinking steps (more words generated), together with a memory of what it has thought so far, letting it generate a better response.
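As a concrete (entirely made-up) example of what this looks like in practice, compare a direct prompt with a chain of thought prompt:

```python
# Two ways to ask a model the same question; both prompts are illustrative.
direct_prompt = "What is 17 * 24? Answer with just the number."

cot_prompt = (
    "What is 17 * 24? Think step by step, writing out each intermediate "
    "result before giving the final answer."
)

# With the second prompt, the model might emit something like:
#   17 * 24 = 17 * 20 + 17 * 4
#   17 * 20 = 340
#   17 * 4  = 68
#   340 + 68 = 408
# Each line is generated with the earlier lines already in its context, so
# every individual step only needs to fit within one fixed-depth forward pass.
```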