I don’t actually think an LLM is a good way to sum numbers, but it’s a pretty good example for explaining the phenomenon you’re interested in. When you run an LLM, you essentially take your input and matrix-multiply it with the weights to get an output. This happens regardless of what the input is or what the model “needs” to do. So, to some extent, the same part of the machine is used every time, even though the results look very different.
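To make that concrete, here’s a toy sketch (nothing like a real LLM, just a single made-up 2×2 weight matrix) showing that the operation is identical no matter what the input encodes; only the data flowing through it changes:

```python
# Toy sketch: the *same* fixed "weights" are applied to every input;
# only the data changes, so the outputs differ.

W = [[0.5, -1.0],
     [2.0,  0.25]]  # fixed weight matrix, shared across all inputs

def forward(x):
    """One 'layer': multiply the input vector by the fixed weights W."""
    return [sum(xi * wij for xi, wij in zip(x, col))
            for col in zip(*W)]

x_a = [1.0, 2.0]  # pretend this encodes one prompt
x_b = [3.0, 0.0]  # pretend this encodes a very different prompt

print(forward(x_a))  # same operation...        -> [4.5, -0.5]
print(forward(x_b))  # ...different data, different result -> [1.5, -3.0]
```

The point isn’t the numbers; it’s that `forward` doesn’t branch on what the input “means”; the machinery is the same either way.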

(Of course, the reality is much more complicated than this; you can trace things like power draw and, in principle, track individual activations (with great difficulty), then do interpretability work to see what the model is doing. But hopefully it illustrates that you can usually take some special-purpose operation and turn it into a process that performs the same operation on different data.)