Same is probably true of humans. In a conversation, we often respond from instinct, then work backwards to a rationalization only when asked. For more considered thoughts, if we’re lucky, we can remember our “reasoning traces” but that’s as deep as our introspection goes. Unless we’re neuroscientists, we don’t even know how many neurons we have, let alone have any understanding of how they generate our thoughts. Motivated reasoning impairs our introspection further, and then dishonesty and communication errors prevent us from relaying the limited remaining information to each other.
Model interpretability work has advanced a lot. Arguably we can already explain AI decision-making better than we can explain decision-making in human brains.
No, it happens in the immediate context, where e.g. we say 'No, I meant Meredith Jones, not Meredith Smith', and the possibility of this elaboration is actually part of ordinary communication. I did mean Meredith Jones, not Meredith Smith, thus the use of the past tense. The LLM will just give the best answer for what one might have meant, completely reopening the calculation.
The point is familiar, but there are good illustrations in the Atlantic article by a book editor. At first it seems like abstract AI hate, but then she gets to the details. AI text cannot be edited. https://www.theatlantic.com/technology/2026/05/how-to-tell-a... or https://archive.ph/YJsGK
Nonsense. Some of my friends are lawyers, and they're able to give you consistent explanations of why they think about a certain aspect of a law a certain way. The whole thing is that they work with this all the time, so they have a really consistent 'head model' of how things work and why, and of how considerations should be weighted/ordered/whatever. LLMs just do not have this; there's no consistent underlying reasoning (the 'reasoning' traces in LLMs are really inconsistent).