What the LLM cannot do is explain why it said what it said, when cross-examined. It simply hallucinates the best account of why someone would have said such a thing as it said, same as it can give a probable account of why someone else said something different. The question 'But why did you say this not that ...?' does not lead it to make explicit its grounds for what it said, but just to make a new more complicated statement.

This is true in the naive case.

There are, however, LLM context-building techniques that anchor completions in data structures that persist the structure of the claims supporting a completion's conclusion. Lots of different patterns exist (organizing logic in language is a rich domain), but the one I’ve liked the most is something called a Claim Dependency Graph, which models the relationships between atomic claims as graph edges.

There’s a whole suite of operations you can perform on these structures, and “reconstruct how you came to this conclusion” is absolutely one of them.
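To make that concrete, here is a minimal sketch of the idea, not a reference implementation. The names `ClaimGraph`, `add_claim`, and `explain` are hypothetical and do not come from any particular library, and it assumes each claim records only the ids of the claims it directly depends on. Reconstructing how a conclusion was reached is then a walk over those edges:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClaimGraph:
    """Hypothetical claim dependency graph: nodes are atomic claims,
    edges point from a claim to the claims it directly depends on."""

    claims: dict[str, str] = field(default_factory=dict)
    depends_on: dict[str, list[str]] = field(default_factory=dict)

    def add_claim(self, claim_id: str, text: str, depends_on=()) -> None:
        self.claims[claim_id] = text
        self.depends_on[claim_id] = list(depends_on)

    def explain(self, conclusion_id: str) -> list[str]:
        """Return the ids of every claim behind a conclusion, premises first."""
        ordered: list[str] = []
        seen: set[str] = set()

        def visit(claim_id: str) -> None:
            if claim_id in seen:  # also guards against cycles
                return
            seen.add(claim_id)
            for dep in self.depends_on.get(claim_id, []):
                visit(dep)
            ordered.append(claim_id)  # emitted only after its premises

        visit(conclusion_id)
        return ordered


g = ClaimGraph()
g.add_claim("c1", "Socrates is a man.")
g.add_claim("c2", "All men are mortal.")
g.add_claim("c3", "Socrates is mortal.", depends_on=["c1", "c2"])
g.add_claim("c4", "Plato wrote dialogues.")  # unrelated to c3

for claim_id in g.explain("c3"):
    print(claim_id, g.claims[claim_id])
```

The point of the design is that the explanation is read back from stored edges rather than generated fresh, so asking twice gives the same answer, and unrelated claims such as `c4` never show up in it.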

I’d love to read more about these types of patterns. Do you have any recommendations?

[deleted]

A human has a motive that exists that frames the thought being expressed. An LLM is going to be creating a “de novo” thought in response to a line of questioning.

Psychology has shown that a lot of those motives are just post hoc narratives, similar to what an LLM produces.

Or, as the extreme claim (and the one that I believe), all of them are: https://en.wikipedia.org/wiki/Epiphenomenalism

Same is probably true of humans. In a conversation, we often respond from instinct, then work backwards to a rationalization only when asked. For more considered thoughts, if we’re lucky, we can remember our “reasoning traces” but that’s as deep as our introspection goes. Unless we’re neuroscientists, we don’t even know how many neurons we have, let alone have any understanding of how they generate our thoughts. Motivated reasoning impairs our introspection further, and then dishonesty and communication errors prevent us from relaying the limited remaining information to each other.

Model interpretability work has advanced a lot. Arguably we already can explain AI decision-making better than human brains.

No, it happens in the immediate context, where e.g. we say 'No, I meant Meredith Jones, not Meredith Smith', and the possibility of this elaboration is actually part of ordinary communication. I did mean Meredith Jones, not Meredith Smith, thus the use of the past tense. The LLM will just give the best answer for what one might have meant, completely reopening the calculation.

The point is familiar but there are good illustrations in the Atlantic article by a book editor. At first it seems abstract AI hate, but then she gets to the details. AI text cannot be edited. https://www.theatlantic.com/technology/2026/05/how-to-tell-a... or https://archive.ph/YJsGK

Nonsense. Some of my friends are lawyers, and they're able to give you consistent accounts of why they think about a certain aspect of a law a certain way. The whole thing is that they work with this all the time, so they have a really consistent 'head model' of how things work and why, and of how considerations should be weighted/ordered/whatever. LLMs just do not have this; there's no consistent underlying reasoning (the 'reasoning' traces in LLMs are really inconsistent).

LLMs hallucinate, because humans hallucinate.

If you ask the LLM in a way that makes it annotate its sources, that can greatly improve the pattern matching, so that it closely simulates logic, just like in humans.

I understand the question of 'why did you say this, not that', but I have seen other ways of asking it that do not seem to trigger the LLM's over-response in the other direction.

Humans hallucinate because they take shrooms or have schizophrenia.

No, the hallucination of its reasons follows immediately from the technique of probabilistic inference. You can see this in real time: just ask 'why did you use this word, not that word?' It is in the position of a desperate liar. All its responses are essentially 'rationalizations'.