Yes, you're right that there's no decision module separate from the output. But that framing overcommits in the other direction.
The post-hoc reasoning the model produces when you ask "why did you do that" is also just text, and yet that text often matches independent third-party analysis of the same behavior at a rate well above chance. If it really were uncorrelated text-completion, the post-hoc explanation shouldn't align with the actual causes any better than random. It does, frequently enough that I've stopped treating the question as evidence the asker is naive.
"just outputs text" is doing more work than we acknowledge. The person asking the agent "why did you do that" might be an idiot for expecting anything more than a post-hoc rationalization, but that's exactly what you'd expect from a human too.