I thought the consensus was that models couldn’t actually introspect like this. So there’s no reason to think any of those reasons are actually why the model did what it did, right? Has this changed?

This argument has become a moot discussion. Humans are also not able to introspect their own neural wiring to the point where they could describe the "actual" physical reason for their decisions. Just like LLMs, the best we can do is verbalize it (which will naturally contain post-act rationalization), which in turn might offer additional insight that will steer future decisions. But unlike LLMs, we have long term persistent memory that encodes these human-understandable thoughts into opaque new connections inside our neural network. At this point the human moat (if you can call it that) is dynamic long term memory, not intelligence.

I think many humans engage in metacognitive reasoning, and that this might not be strongly represented in training data so it probably isn't common to LLMs yet. They can still do it when prompted though.

LLMs have zero metacognition. Don't be fooled - their output is stochastic inference and they have no self-awareness. The best you'll see is an improvised post-hoc rationalization story.

You can turn all these argents around and prove the same is true for humans. Don't be fooled by dogmatic people who spread the idea that the human mind is the pinnacle of cognition in the universe. Best to leave that to religion.

> The best you'll see is an improvised post-hoc rationalization story.

Funny, because "post-hoc rationalization" is how many neuroscientists think humans operate.

That LLMs are stochastic inference engines is obvious by construction, but you skipped the step where you proved that human thoughts, self-awareness and metacognition are not reducible to stochastic inference.