>I dislike the term “stochastic parrot”, because there’s plenty of evidence that LLMs do have an understanding of at least some things that they are saying.

It's bold to use the term "understanding" in this context. You ask it something about a topic and it answers like someone who understands the topic. Then you change the prompt slightly, in a way that a human who understood the topic would still handle trivially, and the LLM produces an answer that is both wrong/irrelevant and wrong in an unpredictable, non-human way: no human who showed understanding in the first answer could plausibly answer the second question in such a bizarre manner.

The fact that the LLM can be shown to have some sort of internal representation does not necessarily mean that we should call this "understanding" in any practical sense when discussing these matters. I think it's counterproductive in getting to the heart of the matter.

> Then you change the prompt slightly, in a way that a human who understood the topic would still handle trivially, and the LLM produces an answer that is both wrong/irrelevant and wrong in an unpredictable, non-human way: no human who showed understanding in the first answer could plausibly answer the second question in such a bizarre manner.

I think this should make you question whether the prompt change was really as trivial as you imply. A concrete example would help clarify what you mean.

Here's an entire paper [0] showing the impact of extremely minor structural changes on the quality of a model's results. Something as simple as omitting a colon in the prompt can lead to notably degraded (or improved) performance.
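For anyone who wants to see this effect firsthand, here is a minimal sketch of the kind of comparison the paper runs: the same questions scored under two templates that differ only in a trailing colon. (`query_model` is a placeholder for whatever API or local model you use, and the templates and dataset format are illustrative, not taken from the paper.)

    # Sketch: measure how a trivial formatting change (colon vs. no colon)
    # shifts accuracy on the same set of questions.

    def query_model(prompt: str) -> str:
        # Placeholder: swap in your actual LLM API call or local model here.
        raise NotImplementedError("plug in your model call")

    TEMPLATES = {
        "with_colon":    "Question: {q}\nChoices: {choices}\nAnswer:",
        "without_colon": "Question: {q}\nChoices: {choices}\nAnswer",
    }

    def accuracy(template: str, dataset: list[dict]) -> float:
        correct = 0
        for item in dataset:
            prompt = template.format(q=item["question"],
                                     choices=", ".join(item["choices"]))
            reply = query_model(prompt).strip()
            correct += reply.startswith(item["answer"])
        return correct / len(dataset)

    # dataset = [{"question": ..., "choices": [...], "answer": ...}, ...]
    # for name, template in TEMPLATES.items():
    #     print(name, accuracy(template, dataset))

If the change really were semantically irrelevant to the model in the way it is to a person, the two accuracy numbers would be close; the paper's point is that they often aren't.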

0. https://arxiv.org/pdf/2310.11324