Hacker News

>do manage to effectively convey the external effect

But the problem is that this does not tell you anything about the failure mode. So if I am understanding correctly, you are saying that the behavior of an LLM, when it works, is as if it has internalized the concepts.

But then it does not tell you that it can also say stuff that completely contradicts what it said before, thereby also contradicting the notion of having "internalized" the concept.

So that will turn out to be a lie.

TeMPOraL 16 hours ago

If you look at the failure modes, they very closely resemble the failure modes of humans in equivalent situations. I'd say that, in practice, the anthropomorphic view is actually the most informative one we have about failure modes.

qsera 12 hours ago

>they very closely resemble the failure modes of humans in equivalent situations

I don't think they do if we are talking about an honest human being.

LLMs will happily hallucinate and even provide "sources" for their wrong responses. That single thing should contradict what you are saying.