The thing is, even "bad generalization" in LLMs often looks like "humanlike failures" rather than "utterly inhuman failures". They "generalize" just well enough to fall for tricks like the age-of-the-captain problem - the kind of word problem where the numbers given have nothing to do with the question ("a ship carries 26 sheep and 10 goats; how old is the captain?"), yet schoolchildren dutifully add them up and answer 36.

I don't think "the data does not exist" is a real objection, frankly. "Data existing" is not a binary - it's a sliding scale. The amount of information about "madness" captured by the writings of a madman is not zero; it's more a matter of how much, and how complete.

Text is a projection of the internal state of whoever writes it - but some aspects of that state are extremely salient in the text, presented directly and strongly, while others are attenuated and hard to extract.

People keep finding humanlike concept clusters and even "personality traits" in LLMs, tied together in humanlike ways. That points pretty directly to the conclusion that training on human text converges on humanlike solutions, at least sometimes.