The word "prediction" still holds a lot of weight. LLMs can only predict what has been written. This is a hard limit.
For something like "a hard limit" to hold, LLMs would have to be restricted to reproducing existing text. This is utterly false even for base models - their basin seems to be "permutations loosely inspired by existing text".
And that's before all the post-training comes in.
What's the "limit" there?