My understanding is that they claim that for every unique prompt there is a unique final state of the LLM. Isn't that patently false due to the finite state of the LLM and the ability (in principle, at least) to input an arbitrarily large number of unique prompts?
I think their "almost surely" is doing a lot of work.
A more consequential result would give the probability of LLM state collision as a function of the number of unique prompts.
As is, they are telling me that I "almost surely" will not hit the bullseye of a dart board. While likely true, it's not saying much.
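To put a number on the dartboard intuition, here's a rough birthday-bound sketch in Python (my own toy model, nothing from the paper) that estimates the collision probability as a function of the number of prompts, assuming final states behaved like uniform random draws from a finite state space. The state-space sizes are illustrative guesses.

```python
import math

def collision_probability(n_prompts: int, log2_states: float) -> float:
    """Birthday-bound estimate of P(at least one collision) when n_prompts
    prompts each land on one of 2**log2_states equally likely final states:
    roughly 1 - exp(-n^2 / (2N))."""
    # work with log(n^2 / (2N)) so astronomically large state counts don't overflow
    log_ratio = 2 * math.log(n_prompts) - (log2_states + 1) * math.log(2)
    if log_ratio > 700:            # exp() would overflow; probability is ~1 anyway
        return 1.0
    return 1.0 - math.exp(-math.exp(log_ratio))

# 10 billion prompts against a 64-bit state: collisions are near certain.
print(collision_probability(10**10, 64))         # ~0.93
# Same prompts against a 4096-dim float32 state (~2**131072 values): negligible.
print(collision_probability(10**10, 32 * 4096))  # ~0.0
```

Under that (crude) model the "almost surely" is doing its work via the sheer size of the continuous state space, but the paper doesn't give the curve itself.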
But, maybe I misunderstand their conclusion.
I think their claims are limited to the "theoretical" LLM, not to the way we typically use one.
The LLM itself has a fixed-size input and a fixed-size, deterministic output. The input is the initial value for each neuron in the input layer. The LLM output is the vector of final outputs of each neuron in the output layer. For most normal interactions, these vectors are almost entirely 0s.
Of course, when we say LLM, we typically mean the infrastructure that abstracts these things for us. In particular, we typically use infra that interprets the LLM outputs as probabilities, and thus typically produces different results even for the exact same input - but that's just a choice in how to interpret those values; the values themselves are identical. Similarly on the input side, the maximum input size is typically called the "context window". You can feed more input into the LLM infra than the context window, but that's not actual input to the model itself - the LLM infra will simply pick a part of your input and feed that part into the model weights.
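A minimal sketch of that distinction, using a toy linear layer as a stand-in for a real model (the layer, vocab size, and temperature are just placeholders): the forward pass returns identical values for identical input, and only the decoding step decides whether to read them deterministically or sample.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for an LLM: a single linear "LM head" over a 1000-token vocab.
# The model call itself is deterministic; randomness only enters if the
# surrounding infra chooses to sample from the output distribution.
model = torch.nn.Linear(16, 1000)
x = torch.randn(1, 16)                       # a fixed toy "prompt encoding"

logits = model(x)                            # same x -> identical logits every call
assert torch.equal(logits, model(x))

greedy = logits.argmax(dim=-1)                     # deterministic reading of the output
probs = torch.softmax(logits / 0.8, dim=-1)        # interpret as probabilities (temp 0.8)
sampled = torch.multinomial(probs, num_samples=1)  # stochastic reading of the same values
print(greedy.item(), sampled.item())
```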
Well, that is not how I read it, but: every final state has a unique prompt. You could have several final states that share the same prompt.
> You could have several final states have the same unique prompt.
They explicitly claim that the function is injective, that is, that distinct inputs produce distinct outputs.
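For what it's worth, injectivity is a claim in one direction only: distinct prompts must land on distinct states, and that is exactly what a finite state space cannot support once there are more prompts than states. A toy check (my own illustration, not the paper's actual function):

```python
def is_injective(f, domain):
    """True iff no two distinct inputs in `domain` share an output."""
    outputs = [f(x) for x in domain]
    return len(set(outputs)) == len(outputs)

# A map into a 1000-element codomain is injective on 100 inputs...
print(is_injective(lambda x: x % 1000, range(100)))    # True
# ...but pigeonhole forces a collision once inputs outnumber possible outputs.
print(is_injective(lambda x: x % 1000, range(5000)))   # False
```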