Here's an output text: "Yes." Recover the exact input that led to it. (You can't, because the information in the hidden state is irreversibly collapsed when each token is sampled.)
The paper doesn't claim this is possible either: they prove the reversibility of the mapping between the input and the hidden state, not the output text. Or rather "near-reversibility", i.e. collisions are technically possible, but they would have to be very precisely engineered during model training and don't normally happen.
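To make the asymmetry concrete, a toy numpy sketch (nothing below is from the paper; the three-word vocabulary and the logit values are invented):

```python
# Toy sketch: the hidden-state -> token step is many-to-one. Two very
# different final logit vectors can emit the same output text, so the
# text alone can't identify the input.
import numpy as np

vocab = ["Yes", "No", "Maybe"]

def sample_token(logits, rng):
    # softmax over the toy vocabulary, then one categorical draw
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return vocab[rng.choice(len(vocab), p=p)]

rng = np.random.default_rng(0)
logits_a = np.array([9.0, 0.0, 0.0])   # one "hidden state": strongly prefers "Yes"
logits_b = np.array([5.0, 1.0, -2.0])  # a very different state, same preference
print(sample_token(logits_a, rng), sample_token(logits_b, rng))  # almost surely: Yes Yes
```

The injectivity result runs in the other direction: distinct inputs essentially never collide in hidden-state space, even though many hidden states collapse to the same sampled token.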
If you can sample a lot of output text from the same input, you can approximate the output distribution, and through it the hidden state.
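A rough sketch of that idea, under stated assumptions (toy sizes; W_U is a stand-in for a model's unembedding matrix, not anything from the paper): resampling the next token many times from the same input gives an empirical distribution that converges to softmax(W_U h), which is exactly the information a single sampled token throws away.

```python
# Toy demo: empirical next-token frequencies converge to the true
# output distribution softmax(W_U @ h), with error shrinking ~ 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(1)
d, v = 8, 32                    # hypothetical hidden size / vocab size
W_U = rng.normal(size=(v, d))   # toy unembedding matrix
h = rng.normal(size=d)          # the hidden state we'd like to pin down

logits = W_U @ h
p_true = np.exp(logits - logits.max())
p_true /= p_true.sum()

for n in (100, 10_000, 1_000_000):
    samples = rng.choice(v, size=n, p=p_true)       # n resampled tokens
    p_hat = np.bincount(samples, minlength=v) / n   # empirical frequencies
    print(n, np.abs(p_hat - p_true).max())          # max error shrinks with n
```

Given a good estimate of the full distribution, h is determined up to the additive log-partition constant and whatever lies in W_U's null space: log p = W_U h - log Z is a linear system solvable by least squares. With only one sampled token per input, though, almost all of that information is gone.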