paper looks nice! i think what they found is that the input sequence can be recovered from a hidden state by brute-forcing the vocabulary: at each position, run a forward pass for every candidate token and keep the one whose resulting state matches the observed one. this works because the model effectively encodes the whole prefix in the mid-flight residual, and their result is that this encoding is (near-)unique per prefix. so 'the cat sat on the mat' and 'the dog sat on the mat' are recoverable as distinct states, since each prefix leaves its own signature in the residual (the exact mechanism is unclear, but it would be shocking if the prefix weren't encoded there).
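a toy sketch of that brute-force inversion loop, just to make the shape of the attack concrete. everything here is illustrative: the real attack would compare an actual transformer's hidden states, not this stand-in hash "model", and would need roughly one forward pass per vocab token per position.

```python
# toy sketch of recovering a sequence token by token from per-prefix
# states. hidden_state() is a stand-in for the model's mid-flight
# residual: any function that is (near-)unique per prefix demos the idea.
import hashlib

VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]

def hidden_state(prefix):
    """stand-in for a forward pass: maps a token prefix to a state."""
    return hashlib.sha256(" ".join(prefix).encode()).hexdigest()

def invert(target_states):
    """at each depth, try every vocab token and keep the one whose
    resulting state matches the observed state at that position."""
    recovered = []
    for target in target_states:
        for tok in VOCAB:
            if hidden_state(recovered + [tok]) == target:
                recovered.append(tok)
                break
    return recovered

secret = ["the", "cat", "sat", "on", "the", "mat"]
observed = [hidden_state(secret[:i + 1]) for i in range(len(secret))]
print(invert(observed))  # recovers the original sequence
```

cost is O(n · |V|) forward passes for an n-token sequence, which is exactly why uniqueness of the per-prefix state is the load-bearing claim: if two prefixes collided, the greedy search could commit to the wrong token and never recover.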