I'm actually working on something similar to this where you can encode information into the outputs of LLMs via steganography: https://github.com/sutt/innocuous
Since I'm only sampling from the top ~10 tokens, and I mostly test on CPU-based inference of 8B models, I'm probably not at much risk of hardware differences reordering the top tokens. Still, I plan to look into it eventually and build in guard conditions against any choice that an epsilon of precision loss could flip.
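
To make the guard idea concrete, here's a minimal sketch of the kind of check I mean (the function name, eps value, and skip-on-None convention are all illustrative, not from the repo): only treat a sampling step as encodable if every adjacent pair in the top-k ranking, plus the boundary with rank k+1, is separated by more than epsilon.

    import numpy as np

    def stable_top_k(logits, k=10, eps=1e-4):
        # Rank all tokens best-first by logit.
        order = np.argsort(logits)[::-1]
        top, vals = order[:k], logits[order[:k]]
        # Logit of the runner-up just outside the top-k (or -inf if none),
        # so a borderline token can't sneak in under slightly different math.
        boundary = logits[order[k]] if len(order) > k else -np.inf
        # Gaps between adjacent ranked candidates, best to worst.
        gaps = vals[:-1] - vals[1:]
        # Guard: if any pair is closer than eps, a precision difference
        # between backends could swap the ranking -- treat the step as unusable.
        if (gaps < eps).any() or (vals[-1] - boundary) < eps:
            return None
        return top

At steps where this returns None, the encoder would just emit the model's normal choice and carry no payload bits there, so a borderline ranking can never corrupt the decode.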