This is very similar to (and maybe even the same thing as) some recent work, published earlier this year, by the people at Ritual AI on attacking attempts to obfuscate LLM inference. That attack informed the design of their defense, which breaks up the prompt's token sequence and hands the pieces to multiple machines, so that no individual machine has access to enough consecutive hidden-layer states.
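
To make the idea concrete, here is a minimal sketch of that kind of sharding, not Ritual's actual implementation: chunk the token sequence into short contiguous runs and distribute them round-robin across machines, so no single machine ever holds more than `max_run` consecutive positions' worth of states. The function name `shard_tokens` and the parameters `num_machines` and `max_run` are my own illustrative choices.

```python
from typing import Dict, List, Tuple

def shard_tokens(
    tokens: List[int], num_machines: int, max_run: int
) -> Dict[int, List[Tuple[int, List[int]]]]:
    """Assign chunks of at most `max_run` consecutive tokens to machines
    in round-robin order. Returns {machine_id: [(start_index, chunk), ...]}."""
    assignment: Dict[int, List[Tuple[int, List[int]]]] = {
        m: [] for m in range(num_machines)
    }
    for chunk_id, start in enumerate(range(0, len(tokens), max_run)):
        machine = chunk_id % num_machines
        assignment[machine].append((start, tokens[start : start + max_run]))
    return assignment

if __name__ == "__main__":
    prompt_tokens = list(range(20))  # stand-in for real token ids
    shards = shard_tokens(prompt_tokens, num_machines=3, max_run=4)
    for machine, chunks in shards.items():
        # Each machine sees only short, non-adjacent runs of the prompt.
        print(machine, chunks)
```

The point of the round-robin assignment is that adjacent chunks always land on different machines, which is what prevents any one of them from reconstructing a long contiguous stretch of hidden states.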