There will be details like rounding errors that will make certain sequences unreachable in practice, but that shouldn't give you any comfort unless you know your dangerous outputs fall into that unreachable space. And they absolutely don't; the sequences we're interested in - well-structured tool calls that contain dangerous parameters but are otherwise indistinguishable from desirable tool calls - are actually pretty probable.

The probability that an ideal, continuous LLM would output a 0 for a particular token in its distribution is itself 0. The probability that an LLM using real floating-point math does so isn't terrifically higher than 0.
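You can see both halves of that claim with a toy softmax in numpy (a minimal sketch with made-up logits, not anything from a real model):

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Typical case: a modest logit spread keeps every probability nonzero.
logits = np.array([5.0, 2.0, -3.0, -10.0], dtype=np.float32)
print(softmax(logits))   # every entry > 0, smallest around 3e-7

# Edge case: a logit gap beyond ~104 underflows float32's smallest
# subnormal (2**-149), so the tail token rounds to exactly 0.0.
extreme = np.array([0.0, -200.0], dtype=np.float32)
print(softmax(extreme))  # [1., 0.] -- that token is unreachable
```

In exact arithmetic that second tail would be about 1.4e-87, strictly positive; it's the float representation that kills it, and you need a pathological logit gap to get there.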

Source: I write transformers for a living.

There is a piece of knowledge you seem to be missing. Yes, a transformer will output a distribution over all possible tokens at a given step, and indeed none of those probabilities is exactly zero; each is always at least some epsilon.

However, we usually don't sample from that distribution at inference time!

The common approach, called nucleus sampling (also known as top-p sampling), keeps the smallest set of most-likely tokens whose probabilities add up to 95% of the probability mass. It sets all other probabilities to zero, renormalizes, and then samples from the resulting distribution. There is another parameter, `top-k`: if k is 50, you zero out every token that is not among the 50 most likely.
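Here is a minimal sketch of that filtering in numpy (toy logits; real inference stacks such as HF transformers implement this by masking logits to -inf before the softmax, but the effect is the same):

```python
import numpy as np

def filter_probs(logits, top_k=50, top_p=0.95):
    """Toy top-k + nucleus (top-p) filtering of one step's logits."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top-k: keep only the k most likely tokens.
    keep = np.argsort(probs)[::-1][:top_k]

    # top-p: of those, keep the smallest prefix whose cumulative
    # probability mass reaches p (at least one token always survives).
    cutoff = np.searchsorted(np.cumsum(probs[keep]), top_p) + 1
    keep = keep[:cutoff]

    # Everything else gets probability exactly zero; renormalize the rest.
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=50_000)  # stand-in for a ~50k-token vocabulary
logits[:10] += 12.0               # a few dominant tokens, as in trained models
p = filter_probs(logits)
print(int((p > 0).sum()))         # a handful of survivors out of 50,000
```

(Temperature, repetition penalties, etc. are typically applied before this step, but the punchline is unchanged: anything filtered out has probability exactly zero at that position.)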

In effect, at each sampling step there are usually only a handful of candidates that can actually be selected, out of the tens of thousands of tokens in the vocabulary.

So during sampling, most trajectories for the agent are literally impossible.

Thank you for the explanation. But you do understand why none of that matters after the prod DB is gone, right? Yes, there should be backups, but when management fires ops and dumps that work on the devs, backups don't tend to happen.

So I want you to understand this. You are basically selling heroin to junkies and then acting like the consequences aren't in any way your fault. Management will far too often jump at false promises made by your execs. Your technology is inherently non-deterministic. Therefore your promises can't be true. Yet you are going to continue being part of a machine that destroys businesses and lives. Please at least act like you understand this.

I appreciate the information; I am weak on the details of LLM sampling algorithms. But I already conceded that the statement isn't literally true of realized models (it's true of idealized models), and the tokens we're concerned with are likely to be in the renormalized distribution, because the desired and dangerous tokens are virtually the same.