> The difference between “jmp $+15” and “jmp $+16” is inscrutable

Just like the difference between 'him' and 'her' is inscrutable taken out of context, but that's exactly what embeddings are for: LLMs store contextual information in huge vectors, and during input processing each token picks up context from the tokens around it, so that the LLM knows that 'him' refers to 'Peter' and 'her' refers to 'Jane'. Likewise, it should be able to infer that $+15 is the 'success' branch of control flow and $+16 is the 'fail' branch.

The way computer programs and natural language differ is that in language, words with absolute or at least very constrained meanings are common, while code is basically pure symbol manipulation: variable and function names are meaningless helpers, and the actual meaning has to be deduced from the way the symbols are manipulated.

In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation, and with something like 'add rax, rcx' they are far less bothered than humans by the fact that the meaning of 'rax' and 'rcx' is heavily contextual, as they dedicate a lot of effort to building up rich contextual information that might be different in every place these symbols appear.

> Just like the difference between 'him' and 'her' is inscrutable taken out of context,

The context is pretty flexible, like "Do you know Jim? I saw him at the store." Or, "Do you know Jim? Fifteen days ago, I saw him at the store." There’s a relatively small universe of pronouns (him, her, that, who, etc.), and each pronoun refers to a nearby token (in this case, Jim).

With machine code, there’s a massive set of jump offsets, and the referent isn’t a token, but rather a location to start processing.
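
To make that concrete, here is a minimal sketch of what NASM-style `jmp $+15` and `jmp $+16` turn into on x86. It assumes the assembler picks the 2-byte short form (`EB rel8`); the helper names `encode_jmp_short` and `jump_target` are made up for illustration.

```python
import struct

JMP_SHORT = 0xEB  # x86 opcode for JMP rel8 (short jump)


def encode_jmp_short(dollar_offset: int) -> bytes:
    """Encode `jmp $+dollar_offset` as a 2-byte short jump.

    `$` is the address of the jump itself, but the CPU adds rel8 to the
    address of the *next* instruction, so rel8 = dollar_offset - 2.
    """
    rel8 = dollar_offset - 2
    if not -128 <= rel8 <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([JMP_SHORT]) + struct.pack("b", rel8)


def jump_target(address: int, code: bytes) -> int:
    """Return the address a short jump located at `address` transfers to."""
    opcode, rel8 = struct.unpack("Bb", code)
    if opcode != JMP_SHORT:
        raise ValueError("not a short jump")
    return address + len(code) + rel8


a = encode_jmp_short(15)
b = encode_jmp_short(16)
print(a.hex(), b.hex())
print(hex(jump_target(0x1000, a)), hex(jump_target(0x1000, b)))
```

The two encodings differ in a single byte value, and what that byte refers to is an address one byte further along, not any token in the input.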

> In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation,

When you’re manipulating machine code, you’ve stepped away from abstract symbol manipulation and you’re just manipulating byte values now.

I don’t think your argument here is convincing. Maybe you can point to a demo or some architecture where this works. But my sense is this: once you start designing a harness to make LLMs capable of writing machine code, or designing an architecture for LLMs to write machine code, something in your implementation probably looks like an assembler, and something in your internal tokenization of the machine code probably looks like a higher-level language.
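
As a rough illustration of the "looks like an assembler" point, here is a toy two-pass label resolver. Everything in it is hypothetical: the tuple layout, the `resolve_jumps` name, and the instruction sizes, which I picked to match the usual x86-64 encodings. It is a sketch of the idea, not how any real harness works.

```python
def resolve_jumps(program):
    """Turn symbolic jump targets into `$`-relative offsets.

    `program` is a list of (label, text, size_in_bytes, jump_target) tuples,
    where `label` and `jump_target` are strings or None.
    """
    # Pass 1: walk the program once to learn the byte address of each label.
    addresses = {}
    pc = 0
    for label, _text, size, _target in program:
        if label is not None:
            addresses[label] = pc
        pc += size

    # Pass 2: rewrite each jump relative to its own address (`$`).
    resolved = []
    pc = 0
    for _label, text, size, target in program:
        if target is not None:
            text = f"{text} ${addresses[target] - pc:+d}"
        resolved.append(text)
        pc += size
    return resolved


program = [
    (None, "jmp", 2, "done"),
    (None, "add rax, rcx", 3, None),
    ("done", "ret", 1, None),
]
print(resolve_jumps(program))

# Insert a 1-byte nop before the add: the label is untouched,
# but the numeric offset the jump needs has changed.
program.insert(1, (None, "nop", 1, None))
print(resolve_jumps(program))
```

The label `done` stays stable under edits while the raw offset moves from `$+5` to `$+6`. A model emitting bytes directly has to track that bookkeeping itself, and whatever does the tracking is doing an assembler's job.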