> Just like the difference between 'him' and 'her' is inscrutable taken out of context,
The context for a pronoun is pretty flexible: "Do you know Jim? I saw him at the store." Or, "Do you know Jim? Fifteen days ago, I saw him at the store." There’s a relatively small universe of pronouns (him, her, that, who, etc.), and each pronoun refers to a token nearby (in this case, Jim).
With machine code, there’s a massive set of jump offsets, and the referent isn’t a token, but rather a location to start processing.
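To make that concrete, here's a rough sketch using the x86 short jump (`EB` followed by a signed 8-bit offset, measured from the end of the jump instruction). The helper name `encode_short_jmp` and the addresses are made up for illustration. The same destination gets a different byte depending on where the jump sits:

```python
def encode_short_jmp(jmp_addr: int, target_addr: int) -> bytes:
    """Encode an x86 short jump (opcode 0xEB, rel8) located at jmp_addr.

    The offset is relative to the address of the *next* instruction,
    i.e. jmp_addr + 2, because the short jump is two bytes long.
    """
    rel = target_addr - (jmp_addr + 2)
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    # Store the signed offset as a single two's-complement byte.
    return bytes([0xEB, rel & 0xFF])


# Same destination (0x20), but the jump moves by one byte,
# for example because one instruction was inserted before it.
print(encode_short_jmp(0x10, 0x20).hex())  # eb0e
print(encode_short_jmp(0x11, 0x20).hex())  # eb0d
```

So "jump to that spot" has no stable spelling in the byte stream. The byte you need depends on the distance, and the distance changes whenever anything in between changes.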
> In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation,
When you’re manipulating machine code, you’ve stepped away from abstract symbol manipulation and you’re just manipulating byte values now.
I don’t think your argument here is convincing. Maybe you can point to a demo or some architecture where this works. But my sense is this: once you start designing a harness to make LLMs capable of writing machine code, or designing an architecture for LLMs to write machine code, something in your implementation probably looks like an assembler, and something in your internal tokenization of the machine code probably looks like a higher-level language.
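As a sketch of what I mean by "looks like an assembler": the moment you let the model write a label name instead of a raw offset, you need something like the two-pass resolver below. This is a toy, not anyone's actual harness. The mnemonics cover only three real x86 encodings (`nop` = `90`, `ret` = `C3`, short `jmp` = `EB rel8`), and the `assemble` function and its input format are my own invention for illustration.

```python
# Toy two-pass assembler for a three-instruction subset of x86.
ONE_BYTE_OPCODES = {"nop": 0x90, "ret": 0xC3}
JMP_SHORT = 0xEB  # followed by a signed 8-bit relative offset


def assemble(lines: list[str]) -> bytes:
    # Pass 1: walk the program and record the byte address of each label.
    labels: dict[str, int] = {}
    addr = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        elif line.startswith("jmp "):
            addr += 2  # opcode byte + offset byte
        else:
            addr += 1

    # Pass 2: emit bytes, turning each label into a relative offset.
    out = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        if line.startswith("jmp "):
            target = labels[line.split()[1]]
            rel = target - (len(out) + 2)  # relative to the next instruction
            if not -128 <= rel <= 127:
                raise ValueError("target out of range for a short jump")
            out += bytes([JMP_SHORT, rel & 0xFF])
        else:
            out.append(ONE_BYTE_OPCODES[line])
    return bytes(out)


print(assemble(["jmp end", "nop", "end:", "ret"]).hex())         # eb0190c3
print(assemble(["jmp end", "nop", "nop", "end:", "ret"]).hex())  # eb029090c3
```

In the source, `jmp end` is the same token in both programs, and it behaves like a pronoun pointing at a nearby name. In the output, the offset byte differs, and working out which byte to write is exactly the bookkeeping an assembler exists to do.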