Also I, a programmer, can immediately see whether the "probabilistic device" generated code that looks like it should.
I highly doubt that.
Empirical studies show that human code review catches surprisingly few defects, and whatever effect it has drops off quickly the more code a reviewer reads in one sitting.
Most programmers are bad at detecting UB and memory ownership and lifetime errors.
A piece of wood comes off the table saw either cut to spec or not.
Code is far more complex.
> Most programmers are bad at detecting UB and memory ownership and lifetime errors.
And this is why we have languages and tooling that takes care of it.
There's only a handful of people who can one-shot perfect code in a language that doesn't guard against memory ownership or lifetime errors every time.
But even the crappiest programmer has to actively work against the tooling in a language like Rust to introduce ownership issues. Add linters, formatters and unit tests on top of that and it becomes nigh-impossible.
Now put an LLM in the same position: it's equally unable to produce shitty code when the tooling prevents it from doing so.
A piece of wood is either cut to spec or not. You don’t have to try and convince the table saw with a prompt that it is a table saw.
These tools are nothing alike and the reductionism of this metaphor isn’t helpful.
But how do you know it's cut to spec if you don't measure it?
Maybe someone bumped the fence while you were on a break, or the vibration caused the jig to drift a bit out of alignment.
The basic point is that whether a human or some kind of automated process, probabilistic or not, is producing something, you still need to check the result. And for code specifically, we've had deterministic ways of doing that for 20 years or so.