"How do you know the output is good without redoing the work yourself?"
Verifying the correctness of solutions is often much easier than finding correct solutions yourself. Examples: Sudoku and most practical problems in just about any field.
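To make the Sudoku example concrete, here is a minimal sketch of a checker for a completed 9x9 grid. The function name and the sample grid are my own illustration, not anything from the thread. The point is only that the check is a few lines of set comparisons, whereas solving generally needs search.

```python
def is_valid_sudoku_solution(grid):
    """Return True if `grid` is a correctly completed 9x9 Sudoku."""
    expected = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    return all(unit == expected for unit in rows + cols + boxes)


# Illustrative grid built from a shifting pattern (not a real puzzle).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Swap two cells in the first row: rows stay valid, columns and boxes break.
broken = [row[:] for row in solved]
broken[0][0], broken[0][3] = broken[0][3], broken[0][0]

print(is_valid_sudoku_solution(solved))
print(is_valid_sudoku_solution(broken))
```

Checking that a solution also matches a puzzle's given clues would be one more pass over the grid, still far cheaper than searching for the solution.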
-
"The training doesn't evaluate 'is the answer true' or 'is the answer useful.'"
Let's pretend RLVF does not exist to give this argument a chance. Then, while the training loop does not validate accuracy directly, I guess, the meta-training loop still does. When someone prompts a model, the resulting execution trace shows whether the generated answer is correct, and this trace is kept for subsequent training runs. The way coding agents are used productively is not a) generate code with AI and b) run it yourself; it's a) ask the AI to do something, including generating the code and running it too, with no step b. This naturally creates large training sets of correct and incorrect solutions.
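As a rough sketch of what "the trace shows whether the answer is correct" could look like for code, assume the generated code and a small test are run together and the exit status is recorded. Everything here (`run_and_label`, the record fields, the sample snippets) is hypothetical illustration; I am not claiming any lab builds its training data this way.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_label(candidate_code: str, test_code: str, timeout: float = 10.0) -> dict:
    """Run candidate code plus its test in a subprocess and record pass/fail.

    Hypothetical helper: the returned dict stands in for one trace record.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(candidate_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            passed = result.returncode == 0
            stderr = result.stderr
        except subprocess.TimeoutExpired:
            passed, stderr = False, "timeout"
    return {"code": candidate_code, "passed": passed, "stderr": stderr}


# Two made-up "generated" answers to the same request.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

records = [run_and_label(code, tests) for code in (good, bad)]
for record in records:
    print(record["passed"])
```

Note that the label is only as good as the test, which is the same objection raised elsewhere in this thread: someone still has to decide what "correct" means.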
-
"We spent billions to create systems used to perform a simulacrum of work."
Have you even tried using these systems to produce valuable work? How could this possibly be your conclusion after having tried them?
"Verifying the correctness of solutions is often much easier than finding correct solutions yourself."
Honestly, this has not been my experience at all. Defining what a good solution looks like is most of the battle in Operational Research. But, trying to be constructive, maybe we have identified a sort of dividing line between areas where AI is more vs. less helpful.
>"We spent billions to create systems used to perform a simulacrum of work."
>Have you even tried using these systems to produce valuable work? How could this possibly be your conclusion after having tried them?
The operative words there are "used to", as opposed to "only able to". The conclusion isn't derived from using the tools; it's from observing how other people tend to use them.
> Verifying the correctness of solutions is often much easier than finding correct solutions yourself
In order to verify correctness you need to understand what correctness is in context, which is pretty hard to do if you can't actually find correct solutions yourself, or even if you can but haven't bothered to do so.