> The input data is still human produced
So are the rules for the game of Go or chess? Specifying what code must satisfy (or not satisfy) is a problem statement; the evaluation itself is automatic.
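Roughly something like this (a toy sketch; the use of pytest, the temp-dir layout, and the file names are my own assumptions, not anything from the tweet):

    import os
    import subprocess
    import tempfile

    def evaluate(candidate_code: str, test_code: str) -> bool:
        """Automatic evaluation: does the candidate satisfy the given tests?

        Write both files into a scratch directory and run pytest there;
        the exit code is the whole verdict, no human in the loop.
        """
        with tempfile.TemporaryDirectory() as tmp:
            with open(os.path.join(tmp, "solution.py"), "w") as f:
                f.write(candidate_code)
            with open(os.path.join(tmp, "test_solution.py"), "w") as f:
                f.write(test_code)
            result = subprocess.run(
                ["pytest", "-q", "test_solution.py"],
                cwd=tmp, capture_output=True, text=True,
            )
            return result.returncode == 0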
> but they don't directly imply the correctness of a test.
I'd be willing to bet that if you start with an existing coding model and keep training it with coverage/lint metrics and evaluation results as feedback, you'd get better at generating tests. It would be slow, and figuring out how to build a problem dataset from existing codebases would be the hard part.
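A rough sketch of what that feedback signal could look like (the weights are arbitrary placeholders, and I'm assuming a recent coverage.py that supports `--format=total` plus flake8 for the lint score):

    import subprocess

    def test_generation_reward(repo_dir: str) -> float:
        """Hypothetical reward for a generated test suite: do the tests run
        cleanly, how much of the code under test do they cover, and are the
        test files themselves lint-clean?"""
        # 1. Do the generated tests pass against the existing code?
        run = subprocess.run(["coverage", "run", "-m", "pytest", "-q"],
                             cwd=repo_dir, capture_output=True)
        passed = 1.0 if run.returncode == 0 else 0.0

        # 2. Line coverage achieved by the generated tests (0..1).
        cov = subprocess.run(["coverage", "report", "--format=total"],
                             cwd=repo_dir, capture_output=True, text=True)
        coverage = float(cov.stdout.strip() or 0) / 100.0

        # 3. Penalty for lint violations in the generated test files.
        lint = subprocess.run(["flake8", "tests/"], cwd=repo_dir,
                              capture_output=True, text=True)
        violations = len(lint.stdout.splitlines())

        return 0.6 * passed + 0.3 * coverage - 0.1 * min(violations, 10) / 10.0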
> So are the rules for the game of Go or chess?
The rules are well defined, and you can easily write a program that tells you whether a move is valid or not, or whether a game has been won or not. This allows you to generate a virtually infinite amount of data to train the model on without human intervention.
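Tic-tac-toe shows the idea in a few lines (a toy stand-in for Go/chess, obviously nothing like a real engine):

    import random

    WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def is_valid(board, move):
        # A move is legal iff the target cell is empty.
        return board[move] == " "

    def winner(board):
        # Return "X"/"O" if someone has three in a row, else None.
        for a, b, c in WINS:
            if board[a] != " " and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def random_game():
        # Self-play: the rule checker alone labels every position,
        # so training data can be generated without human input.
        board, player, history = [" "] * 9, "X", []
        while winner(board) is None and " " in board:
            move = random.choice([i for i in range(9) if is_valid(board, i)])
            board[move] = player
            history.append((tuple(board), move))
            player = "O" if player == "X" else "X"
        return history, winner(board)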
> Specifying what code must satisfy (or not satisfy) is a problem statement
This would be true if you fixed one specific program (just like in Go or chess you fix the specific rules of the game and then train a model on them) and wanted to know whether that specific program satisfies some given specification (which would be the input of your model). But if you instead want the model to work with any program, then the program has to become part of the input too, and you'll have to train it on a number of programs which will have to be provided somehow.
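Very roughly, the difference looks like this (hypothetical signatures, just to make the distinction concrete):

    from typing import Callable

    # Fixed-rules setting (Go/chess): the environment is baked in, so the
    # model only sees positions/moves and the verifier needs nothing else.
    def go_reward(position: str, move: str) -> float:
        ...  # legality and scoring follow from the fixed, built-in rules

    # "Any program" setting: the program and its specification are themselves
    # inputs, so a corpus of (program, spec) pairs has to come from somewhere.
    # That corpus is the human-produced part.
    def code_reward(program: str, spec: Callable[[str], bool]) -> float:
        return 1.0 if spec(program) else 0.0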
> and figuring out how to build a problem dataset from existing codebases would be the hard part
This is the "Human Feedback" part that the tweet author talks about, and the part that will always be flawed.