That could be interesting but it does seem like a much fuzzier and slower feedback loop than the original idea.
It also seems less unique to code. You could also have a chat bot write an encyclopedia and see if the encyclopedias sold well. Chat bots could edit Wikipedia and see if their edits stuck as a reward function (seems ethically pretty questionable or at least in need of ethical analysis, but it is possible).
The maybe-easy to evaluate reward function is an interesting aspect of code (which isn’t to say it is the only interesting aspect, for sure!)