I think this is a rosy estimate. The vast majority of what people do with these models is just the same old shit; I would be surprised if even 1% of it were genuinely novel stuff worth folding back into the training data.

Even if it "is just the same old shit", they still have much more data, and of much higher quality, with which to scale the RL pipeline.