No, it isn't. The training data is largely private data, created by highly specialized, highly paid contracted teams of experts for domains like finance, SWE, consulting, etc.

Reddit data is just not that interesting; that deal is worth like $60M/year. Labs spend 10x as much on computer-use RL environments.

Sorry, but your argument doesn't seem coherent: how is the cost of RL relevant here?

It would also help if you could substantiate your initial claim (i.e. "internet training data is not where frontier capabilities come from").

An RL environment (instruction, stateful container, reward function) is the training data product being bought.
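
To make that triple concrete, here is a rough toy sketch in Python. Everything in it is made up for illustration: `Container`, `RLEnvironment`, `reward`, and the `answer.txt` task are hypothetical names, and the "container" is just an in-memory dict standing in for a real sandbox. It is not how any particular lab or vendor packages these.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Container:
    """Toy stand-in for a stateful container: a dict of file paths to contents."""
    files: Dict[str, str] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        # Actions taken by the agent mutate this state.
        self.files[path] = content


@dataclass
class RLEnvironment:
    """Hypothetical bundle of the three parts named above."""
    instruction: str                         # what the agent is asked to do
    container: Container                     # state the agent acts on
    reward_fn: Callable[[Container], float]  # scores the final state

    def score(self) -> float:
        return self.reward_fn(self.container)


def reward(c: Container) -> float:
    # Illustrative check: 1.0 if the expected file has the expected content.
    return 1.0 if c.files.get("answer.txt", "").strip() == "42" else 0.0


env = RLEnvironment(
    instruction="Write the number 42 to answer.txt",
    container=Container(),
    reward_fn=reward,
)

before = env.score()                       # nothing written yet
env.container.write("answer.txt", "42\n")  # simulate the agent acting
after = env.score()                        # state now satisfies the check
```

The point of the sketch is that the deliverable is the whole bundle: the task text, the state it runs against, and a programmatic way to grade the outcome, rather than a pile of scraped text.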