> internet training data is not where frontier capabilities come from
We 100% would not be at the current progress without it, though. And it's not like they only train on this once. They keep training on all the internet data PLUS the private data. Private data only (probably) wouldn't work, as learning the base regularities of language takes a lot of weights.