But collecting more data is just a naive task. The reason scale works is because of the way we typically scale. By collecting more data, we also tend to collect a wider variety of data and are able to also collect more good quality data. But that has serious limits. You can only do this so much before you become equivalent to the naive scaling method. You can prove this yourself fairly easily. Try to train a model on image classification and take one of your images and permute one pixel at a time. You can get a huge amount of scale out of this but your network won't increase in performance. It is actually likely to decrease.