You only need to train a range of small models in order to establish a plausible scaling law, IMO.