That's not how I read the transformer stuff around the time it was coming out: they had concrete hypotheses that made sense, not just random attempts at striking it lucky. In other words, they called their shots in advance.
I'm not aware that we have notably different data sources before or after transformers, so what confounding event are you suggesting transformers 'lucked' into being contemporaneous with?
Also, why are we seeing diminishing returns if only the data matters? Are we running out of data?
The premise is wrong: we are not seeing diminishing returns. By basically any metric that has a ratio scale, AI progress is accelerating, not slowing down.
For example?
For example:
The METR time-horizon benchmark shows steady exponential growth. Frontier lab revenue has been growing exponentially from basically the moment they had any revenue. (The latter has confounding factors. For example, it doesn't just depend on the quality of the model but on the quality of the apps and products built on it. But model quality is still the main component; the products seem to pop into existence the moment the necessary model capabilities exist.)
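To make the 'ratio scale' point concrete: a metric like METR's time horizon grows exponentially exactly when its logarithm is linear in time. Here is a minimal sketch of that check, using made-up illustrative numbers rather than real METR data:

```python
import numpy as np

# Hypothetical (years since first measurement, task time-horizon in minutes).
# Illustrative values only, not actual METR results.
years = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
horizon_min = np.array([4.0, 7.5, 16.0, 31.0, 60.0])

# Fit log2(horizon) = slope * years + intercept; the slope is doublings per year.
slope, intercept = np.polyfit(years, np.log2(horizon_min), 1)
print(f"doublings per year: {slope:.2f}")
print(f"implied doubling time: {12 / slope:.1f} months")
```

If the fit is good and the slope stays roughly constant as new data points come in, that's exponential growth, not diminishing returns.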
Note we're in a sub-thread about whether 'only data matters, not architecture', so I don't disagree that functionality or revenue are growing _in general_, but that's not what we're talking about here.
The point is that core model architectures don't just keep scaling without modification. MoE, inference-time scaling, RAG, etc. are all modifications that aren't 'just use more data to get better results'.