I well aware of Schmidhuber, still the scaling of the compute was critical. The reason Schmidhuber didnt go all the way is still scaling/capital, which accrues to monopolies who can afford wildly speculative research. Also, LSTM, RNN, etc, while effective for their time, were dead ends.

compute was critical, but compute was happening anyway. I fail to say why you are convinced that there would not be transformers.

I'm not saying that, I'm saying you need a company with huge amounts of cash on hand to spend on R&D. The monopoly provides a safe space for it, like it or not, this system has worked.