It's also worth saying that everyone speaks vaguely about CUDA's "institutional memory," investment, and so forth.
But the concrete quality of CUDA, and of Nvidia's offerings generally, is a move toward general-purpose parallel computing. Parallel processing is "the future," and the approach of writing a loop and having each iteration run in parallel is dead simple.
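To make that concrete, here's roughly what the "loop becomes parallel" pattern looks like in CUDA C. This is a minimal illustrative sketch, not Nvidia's recommended style; it uses unified memory (cudaMallocManaged) just to keep the host code short, and the kernel and sizes are made up for the example:

```c
// Sequential version: for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
// CUDA version: the loop body becomes a kernel; each iteration gets a thread.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's "loop index"
    if (i < n) c[i] = a[i] + b[i];                  // the original loop body, unchanged
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the example short; explicit cudaMalloc/cudaMemcpy also works.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough threads to cover every index
    add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the GPU before reading results on the host

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The only real change from the serial code is that the loop counter becomes a thread index; compile with nvcc and the launch fans the loop body out across the grid. That's the "easy things easy" part.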
Which is to say, Nvidia has invested a lot in making "easy things easy and hard things no harder."
In contrast, other chip makers seem acculturated to the natural lock-in of a clunky, convoluted interface, treating a chip's raw performance as compensation for its programming model.