It was luck that a viable non-graphics application like deep learning existed that was well suited to the architecture NVIDIA already had on hand. I certainly don't mean to diminish the work NVIDIA did to build their CUDA ecosystem, but without the benefit of hindsight I think it would have been very plausible for GPU architectures never to find a use case that would end up dwarfing graphics itself. There are plenty of architectures in the history of computing that never found a killer application, let alone three or four.
Even that is arguably not luck; it just followed a non-obvious trajectory. Graphics uses a fair amount of linear algebra, so people with large-scale physical modeling needs (among many others) became interested. To an extent, the deep learning craze kicked off because developments in GPU computation made training economical.
Nvidia started their GPGPU adventure by acquiring a physics engine and porting it over to run on their GPUs. Supporting linear algebra operations was pretty much the goal from the start.
They were also full of lies when they started their GPGPU adventure (just as they are today).
For a few years they repeated continuously that GPGPU can provide about 100 times more speed than CPUs.
This has always been false. GPUs really are much faster, but their performance per watt has mostly been around 3 times, and sometimes up to 4 times, that of CPUs. This is impressive, but very far from the "100" factor originally claimed by NVIDIA.
Far more annoying than the exaggerated performance claims is how, during the first GPGPU years, the NVIDIA CEO kept talking about how their GPUs would democratize computing, giving everyone access to high-throughput computing.
After a few years, these optimistic prophecies stopped, and NVIDIA promptly stripped fast FP64 support from their affordably priced GPUs.
A few years later, AMD followed NVIDIA's example.
Now only Intel has made an attempt to revive GPUs as "GPGPUs", but there seems to be little conviction behind it, as they do not even advertise these capabilities of their GPUs. If Intel also abandons this market, then the "general-purpose" in GPGPU will really be dead.
GPGPU is doing better than ever.
Sure, FP64 is a problem and not always available in the capacity people would like, but there are a lot of things you can do just fine with FP32, and all of that research and engineering absolutely is done on GPUs.
The AI craze also made all of it much more accessible. You don't need advanced C++ knowledge to write and run a CUDA project anymore. You can just take PyTorch, JAX, CuPy or whatnot and accelerate your NumPy code by an order of magnitude or two. Basically everyone in STEM is using Python these days, and the scientific stack works beautifully with NVIDIA GPUs. Guess which chip maker will benefit if any of that research turns out to be a breakout success in need of more compute?
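For a sense of what that looks like in practice, here is a minimal sketch using CuPy (assuming a CUDA-capable GPU and a CuPy install; the array names and sizes are just illustrative). The GPU path mirrors the NumPy API almost call for call:

```python
import numpy as np
import cupy as cp

n = 4096
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

# CPU: plain NumPy matrix multiply
c_cpu = a @ b

# GPU: copy to device memory, run the same expression, copy the result back
a_gpu = cp.asarray(a)
b_gpu = cp.asarray(b)
c_gpu = a_gpu @ b_gpu          # dispatched to a cuBLAS kernel under the hood
c_back = cp.asnumpy(c_gpu)     # only needed if the result must live on the CPU
```

The same pattern covers most of the NumPy API, which is what makes that order-of-magnitude speedup accessible without writing any CUDA C++.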
> A few years later, AMD followed NVIDIA's example.
When bitcoin was still profitable to mine on GPUs, AMD's cards performed better because they were not segmented the way NVIDIA's were. It didn't help AMD, not that it matters. AMD started segmenting because they couldn't make a competitive card at a competitive price for the consumer market.
> GPGPU can provide about 100 times more speed than CPUs
Ok. You're talking about performance.
> their performance per watt has mostly been around 3 times, and sometimes up to 4 times, that of CPUs
Now you're talking about perf/W.
> This is impressive, but very far from the "100" factor originally claimed by NVIDIA.
That's because you're comparing apples to apples per apple cart.
For determining the maximum performance achievable, the performance per watt is what matters, as the power consumption will always be limited by cooling and by the available power supply.
Even if we interpret the NVIDIA claim as referring to the performance available in a desktop, the GPU cards drew at most about double the power of CPUs. Even with this extra factor, there was still more than an order of magnitude between reality and NVIDIA's claims.
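To make that arithmetic explicit (using the rough ratios assumed above in this thread, not any measurement):

```python
# Back-of-the-envelope check with the ratios assumed above, not measurements.
perf_per_watt_ratio = (3.0, 4.0)   # GPU vs. CPU, the ~3-4x discussed in this thread
power_budget_ratio = 2.0           # GPU card drawing at most ~2x a desktop CPU

for r in perf_per_watt_ratio:
    speedup = r * power_budget_ratio
    print(f"{r:.0f}x perf/W * {power_budget_ratio:.0f}x power = {speedup:.0f}x speedup")
# Prints 6x and 8x -- more than an order of magnitude short of the claimed 100x.
```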
Moreover, I am not sure whether around 2010 and earlier, when these NVIDIA claims were frequent, the power allowed for PCIe cards had already reached 300 W, or whether it was still lower.
In any case the "100" factor claimed by NVIDIA was supported by flawed benchmarks, which compared an optimized parallel CUDA implementation of some algorithm with a naive sequential implementation on the CPU, instead of comparing it with an optimized multithreaded SIMD implementation on that CPU.
At the time, desktop power consumption was never a true limiter. Even for the notorious GTX 480, TDP was only 250 W.
That aside, it still didn't make sense to compare apples to apples per apple cart...
That physics engine is an example of a dead-end.
There's something of a feedback loop here, in that the reason that transformers and attention won over all the other forms of AI/ML is that they worked very well on the architecture that NVIDIA had already built, so you could scale your model size very dramatically just by throwing more commodity hardware at it.