People don't seem to understand that running neural network inference is very easy. It's not the machine learning frameworks and libraries that are difficult to get right; that's the trivial part.
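For anyone who doubts that, here's a minimal sketch of what inference actually amounts to once the weights exist: a few matrix multiplies and element-wise ops. (Plain NumPy, with random stand-in weights rather than a real trained model; purely illustrative.)

```python
# Minimal sketch of "running inference": a small MLP forward pass in NumPy.
# Weights are random stand-ins for a trained model; the point is that the
# math itself is a handful of matmuls and element-wise ops.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((784, 256)), np.zeros(256)
W2, b2 = rng.standard_normal((256, 10)), np.zeros(10)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)    # dense layer + ReLU
    logits = h @ W2 + b2                # output layer
    return logits.argmax(axis=-1)       # predicted class per row

batch = rng.standard_normal((32, 784))  # fake input batch
print(forward(batch).shape)             # (32,)
```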

The hard part is getting a culture that gives a damn about developing software that works and designing the hardware to support the features that the software needs.

AMD has not figured out how to run both graphics and compute on the same GPU. There can be many reasons for that, but honestly it is probably because they either don't have the necessary virtualization hardware or have two different drivers conflicting with one another.

> The hard part is getting a culture that gives a damn about developing software that works and designing the hardware to support the features that the software needs.

NVIDIA isn't missing the mark on the programming model and toolkit side (PTX and forward/backward compat) either. They have a good, lean GPU design with a lot of features, a solid programming model, a strong ecosystem, etc.

You're right, it's not just the matrix math; that part isn't rocket science. But there's a ton of little glue code around it, and you need something GPU-like to run that anyway, plus a bunch of scheduler and shader-execution-reordering machinery for your tensor threads and glue code. You end up with something broadly similar to a GPU anyway.
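To illustrate the glue point, here's a rough sketch of a single attention block (NumPy, hypothetical shapes and random weights). The matmuls are the part everyone fixates on, but the scaling, masking, softmax, and projections around them are exactly the kind of code that still needs a flexible, GPU-like machine.

```python
# Sketch of a single-head attention block: the matmuls are the well-understood
# "easy" part; the surrounding ops are the glue that still has to run somewhere.
# Shapes and weights are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(1)
T, D = 16, 64                                    # sequence length, model dim
Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)        # glue: numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv             # matmuls: the "easy" part
    scores = (q @ k.T) / np.sqrt(D)              # matmul + scaling
    mask = np.tril(np.ones((T, T), dtype=bool))  # glue: causal masking
    scores = np.where(mask, scores, -np.inf)
    out = softmax(scores) @ v                    # glue (softmax) + matmul
    return out @ Wo                              # output projection matmul

x = rng.standard_normal((T, D))
print(attention(x).shape)                        # (16, 64)
```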

It's the ProgPoW theorem, right? That there isn't some major gain to be squeezed by implementing a smaller or different machine and instruction set; that GPUs are relatively close to some kind of computational optimum for parallel workloads (in terms of programmability/flexibility and performance).

NVIDIA's model isn't far off the global optimum imo; it's certainly sitting in a great local optimum, and that's really true of a lot of their designs these days. It's always a little wild how everyone trivializes the idea that AMD et al. are going to catch up with some 80% solution in RT or tensor hardware. Like maybe NVIDIA did the math and figured out what they think a reasonable ray-performance level is, how much they'd need to upscale, and which parts of the pipeline make sense to accelerate with dedicated units versus emulate on shaders, and there isn't some massive gain left to be squeezed by just putting a handful of devs on a project for a year.

Same thing for prices. Everyone wants to assume that AMD is just choosing to follow NVIDIA in gouging or whatever. The null hypothesis is that both NVIDIA and AMD are subject to the same industry cost trends and can't actually do significantly better (not like 2x perf/$ or anything), and that NVIDIA's price structure is reasonable after all. People are going to find that a lot of electronics prices are going to go up in the coming years. There's no more 1600AF for $85, 3600 for $160, or Radeon 7850 for $150.

I'm not talking about the original 4080 pricing, but the 4070 and 4060 are actually fairly reasonable products, and the 4070 quickly fell even further below MSRP. The 7800 XT, 7900 XT, and 7600 XT are all fine as well. That's about what the price increases have been since the last generation of leading-edge products.