But for matrix multiplication, isn't compute more important, as there are N³ multiplications but just N² numbers in a matrix?

Also, I don't think power consumption is important for AI. Typically you do AI at home or in the office, where there is a lot of electricity.

>But for matrix multiplication, isn't compute more important, as there are N³ multiplications but just N² numbers in a matrix?

Being able to quickly calculate a dumb or unreliable result because you're VRAM-starved is not very useful in most scenarios. To run capable models you need VRAM, so high VRAM with lower compute is usually more useful than the inverse (a lot of both is even better, but you need a lot of money and power for that).
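On the N³ vs. N² point specifically: that intuition holds for big square matmuls, but batch-1 LLM decoding is matrix-vector, where every weight is read once per token and done. A rough arithmetic-intensity sketch (my own ballpark numbers, not from the parent post):

```python
# Dense NxN matmul: ~2*N^3 FLOPs over ~3*N^2 values touched, so FLOPs-per-byte
# grows with N -- that's the compute-bound case the question is imagining.
# Batch-1 LLM decoding is matrix-VECTOR: each weight is read once per token,
# so intensity stays around 1 FLOP/byte and memory bandwidth dominates.

def matmul_intensity(n):
    flops = 2 * n**3            # multiply-adds for an NxN @ NxN product
    bytes_moved = 3 * n**2 * 2  # A, B, C in fp16 (2 bytes per value)
    return flops / bytes_moved  # grows linearly in N

def decode_intensity(params):
    flops = 2 * params          # one multiply-add per weight per token
    bytes_moved = params * 2    # stream each fp16 weight once
    return flops / bytes_moved  # ~1 FLOP/byte regardless of model size

print(matmul_intensity(4096))   # hundreds of FLOPs/byte: compute-bound
print(decode_intensity(7e9))    # ~1 FLOP/byte: bandwidth-bound
```

So for single-user inference the N³ advantage never materializes; you spend the whole token streaming N² weights from memory.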

Even in this post with four RPis, Qwen3-30B-A3B is still an MoE model, not a dense one. It runs fast with only 3B active parameters and can be parallelized across machines, but it's much less capable than a dense 30B model running on a single GPU.

> Also, I don't think power consumption is important for AI. Typically you do AI at home or in the office, where there is a lot of electricity.

Depends on what scale you're discussing. If you want to match the 512GB of a Mac Studio Ultra with a bunch of Nvidia GPUs like RTX 3090s, you're not going to run that on a typical American 15-amp circuit; you'll trip a breaker halfway there.
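The circuit math behind that claim, with assumed figures (120 V US circuit, the NEC 80% continuous-load rule, ~350 W board power and 24 GB per RTX 3090):

```python
import math

CIRCUIT_WATTS = 15 * 120                 # 15 A at 120 V = 1800 W
CONTINUOUS_LIMIT = CIRCUIT_WATTS * 0.8   # ~1440 W usable for a continuous load

cards_needed = math.ceil(512 / 24)       # 24 GB cards to reach 512 GB -> 22
gpu_watts = cards_needed * 350           # ~7700 W for the GPUs alone

print(CONTINUOUS_LIMIT)                  # 1440.0
print(cards_needed, gpu_watts)           # 22 7700
```

That's five-plus dedicated circuits before you count CPUs, fans, or PSU inefficiency, versus one wall outlet for the Mac Studio.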