For those a bit less technical:
Assuming aggressive but non-breaking quantization (roughly 8-bit), an LLM needs ~1 byte per parameter to run. I'll optimistically assume diffusion models are at least as fast/quantizable (has anyone here tried?).
So a 20B parameter model fits in about 20GB of RAM (plus a little headroom for activations and the KV cache). At those sizes, iGPUs are fine, so we're talking a sub-$500 box.
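If you want to sanity-check the arithmetic yourself, here's a rough back-of-envelope calculator in Python. The bytes-per-parameter figures and the ~20% overhead factor are my own ballpark assumptions, not exact numbers for any particular quantization scheme:

    # Back-of-envelope RAM estimate for running a quantized model.
    # Bytes-per-parameter values are rough assumptions.
    BYTES_PER_PARAM = {
        "fp16": 2.0,  # unquantized half precision
        "int8": 1.0,  # the ~1 byte/param case above
        "int4": 0.5,  # more aggressive, may cost quality
    }

    def est_ram_gb(params_billions, quant="int8", overhead=1.2):
        """Weights plus ~20% headroom for activations / KV cache
        (the overhead factor is a guess)."""
        weights_gb = params_billions * BYTES_PER_PARAM[quant]
        return weights_gb * overhead

    for size in (7, 20, 70):
        print(f"{size}B @ int8: ~{est_ram_gb(size):.0f} GB")
    # 20B @ int8 -> ~20 GB of weights, ~24 GB with headroom
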
You can push things up to 128GB of unified RAM for about $2200 with an AMD SoC.
If you think that’s expensive, don’t even think about buying GPUs or getting more than 128GB on an iGPU (Apple makes one, if money is no object).