You can still buy used 3090 cards on eBay. Five of them will give you 120GB of VRAM and will blow away any Mac in terms of performance on LLM workloads. They have gone up in price lately and are now about $1,100 each, but at one point they were $700-800 each.
I don't see how 5x 3090s are a better option than an M3 Ultra Mac Studio.
The Mac will just work for models as large as 100B, and can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup's.
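(For scale, some back-of-the-envelope numbers, not benchmarks: 100B parameters is roughly 200GB of weights at fp16, which fits in a 256GB or 512GB Studio, and a 4-bit quant of the same model is roughly 50-60GB.)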
You can certainly daisy chain several 3090s together, but it doesn't work seamlessly.
> You can certainly daisy chain several 3090s together
It's not "daisy chaining"; the 3090 has NVLink.
FWIW I have never used NVLink, and I’m not sure why people are bringing up “daisy chaining”, because as far as I’m aware that is not a thing with modern GPUs at all.
Really? How would you NVLink more than two 3090s?
> The Mac will just work for models as large as 100B, and can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup's.
This setup will work for 100B models as well. And yes, the Mac will draw less power, but the Nvidia machine will be many times faster. So depending on your specific Mac and your specific Nvidia setup, the performance per watt will be in the same ballpark. And higher absolute performance is certainly a nice perk.
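Rough illustrative numbers (assumptions, not measurements): five 3090s at their 350W TDP can draw up to ~1.75kW, versus a few hundred watts for the Studio. If the Nvidia box also gets through the same inference workload roughly five times faster, the energy per token lands in the same ballpark.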
> You can certainly daisy chain several 3090s together, but it doesn't work seamlessly.
Citation needed; there's no "daisy chaining" in the setup I describe, and low-level libraries like PyTorch as well as higher-level tools like Ollama all support multiple GPUs seamlessly.
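For the PyTorch side, here's a minimal multi-GPU inference sketch using the Hugging Face transformers + accelerate stack (my choice of tooling for illustration; the model ID is just an example of something that fits in 5x24GB):

    # Assumes: pip install torch transformers accelerate
    # device_map="auto" lets accelerate shard the model's layers across
    # every visible GPU over plain PCIe -- no NVLink required.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # ~90GB in fp16; illustrative choice
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # splits layers across all available GPUs
    )

    inputs = tokenizer("Explain NVLink in one sentence.", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))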
I think it's bad form to say "citation needed" when your original claim didn't include citations.
Regardless - there's a difference between training and inference. And PyTorch doesn't magically make five GPUs behave like one GPU.
> I think it's bad form to say "citation needed" when your original claim didn't include citations.
I apologize, but using multiple GPUs for inference (without any sort of “daisy chaining”) is something that’s been supported in most LLM tooling for a long time.
> Regardless - there's a difference between training and inference.
No one brought up training vs. inference to my knowledge, besides you - I was assuming the machine was for inference, because my experience building a machine like the one I described was for inference. If you want to train models, I know less about that, but I’m pretty sure the tooling supports multiple GPUs there too.
> And PyTorch doesn't magically make five GPUs behave like one GPU.
I never said it was magic; I just said it was supported, which it is.
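To make the distinction concrete: in plain PyTorch, multi-GPU means you (or a library on your behalf) explicitly place pieces of the model on each GPU and move activations between them. A toy two-GPU sketch, with all names made up for illustration:

    # Toy pipeline split across two GPUs: placement is explicit, not magic.
    import torch
    import torch.nn as nn

    class TwoGPUNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Linear(4096, 4096).to("cuda:0")  # first half on GPU 0
            self.part2 = nn.Linear(4096, 4096).to("cuda:1")  # second half on GPU 1

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            x = self.part2(x.to("cuda:1"))  # explicit hop between devices
            return x

    net = TwoGPUNet()
    with torch.no_grad():
        y = net(torch.randn(8, 4096))
    print(y.shape, y.device)  # torch.Size([8, 4096]) cuda:1

Tools like accelerate, vLLM, or Ollama automate that placement, which is what "supported" means in practice.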
How much does it cost to have an electrician wire up a 240V circuit just to power the thing?
The machine I’m describing works just fine on a dedicated 15A 120V circuit. That's 1,800W at the wall (1,440W continuous under the usual 80% rule), which works if you power-limit the 3090s (a common tweak that costs little inference performance), and because layer-split inference rarely has all five cards pulling full power at once.