You can run the 4-bit quantized version of it on an M3 Ultra with 512GB of unified memory. That's quite expensive, though. Another alternative is a fast CPU with 500GB of DDR5 RAM; that, of course, is also not cheap, and slower than the M3 Ultra. Or you buy multiple Nvidia cards to reach ~500GB of VRAM. That is probably the most expensive option, but also the fastest.
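For a rough sense of why the memory figures land where they do, here's a back-of-the-envelope sketch (the 671B parameter count is R1's published total; everything else is approximation):

    # Approximate weight sizes for a 671B-parameter model at
    # different quantization levels. Real quants mix precisions and
    # you still need headroom for KV cache and activations, so treat
    # these as lower bounds.
    PARAMS = 671e9

    for name, bits in [("FP8", 8), ("4-bit", 4), ("1.58-bit", 1.58)]:
        gb = PARAMS * bits / 8 / 1e9
        print(f"{name:>8}: ~{gb:,.0f} GB of weights")

    # FP8     : ~671 GB -> needs a multi-GPU node (or two)
    # 4-bit   : ~336 GB -> fits in 512GB of unified memory with room
    #                      for KV cache, hence the M3 Ultra option
    # 1.58-bit: ~133 GB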

If you'd use the extra memory for AI only, it's cheaper to rent. A single H100 costs less than $2 per hour, power included.

Vast.ai has a bunch of 1x H100 SXM instances available; the cheapest right now is $1.554/hr.

Not affiliated, just a (mostly) happy user. Don't trust the bandwidth numbers, though; there's a lot of variance (not surprising, since it's a user-to-user marketplace).

Every time someone asks me what hardware to buy to run these at home, I show them how many thousands of hours on vast.ai they could get for the same cost.
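As a sketch of that math, assuming a ~$9,500 local rig and the $1.554/hr rate quoted above (both just illustrative figures; electricity and resale value ignored):

    # Rough break-even: how many rental hours does a local rig buy?
    rig_cost = 9_500.0   # USD, assumed local hardware price
    hourly_rate = 1.554  # USD/hr, cheapest 1x H100 SXM quoted above

    hours = rig_cost / hourly_rate
    print(f"{hours:,.0f} rental hours")           # ~6,113 hours
    print(f"{hours / 24:,.0f} days of 24/7 use")  # ~255 days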

I don't even know how the people hosting these Vast servers make money, because there's no way you can ever pay off your hardware with the pennies you're getting.

Worth mentioning that a single H100 (80-96GB) is not enough to run R1. You're looking at 6-8 GPUs on the lower end, and factor in the setup and download time.
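A rough way to estimate the GPU count; the VRAM figure is a standard 80GB H100, and the overhead factor is an assumption you'd tune:

    import math

    # GPU-count estimate: weights / usable VRAM per GPU, with a fudge
    # factor for KV cache, activations, and framework overhead.
    def gpus_needed(weights_gb: float, vram_gb: float = 80.0,
                    overhead: float = 1.2) -> int:
        return math.ceil(weights_gb * overhead / vram_gb)

    print(gpus_needed(336))  # pure 4-bit weights -> 6 GPUs
    print(gpus_needed(404))  # mixed-precision 4-bit quant -> 7 GPUs
    print(gpus_needed(671))  # FP8 -> 11 GPUs, more than one 8-GPU node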

An alternative is to use serverless GPU or LLM providers, which abstract some of this away for you, albeit at a higher cost and with slow cold starts when your model hasn't been used for a while.
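Many of those providers expose an OpenAI-compatible HTTP endpoint; here's a minimal sketch where the base URL, API key env var, and model id are all placeholders to swap for your provider's values:

    import os
    import requests

    BASE_URL = "https://api.example-provider.com/v1"  # placeholder
    API_KEY = os.environ["PROVIDER_API_KEY"]          # placeholder

    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "deepseek-r1",  # placeholder model id
            "messages": [{"role": "user", "content": "Hello, R1."}],
        },
        timeout=120,  # first request may hit a cold start
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])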

Yeah, to run the full-precision model you need either two 8xH100 nodes connected via InfiniBand, or a single 8xH200 or 8xB200 node.
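Quick sanity check on those node totals against the ~671GB of FP8 weights (per-GPU capacities are the published specs; KV cache headroom is extra):

    # H100 80GB, H200 141GB, B200 192GB per GPU
    WEIGHTS_GB = 671  # FP8, one byte per parameter

    for name, count, vram_gb in [("8xH100", 8, 80),
                                 ("8xH200", 8, 141),
                                 ("8xB200", 8, 192)]:
        total = count * vram_gb
        fits = "fits" if total > WEIGHTS_GB else "too small -> two nodes"
        print(f"{name}: {total} GB ({fits})")

    # 8xH100: 640 GB  (too small -> two nodes)
    # 8xH200: 1128 GB (fits)
    # 8xB200: 1536 GB (fits)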

Not for the GPU poor, to be sure.

It is enough to run the dynamically quantised 1.58-bit version, I believe, which is fun to play around with.