$500k is a vast overestimate. Maybe for massive concurrency at FP8 or even BF16.
NVFP4 at reasonable speeds (~120 tok/s) and concurrency is possible for around $80-90k at today's prices, maybe even less. That buys you 6 RTX PRO 6000 Blackwells (576GB of VRAM total), a decent CPU and motherboard, and a power supply.
You could do it for under $50k if you're OK with 40 tok/s decode, ~1200 tok/s prefill.
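For anyone who wants to sanity-check the 576GB figure, here is a rough back-of-envelope sketch in Python. The assumptions are mine, not from the thread: 96GB per card (which is what 576/6 implies), roughly 4.5 bits per weight for NVFP4 (4-bit values plus per-block scale overhead), and a purely hypothetical 700B-parameter model. The function names are made up for illustration, and the estimate ignores KV cache, activations, and runtime overhead, so real headroom will be smaller.

```python
def total_vram_gb(num_gpus: int, vram_per_gpu_gb: float) -> float:
    """Aggregate VRAM across identical cards, in GB."""
    return num_gpus * vram_per_gpu_gb


def weights_footprint_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in (decimal) GB.

    1e9 params * (bits / 8) bytes per param = GB, so the 1e9 factors cancel.
    """
    return params_billions * bits_per_param / 8


def headroom_gb(vram_gb: float, weights_gb: float) -> float:
    """VRAM left over for KV cache, activations, etc. Negative means it does not fit."""
    return vram_gb - weights_gb


# ASSUMPTIONS (not from the thread): 96 GB per card, ~4.5 bits/weight for NVFP4,
# and a hypothetical 700B-parameter model.
vram = total_vram_gb(num_gpus=6, vram_per_gpu_gb=96)
weights = weights_footprint_gb(params_billions=700, bits_per_param=4.5)
print(f"VRAM: {vram} GB, weights: {weights} GB, headroom: {headroom_gb(vram, weights)} GB")
```

Swap in the real parameter count and bits per weight for whatever model and quantization you care about. The same arithmetic applies to the FP8 and BF16 cases by setting bits_per_param to 8 or 16.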
Yes, a single GB300 workstation also does it, probably at even more than 120 tok/s.
Official price: $85k...
Actual price is $100k, and everything is very closed and proprietary. Oddly, this MSI system provides "only" 252GB of VRAM and 500GB of RAM. I would have expected more VRAM for this price. Also, why 252 instead of 256? https://www.centralcomputer.com/msi-xpertstation-ws300-ai-wo...
How fast will the hardware become outdated? Are there big improvements expected in the next 3 years?
M5 Ultra will likely ship before the end of the year. Though with the current RAM shortage, the max spec will likely be 256GB and in short supply.
In late 2027 or early 2028, Nvidia will release the Vera Rubin DGX Spark, likely with at least double the performance of current Blackwell, though it's unclear whether memory capacity will go up much from the current 128GB. Two to four of those will run models like this decently.
In 2028 we should expect the Vera Rubin RTX discrete lineup, including the replacement for the RTX PRO 6000. The memory spec will likely be 128GB minimum, with a good chance of up to 200GB. Two to four of those will run NVFP4 models in this class very well.
It might be the M6 Ultra. I think the real reason for stopping sales of the top-tier units was to avoid mid-generation price hikes and to increase demand for the more expensive next-gen systems, which I assume will come with 512GB (maybe 1TB) of RAM and a massive markup to match.
I hope all this speculation comes true. Right now this RAM crunch is ridiculous.
I feel like the models are good enough for a decade of future work. Once you have a working setup, you can keep using it to do work at the same level. Better stuff will come along and may make that type of work obsolete, but as long as you can do useful things with it, it won't be worth less.
I think there is a gap right now for running large models such as GLM 5.2 in Q4 or Q8. My hope is on Intel Crescent Island 480GB cards. Let's see how expensive they'll be.
480GB? Probably like $100k each? :D
The P40 was released in 2016 and is still selling like hotcakes!
You can get 1TB of HBM2 VRAM for like $10k: https://www.ebay.com/itm/177571378959
The problem is the backplane. I have not managed to find a single baseboard, and getting a random baseboard to work with random modules is probably a crapshoot.