You can run the NVFP4 quant on 8x RTX 6000 cards at 50-75 tok/s output, but not (practically speaking) the original FP8 release. You will learn more about PCIe than you ever wanted to know.
The real gangstas are running 16x RTX 6000s. Too rich for my blood, and the NVFP4 quant doesn't seem to be that much worse.
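This is a rough back-of-envelope sketch of why the 4-bit quant fits on 8 cards when FP8 doesn't. It rests on several assumptions that do not come from the thread. It assumes 96 GB per card, as on the RTX PRO 6000 Blackwell. It uses a hypothetical 750B parameter count, which is not the real GLM-5.2 size. It also assumes about 4.5 effective bits per weight for NVFP4: 4-bit values plus one FP8 scale per 16-value block. It only counts weights and ignores KV cache, activations, and runtime overhead.

```python
# Back-of-envelope weight-memory estimate at two precisions.
# ASSUMPTIONS (not from the thread): 96 GB per card, a HYPOTHETICAL
# 750B-parameter model, and ~4.5 effective bits/weight for NVFP4
# (4-bit values plus one FP8 scale per 16-value block).

GB = 1e9  # decimal gigabytes, the way GPU memory is usually quoted


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, ignoring KV cache and activations."""
    return n_params * bits_per_weight / 8 / GB


def headroom_gb(n_cards: int, gb_per_card: float, n_params: float, bits: float) -> float:
    """Total card memory left over after the weights are loaded."""
    return n_cards * gb_per_card - weight_gb(n_params, bits)


N_PARAMS = 750e9  # hypothetical parameter count, for illustration only
CARDS, GB_PER_CARD = 8, 96

if __name__ == "__main__":
    for name, bits in [("FP8", 8.0), ("NVFP4", 4.5)]:
        w = weight_gb(N_PARAMS, bits)
        left = headroom_gb(CARDS, GB_PER_CARD, N_PARAMS, bits)
        print(f"{name}: weights ~{w:.0f} GB, headroom ~{left:.0f} GB")
```

With these made-up numbers, the FP8 weights leave only about 18 GB across all 8 cards for KV cache and everything else. The 4-bit quant leaves a few hundred GB. Plug in the real parameter count before drawing any conclusions.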
Has anyone done any benchmarks on the NVFP4 quant? I'm seriously considering pitching an 8x RTX 6000 Pro box at work to run GLM-5.2 in an air-gapped environment.
At that price point you could also go with a Tenstorrent Galaxy Blackhole, which starts at $110,000.
Ooh, I hadn't seen these yet! That looks quite compelling; my only hesitation would be what the software support looks like. But 1 TB of memory for $110k is really intriguing. I might go bother a sales rep. Thanks!
Good luck. I’m in the legal field, and even there, selling air-gapped deployments is tough.
What challenges have you seen in selling air-gapped setups? Is it the high upfront cost, hardware maintenance, or something else?
We already use AWS. Everyone else is using AWS, so if there's an issue we can just say we were following industry standards.
My issue is that we likely can't use AWS (we're non-US, so there are CLOUD Act and export-control concerns).