Yeah, I wouldn’t complain if one dropped in my lap, but they’re not at the top of my list for inference hardware.
Although... Is it possible to pair a fast GPU with one? Right now my inference setup for large MoE LLMs has shared experts in system memory, with KV cache and dense parts on a GPU, and a Spark would do a better job of handling the experts than my PC, if only it could talk to a fast GPU.
[edit] Oof, I forgot these have only 128GB of RAM. I take it all back, I still don’t find them compelling.
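For reference, the split described above (dense layers and KV cache on the GPU, MoE expert tensors in system RAM) can be expressed with llama.cpp's tensor-override flag. This is a minimal sketch; the model path and GPU layer count are hypothetical examples:

```shell
# Keep dense weights and KV cache on the GPU (-ngl 99 offloads all layers),
# then override the MoE expert tensors (ffn_*_exps) back onto system RAM.
# Model path is a placeholder, not a real file.
llama-server \
  -m ./models/big-moe-q4_k_m.gguf \
  -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU"
```

With this layout the GPU handles attention and the dense FFN path while the experts stream from host memory, which is exactly the workload where fast unified memory would help.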
They’re very slow.
They're okay, generally, but slow for the price. You're paying more for the ConnectX-7 networking than for inference performance.