Tbf, the Spark's usefulness isn't inference IMO. Its memory bandwidth is too low for that.
On the other hand, running Qwen 3.5 122B A10B locally on it in ~110GB of memory, getting 50 tk/s generation and quite excellent prefill… I couldn't do that on many other machines at this price point.
For me it's been awesome for learning CUDA, for fine-tuning models (until I get one close to what I want, then it's off to an H100 cluster or similar), and for a bit of inference on the side.