Possibly the best deal there is

I really need to shut up, or bite the bullet and buy one.

If you graph tokens per second against price for the 5090, your jaw will hit the floor at how cheap it is.

With only 32 GB of VRAM, you can only run small or quantized models, in which case what's the point? At $4,000, that gets you 20 months of 10x Claude or ChatGPT subscriptions, which provide far better models. You'd need a use case where you can tolerate worse models and keep them busy with a steady workload. That doesn't match most people's usage patterns.
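The subscription comparison is plain division. A minimal sketch, assuming "10x" means ten $20/month seats ($200/month total); the function name is hypothetical, and you'd plug in whatever plan prices actually apply:

```python
def months_of_subscriptions(hardware_cost_usd: float, monthly_fee_usd: float, seats: int = 1) -> float:
    """Months of subscription spend that add up to the up-front hardware cost."""
    return hardware_cost_usd / (monthly_fee_usd * seats)

# Assumption: "10x" read as ten $20/month subscriptions, i.e. $200/month in total.
print(months_of_subscriptions(4000, 20, seats=10))
```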

If you can do what you need with qwen3.6-27b, it starts to look really interesting. That model is crazy good for its size, but it's a pain tweaking the params to run it on a 4090 with decent context and decent token speed. A 5090 looks tasty from that point of view, and even more so if you think in terms of the probability of that model being roflstomped by something in the same weight class in the next couple of years. I reckon that probability is significantly non-zero, but fundamentally it's a guess.
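The context-versus-VRAM squeeze comes down to quantized weights plus KV cache. A rough sketch for a standard full-attention transformer; the layer count, KV-head count, head dimension and 4.5 bits per weight below are hypothetical placeholders, not the real config of qwen3.6-27b, and activations and runtime overhead are ignored:

```python
GIB = 1024**3

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the quantized weights alone."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache for one sequence in a standard full-attention transformer.
    The leading 2 covers keys and values; bytes_per_elem=2 assumes an FP16/BF16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / GIB

# Hypothetical architecture values; read the real ones from the model's own config.
w = weights_gib(27e9, bits_per_weight=4.5)
kv = kv_cache_gib(n_layers=64, n_kv_heads=8, head_dim=128, context_len=32_768)
print(f"weights ~{w:.1f} GiB + KV ~{kv:.1f} GiB = ~{w + kv:.1f} GiB (before overhead)")
```

The KV term grows linearly with context length, so on a 24 GB card context length and weight quantization trade off directly against each other.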

>If you can do what you need with qwen3.6-27b, it starts to look really interesting.

What's the use case here? Churning out massive amounts of slop code through autonomous agents? Running openclaw 24/7? I think the proliferation of Codex and Claude Code, compared to any of the cheaper open models, suggests that for most software development at least, the 50-75% discount of open models isn't worth the hassle of the decreased intelligence.

Or you want to process private data, or you don't have reliable connectivity. There are a few more reasons for local models, I think.

Also, electricity isn't free.
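The "cheap per token" claim and the electricity point fold into one number once the hardware is amortized. A minimal sketch, assuming the stated throughput is sustained over every hour of the amortization window; every input below is a placeholder to replace with your own throughput, power draw and tariff:

```python
def cost_per_million_tokens(
    hardware_usd: float,
    lifetime_hours: float,
    power_watts: float,
    usd_per_kwh: float,
    tokens_per_second: float,
) -> float:
    """Amortized hardware cost plus electricity, per million generated tokens."""
    hardware_per_hour = hardware_usd / lifetime_hours
    power_per_hour = power_watts / 1000 * usd_per_kwh  # kW * $/kWh = $/hour
    tokens_per_hour = tokens_per_second * 3600
    return (hardware_per_hour + power_per_hour) / tokens_per_hour * 1_000_000

# Placeholder inputs, not measurements: $4000 box used 8 h/day for 3 years,
# 600 W at the wall, $0.30/kWh, 50 tokens/s.
print(f"${cost_per_million_tokens(4000, 3 * 365 * 8, 600, 0.30, 50):.2f} per million tokens")
```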

The 5090 is crap for inference, unless you like dummy models; sure, those will run at light speed. All the rage nowadays is MoE with 500B to 1T parameters.

MoE is fine. You can put the shared weights on the 5090 (they will fit handily even for the largest models) and the expert weights in CPU RAM, possibly offloading weights from storage as well.
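The GPU/CPU split is easy to size once you know how many parameters are always active versus routed. A minimal sketch; the 1T-total / 20B-shared model, the 4-bit quantization and the 8 GiB reserved for KV cache and overhead are all hypothetical, not figures for any specific released model:

```python
from dataclasses import dataclass

GIB = 1024**3

@dataclass
class MoEModel:
    shared_params: float         # attention, embeddings, norms, router, any always-on experts
    routed_expert_params: float  # all routed experts combined

def placement_gib(model: MoEModel, bits_per_weight: float) -> tuple[float, float]:
    """Return (GiB of shared weights for the GPU, GiB of routed experts for CPU RAM/storage)."""
    bytes_per_param = bits_per_weight / 8
    gpu = model.shared_params * bytes_per_param / GIB
    cpu = model.routed_expert_params * bytes_per_param / GIB
    return gpu, cpu

def shared_fits(model: MoEModel, bits_per_weight: float,
                vram_gib: float = 32.0, reserve_gib: float = 8.0) -> bool:
    """True if the shared weights fit in VRAM after reserving room for KV cache etc. (reserve is a guess)."""
    gpu, _ = placement_gib(model, bits_per_weight)
    return gpu <= vram_gib - reserve_gib

# Hypothetical ~1T-parameter MoE with 20B shared parameters, quantized to 4 bits.
model = MoEModel(shared_params=20e9, routed_expert_params=980e9)
gpu, cpu = placement_gib(model, 4)
print(f"GPU ~{gpu:.1f} GiB, CPU/storage ~{cpu:.1f} GiB, fits: {shared_fits(model, 4)}")
```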