My impression is that with the latest round of high-profile releases, the open-weight "market" is coalescing around two players: DS4 Flash for speed and GLM 5.2 for smarts. Qwen is being left behind to pick up the scraps for the terminally GPU-poor.

We know they have what it takes to fight back, and they know it... so I agree, there's no reason not to be optimistic about future Qwen releases. Then again, I've never really understood what motivates these releases in the first place.

DeepSeek V4 Pro seems to have significantly lower memory overhead than GLM 5.2 for the same context size. If the two are about equally smart, that's not a great look for GLM. For example, GLM's KV cache at full context is significantly larger, which directly limits how well batching works on memory-constrained hardware. Keep in mind that the existing DeepSeek Pro is a preview model, so we may be about to see further iterations released. Hopefully the GLM folks will pick up these techniques for GLM 6 or something; the model itself is quite nice, after all. It's just noticeably harder to run on limited local platforms.
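To make the batching point concrete, here is a rough back-of-the-envelope sketch. Every layer count, head size, memory budget and weight size in it is a made-up round number, not the real configuration of GLM 5.2 or DeepSeek V4 Pro. It compares a conventional grouped-query attention cache (one K and one V vector per KV head per layer) with a latent-compressed cache of the MLA kind DeepSeek used in earlier models, and counts how many full-context sequences fit in the memory the weights leave free. The helper names are hypothetical.

```python
GIB = 2**30

def kv_bytes_per_token_gqa(layers, kv_heads, head_dim, bytes_per_elem=2):
    """Grouped-query attention: one K and one V vector per KV head, per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def kv_bytes_per_token_latent(layers, latent_dim, bytes_per_elem=2):
    """Latent-compressed (MLA-style) attention: one shared compressed vector per layer."""
    return layers * latent_dim * bytes_per_elem

def max_concurrent_sequences(memory_bytes, weight_bytes, kv_bytes_per_token, context_len):
    """How many full-context sequences fit in the memory the weights leave free.

    Deliberately ignores activations, allocator overhead and KV-cache quantization.
    """
    free = memory_bytes - weight_bytes
    if free <= 0:
        return 0
    return free // (kv_bytes_per_token * context_len)

if __name__ == "__main__":
    # Hypothetical numbers only; NOT the real GLM 5.2 or DeepSeek V4 Pro configs.
    memory, weights, ctx = 512 * GIB, 400 * GIB, 128 * 1024
    gqa = kv_bytes_per_token_gqa(layers=80, kv_heads=8, head_dim=128)
    latent = kv_bytes_per_token_latent(layers=80, latent_dim=576)
    for name, per_token in (("GQA-style", gqa), ("latent", latent)):
        per_seq_gib = per_token * ctx / GIB
        n = max_concurrent_sequences(memory, weights, per_token, ctx)
        print(f"{name}: {per_seq_gib:.2f} GiB per sequence, {n} sequences fit")
```

With these invented numbers, a 128K-token sequence needs 40 GiB of GQA-style cache versus 11.25 GiB of latent cache, so the same 112 GiB of headroom holds 2 concurrent sequences in one case and 9 in the other. The exact ratio for real models depends on their actual attention layout.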

Qwen still have the best models that actually run on a laptop - Gemma 4 is their best competition there.

That's only really true if one ignores SSD offloading, which opens up inference with far larger models. Combining batched inference with SSD streaming may be even more effective, since each weight read from the SSD is then shared by every sequence in the batch. But the KV caches for all of those sequences still have to fit in memory, so this probably only pays off for models with especially efficient KV storage, or with very small inference contexts.
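The trade-off can be sketched as a crude throughput model, under loudly stated assumptions: decode speed is bounded only by SSD read bandwidth, the bytes streamed per step stay the same regardless of batch size, and the batch is capped by how many KV caches fit in the remaining RAM. The function names and every number are hypothetical illustrations, not measurements of any real model or SSD.

```python
GIB = 2**30

def kv_limited_batch(free_ram_bytes, kv_bytes_per_token, context_len):
    """Largest batch whose KV caches all fit in the RAM left after resident weights."""
    return free_ram_bytes // (kv_bytes_per_token * context_len)

def ssd_bound_tokens_per_sec(ssd_bytes_per_sec, streamed_bytes_per_step, batch):
    """Decode throughput if every step is limited by reading weights from the SSD.

    Optimistic assumption: the bytes streamed per step do not grow with batch
    size, so a single read is shared by every sequence in the batch.
    """
    step_seconds = streamed_bytes_per_step / ssd_bytes_per_sec
    return batch / step_seconds

if __name__ == "__main__":
    # Hypothetical: 64 GiB free RAM, 90 KiB of KV cache per token, 7 GB/s SSD,
    # 20 GB of weights streamed from the SSD per decode step.
    for ctx in (8 * 1024, 128 * 1024):
        batch = kv_limited_batch(64 * GIB, 90 * 1024, ctx)
        tps = ssd_bound_tokens_per_sec(7e9, 20e9, batch)
        print(f"context {ctx}: batch {batch}, ~{tps:.1f} tokens/s")
```

Here the KV-limited batch drops from 91 sequences at an 8K context to 5 at 128K, and aggregate throughput falls proportionally, which is why small contexts or compact KV caches matter. In a mixture-of-experts model a larger batch usually activates more distinct experts per step, so the real saving is smaller than this idealized model suggests.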