It is, but it's way too slow on a Strix Halo due to its limited memory bandwidth.
(I'm still sad that they didn't make a 122B-A10B version of it, as it's the kind of model that fits best on a Strix Halo, and for 3.5 it was comparable in performance to the dense 27B version).
Yeah, the speed is vastly different, but it's getting ~10 tps, and the MoE model is like 50 or something. Might use it if it proves to be much smarter; I don't really monitor my agents while they're running.
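Why the dense model is so much slower on this hardware can be seen with a rough back-of-envelope estimate. This is a sketch under loudly stated assumptions, not a benchmark. It assumes decoding is purely memory-bandwidth-bound, that every *active* weight is read from memory once per generated token, an assumed ~256 GB/s peak for Strix Halo (256-bit LPDDR5X-8000), and a hypothetical 8-bit quantization (~1 byte per parameter). KV-cache reads, compute limits and runtime overheads are ignored. The ~3B active figure for the MoE is the guess from later in this thread, and the 122B-A10B entry is the hypothetical model wished for above.

```python
# Back-of-envelope decode-speed ceiling for a memory-bandwidth-bound LLM.
# Assumptions (not measurements): every active weight is read once per token;
# KV-cache traffic, compute limits and runtime overheads are ignored.

BANDWIDTH_GB_S = 256.0  # assumed peak for Strix Halo (256-bit LPDDR5X-8000)


def max_tokens_per_second(active_params_billion: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on decode tokens/s = bandwidth / bytes read per token."""
    gb_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_per_token


# Active parameters per token, in billions (MoE figures are assumptions).
models = {
    "dense 27B": 27.0,
    "MoE 35B total, ~3B active": 3.0,
    "MoE 122B-A10B (hypothetical)": 10.0,
}

for name, active in models.items():
    # Hypothetical 8-bit quantization: ~1 byte per parameter.
    tps = max_tokens_per_second(active, bytes_per_param=1.0)
    print(f"{name}: <= {tps:.1f} tok/s")
```

Under these assumptions the dense 27B tops out around 9.5 tok/s, in line with the ~10 tps reported above (a lower-bit quant would raise the ceiling), while the ~3B-active MoE has roughly nine times more headroom. Real throughput sits below these ceilings, which is consistent with the MoE landing nearer 50 than 85.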
At least according to gertlabs, Qwen3.6 27B outperforms every SoTA (closed) model at Kotlin: https://archive.vn/RYBCL / https://gertlabs.com/rankings?mode=agentic_coding&language=k...
Interesting. I wonder if there is an opportunity to train a set of small model variants that excel at certain stacks, e.g. Qwen3.6-27B for Node + React or Qwen3.6-27B for Rust + TUI.
This is always how I've imagined small/consumer-hardware models going in time. If I only ever code in Python, give me a model that does just that (plus some general CS, algorithms, structure, etc.) and does it super-fast and well. Make it small enough that if I need a Python back end and an HTML front end, another specific model can load alongside and collaborate on the front end.
Or give me a pure shopping model that has a general understanding of products and product categories, and then will playwright/scrape/API into shopping sites to compare options and find me what I want. Etc.
Qwen 3.6 27B is an anomalously strong all-around model for its size, but when we run our evaluations, we generate 10 coding submissions/language/model (110 total). So, full disclosure, the per-language per-model performances can be noisy (I do not think Qwen3.6 27B is better than Fable 5 in agentic workflows when writing Kotlin, given enough samples, although we do find some interesting anomalies that hold up under large sample sizes).
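To make "noisy at 10 samples per language" concrete (110 total implies 11 languages per model), here is a small sketch. It assumes, purely hypothetically, that each submission is scored pass/fail; the thread doesn't say how the benchmark actually scores submissions, and the pass counts below are made up. It computes a Wilson score interval, which behaves better than the plain normal approximation at small n.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical pass counts out of 10 submissions in a single language.
N = 10
for label, passed in [("model A", 7), ("model B", 5)]:
    lo, hi = wilson_interval(passed, N)
    print(f"{label}: {passed}/{N} passed, 95% CI ~ [{lo:.2f}, {hi:.2f}]")
```

Even a 7/10 vs 5/10 gap yields intervals that overlap substantially, so a single-language ranking flip at this sample size is weak evidence on its own.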
Hmm, I just assumed bigger was better. How's it different?
Off the top of my head since it seems to be the quick info you're looking for: IIRC, with these two, the 27B is a dense model, meaning it's all active at inference. Meanwhile, the 35B is a Mixture of Experts (MoE), so only part of its network (3B?) is active at any time.
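The dense-vs-MoE difference can be sketched with a toy top-k router. This is an illustrative stand-in, not Qwen's actual architecture: the class name, dimensions, expert count, `top_k`, and the elementwise "expert" transform are all hypothetical. It shows the key property: a router scores every expert, but only the `top_k` highest-scoring experts run for a given token, so only their weights are touched. A dense layer, by contrast, uses all of its parameters for every token. Memory still has to hold every expert, which is why a 35B-total MoE needs roughly as much RAM as its total size but reads far fewer bytes per token.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class ToyMoELayer:
    """Toy mixture-of-experts layer: a router picks top_k of num_experts per token.

    Each hypothetical 'expert' is a per-dimension scale vector standing in for an
    FFN; only the selected experts' parameters are used for a given token.
    """

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        self.experts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

    def forward(self, x):
        # Router score for each expert: dot product of its router row with x.
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        # Keep only the top_k highest-scoring experts.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        # Renormalize the chosen experts' scores into gate weights.
        gates = softmax([scores[i] for i in chosen])
        # Output is the gate-weighted sum of the chosen experts' outputs.
        out = [0.0] * len(x)
        for g, i in zip(gates, chosen):
            expert = self.experts[i]
            for d in range(len(x)):
                out[d] += g * expert[d] * x[d]
        return out, chosen


layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
y, used = layer.forward([0.5] * 8)
print(f"experts used: {used} ({len(used)}/16 active)")
```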
Thanks! Dense models have been slow on my compute, but I'll give it a try. If it's not toooooo slow then it's fine; I mostly fire and forget agents anyway.
Edit: seems fast! I'll try it out some more, thanks again.