Qwen 3.6 27B dense is much better than the 35B MoE model for coding, not sure if you've tried that yet.

27b is slow as molasses vs 35b on local stuff I have (m5 max). Mtp doesn’t make any difference either.

yes, I have, I use both. 27B slower in tok/s due to density, obviously, 35B-A3B for speed on simpler tasks.

You should enable MTP now that its available.

LLamaCPP has had some massive updates in the last week or so.

Yes, Qwen 3.6 MoE is hitting like 80-90tk/s on Strix halo. On R9700 I had like 170t/s. It was not possible to keep up. But MoE is circling very often. I switch then to dense model and have 20-30t/s but it is able to solve quite a lot of tasks.

For those speeds, I’m assuming Q4?

I get 50-60t/s tg on my r9700 with the dense, unsloth MTP quant UD-Q5_K_XL, K@8/V@4 256k context.

Using Vulkan backend.

``` llama-server -fa on -t 7 -ngl 999 --mlock --fit off --kv-offload --no-webui --metrics --chat-template-kwargs {"preserve_thinking": true} -b 2048 -ub 1024 -m /mnt/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q5_K_XL.gguf --mmproj /mnt/models/unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf -c 262144 --kv-unified -ctk q8_0 -ctv q4_0 --spec-type draft-mtp --spec-draft-n-max 3 --spec-draft-ngl 99 --alias unsloth/Qwen3.6-27B-MTP-GGUF --temp 0.60 --top-k 20 --top-p 0.95 --min-p 0.00 --presence-penalty 0.00 --repeat-penalty 1.00 ```