Hacker News

satvikpendem 17 hours ago [ - ]

Qwen 3.6 27B dense is much better than the 35B MoE model for coding, not sure if you've tried that yet.

sheeshkebab 3 hours ago [ - ]

The 27B is slow as molasses compared to the 35B on the local hardware I have (M5 Max). MTP doesn’t make any difference either.

walrus01 17 hours ago [ - ]

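Yes, I have; I use both. The 27B is slower in tok/s because it is dense (every parameter is active for every token), obviously, so I use the 35B-A3B, which activates only about 3B parameters per token, for speed on simpler tasks.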
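A rough way to see why the dense model is slower: during decoding, speed is often limited by how many weight bytes must be read per token. The sketch below assumes a purely bandwidth-bound decode and ignores KV cache reads, routing overhead, and compute. The bandwidth and bits-per-weight values are hypothetical placeholders, not measurements from anyone in this thread.

```python
def decode_tokens_per_second(active_params: float,
                             bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical values for illustration only.
BANDWIDTH_GB_S = 500.0   # assumed memory bandwidth of the device
BITS_PER_WEIGHT = 4.5    # assumed average bits per weight after quantization

dense = decode_tokens_per_second(27e9, BITS_PER_WEIGHT, BANDWIDTH_GB_S)  # all 27B active
moe = decode_tokens_per_second(3e9, BITS_PER_WEIGHT, BANDWIDTH_GB_S)     # ~3B active (A3B)

print(f"dense 27B upper bound: {dense:.0f} tok/s")
print(f"MoE A3B upper bound:   {moe:.0f} tok/s")
print(f"ratio:                 {moe / dense:.1f}x")
```

Under these assumptions the ratio depends only on the active parameter counts (27B vs. 3B), not on the bandwidth or quantization chosen. Real-world gaps are usually smaller than this bound.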

intothemild 9 hours ago [ - ]

You should enable MTP now that it's available.

llama.cpp has had some massive updates in the last week or so.
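For a feel of how much MTP drafting can help, here is a minimal sketch of the usual speculative-decoding estimate. It assumes MTP is used as a draft source whose tokens are verified by the main model, and that each drafted token is accepted independently with the same probability. The acceptance rates below are hypothetical, and the estimate ignores the cost of producing the drafts, so the real speedup is lower.

```python
def expected_tokens_per_pass(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Drafted tokens are accepted one by one until the first rejection; the
    main model always contributes one token of its own, so the result is
    1 + a + a^2 + ... + a^draft_len.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if acceptance_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


# Hypothetical acceptance rates, with a draft length of 3.
for rate in (0.3, 0.6, 0.9):
    tokens = expected_tokens_per_pass(rate, draft_len=3)
    print(f"acceptance {rate:.1f}: {tokens:.2f} tokens per pass")
```

With a low acceptance rate the expected gain shrinks toward one token per pass, which is one possible reason MTP may appear to make no difference on some setups.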

npodbielski 5 hours ago [ - ]

Yes, Qwen 3.6 MoE hits around 80-90 t/s on Strix Halo. On the R9700 I got around 170 t/s; it was not possible to keep up with the output. But the MoE goes in circles very often. When that happens I switch to the dense model, which gives me 20-30 t/s but is able to solve quite a lot of tasks.

alfiedotwtf an hour ago [ - ]

For those speeds, I’m assuming Q4?

intothemild 5 hours ago [ - ]

I get 50-60 t/s token generation (tg) on my R9700 with the dense model, using the Unsloth MTP quant UD-Q5_K_XL, a KV cache quantized to 8-bit keys and 4-bit values (K@8/V@4), and 256k context.

Using Vulkan backend.

```
llama-server -fa on -t 7 -ngl 999 --mlock --fit off --kv-offload --no-webui --metrics \
  --chat-template-kwargs '{"preserve_thinking": true}' \
  -b 2048 -ub 1024 \
  -m /mnt/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  --mmproj /mnt/models/unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf \
  -c 262144 --kv-unified -ctk q8_0 -ctv q4_0 \
  --spec-type draft-mtp --spec-draft-n-max 3 --spec-draft-ngl 99 \
  --alias unsloth/Qwen3.6-27B-MTP-GGUF \
  --temp 0.60 --top-k 20 --top-p 0.95 --min-p 0.00 --presence-penalty 0.00 --repeat-penalty 1.00
```
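To illustrate what `-ctk q8_0 -ctv q4_0` buys at `-c 262144`, here is a back-of-the-envelope KV cache size calculation. The per-value costs follow the ggml block layouts (32 values plus one 2-byte scale per block). The architecture numbers are hypothetical placeholders, not the real Qwen3.6-27B configuration, so read the result as a ratio rather than an actual memory figure.

```python
# Storage cost per cached value, in bytes, for ggml tensor types.
BYTES_PER_VALUE = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values (32 bytes) + 2-byte fp16 scale
    "q4_0": 18 / 32,  # 32 4-bit values (16 bytes) + 2-byte fp16 scale
}


def kv_cache_gib(n_ctx: int, n_kv_layers: int, n_kv_heads: int, head_dim: int,
                 type_k: str, type_v: str) -> float:
    """KV cache size in GiB for a full context, one K and one V vector per layer."""
    values_per_token = n_kv_layers * n_kv_heads * head_dim
    bytes_per_token = values_per_token * (BYTES_PER_VALUE[type_k] + BYTES_PER_VALUE[type_v])
    return n_ctx * bytes_per_token / 2**30


# HYPOTHETICAL architecture, chosen only to make the arithmetic easy to follow.
ARCH = dict(n_kv_layers=16, n_kv_heads=4, head_dim=256)
N_CTX = 262144  # matches -c 262144 in the command above

full = kv_cache_gib(N_CTX, type_k="f16", type_v="f16", **ARCH)
quant = kv_cache_gib(N_CTX, type_k="q8_0", type_v="q4_0", **ARCH)

print(f"f16/f16 cache:   {full:.1f} GiB")
print(f"q8_0/q4_0 cache: {quant:.1f} GiB")
print(f"saving:          {100 * (1 - quant / full):.0f}%")
```

Whatever the real layer and head counts are, the relative saving from q8_0 keys and q4_0 values over f16 stays the same, since it depends only on the bytes per value.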