I get ~55 tok/s on my Framework Desktop with the 35B A3B q8 model, and so far I'm also very happy with the coding performance.

Did you upgrade to MTP?

On the MoE versions of these models, the MTP versions give only a marginal benefit. In my trials the speed-up is <20% (not the ~2x that happens with some other setups/models) and usually more like 10%. I.e. something like 13 -> 15 tokens/s... on my device.
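To put that 13 -> 15 figure in perspective, here is a quick back-of-the-envelope sketch. The function names and the 1000-token reply length are just my own illustrative assumptions, not anything from the article or a benchmark tool.

```python
def speedup_percent(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain in percent, given tokens/s before and after."""
    return (after_tps - before_tps) / before_tps * 100.0


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps` tokens/s."""
    return tokens / tps


# The 13 -> 15 tokens/s case from above.
print(f"speed-up: {speedup_percent(13, 15):.1f}%")          # speed-up: 15.4%

# Hypothetical 1000-token reply, ignoring prompt processing time.
print(f"before: {seconds_for(1000, 13):.1f} s")             # before: 76.9 s
print(f"after:  {seconds_for(1000, 15):.1f} s")             # after:  66.7 s
```

So roughly ten seconds saved on a long reply: noticeable, but nowhere near the halving a ~2x speed-up would give.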

I still use the MTP version as it _feels_ like slightly better quality, and because the Unsloth quantizations I can get come in more variety to fit the various systems at hand... but that's not because of the MTP aspect, unfortunately.

In the article they did get ~2x performance on the 27B (which might be something to retry, though on my Framework that would take it from 5 -> 10 tokens/s, so probably still "excruciating" speed).

YMMV for sure.

That was with the MTP version.