Qwen3.6-35b-a3b at 64k context runs quite well on my 12GB VRAM GPU with MoE partially offloaded to CPU. It does use a good chunk of system RAM too, but I get about 40-50 tok/s.
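For anyone sizing a similar setup, here is a rough back-of-the-envelope sketch of the partial offload idea: keep the shared weights and KV cache on the GPU, then fit as many layers' worth of expert weights as the remaining VRAM allows and leave the rest in system RAM. The function name `split_moe_layers` and every number in the example call are placeholders for illustration, not measured values for this model or any specific runtime.

```python
def split_moe_layers(n_layers, expert_gb_per_layer, shared_gb,
                     kv_cache_gb, vram_gb, headroom_gb=1.0):
    """Estimate how many layers can keep their MoE experts on the GPU.

    All sizes are in GB and must be supplied by the caller; nothing here
    is read from a real model file. Returns (on_gpu, on_cpu).
    """
    # VRAM left for experts after shared weights, KV cache, and a safety margin.
    budget = vram_gb - headroom_gb - shared_gb - kv_cache_gb
    if budget <= 0:
        # Nothing left over: every layer's experts stay in system RAM.
        return 0, n_layers
    on_gpu = min(n_layers, int(budget // expert_gb_per_layer))
    return on_gpu, n_layers - on_gpu


if __name__ == "__main__":
    # Placeholder figures, chosen only to make the arithmetic easy to follow.
    gpu, cpu = split_moe_layers(
        n_layers=40, expert_gb_per_layer=0.5,
        shared_gb=2.0, kv_cache_gb=3.0, vram_gb=12.0,
    )
    print(f"experts on GPU for {gpu} layers, on CPU for {cpu} layers")
```

With real numbers from your own quantized model and context length, the CPU-side count gives a starting point for how many layers' experts to offload; expect to tune it by watching actual VRAM usage.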