Hacker News

thot_experiment an hour ago

I haven't tried this model yet, but I can run Gemma 31B w/ the MTP drafter on pure CPU at about 10 tok/s, so this should run at about 20-30 tok/s on a decent CPU. It'll probably run at >50 tok/s on any Mac that can fit it, and lots of people have a gaming GPU with enough VRAM. If access to hardware is a gate, it's one you can hop pretty easily.

dofm an hour ago

Could you outline how you are running the MTP drafters? I've tried LM Studio but no dice there. I'm probably missing something but I think llama.cpp and Ollama can't do it yet either?

Patrick_Devine 31 minutes ago

I haven't yet pushed the MTP-enabled gemma4 12b model for Ollama because in my testing I wasn't getting a performance bump. The other gemma4 MTP models should work OK right now, but there are some fixes we're just about to push. This is specifically for the MLX backend.

dofm 19 minutes ago

Thanks for your reply. I will go back and look at Ollama again.

So much to learn but this news has really vindicated my decision to direct my limited span of concentration and focus to learning how to use open weights models and opencode.

ch_sm 29 minutes ago

Can't speak to compatibility with this new model, but oMLX supports MTP drafters very well.

dofm 21 minutes ago

Thank you, I will test that.