Hacker News

Yeah. I have not really tinkered much with parameter optimisation for the 35B model with MTP. Would be interested to see what you've found.

I'm using the GGUF too; it appears slightly faster in llama.cpp now than current LM Studio but it's not clear to me if that is down to LM Studio having a little more code overhead, older llama.cpp under the hood, or just parameter differences.