Hacker News

> It's also possible to make an MLX version of it, which runs a little faster on Macs

FWIW, I found MLX variants to perform consistently worse than GGUF (in terms of producing the expected output, not speed) on the one benchmark that matters to me (spam filtering). I used MLX models in LM Studio. GGUF was always slightly better.

Perhaps someone who knows more can pitch in and explain this.

embedding-shape 10 hours ago

It isn't 100% clear, but what quantization were you using for each? I've had worse results with MLX 8-bit than with Q4 GGUF on the same model. It seems mxfp8 or bf16 is needed when running with MLX to get something worthwhile out of it. That said, I've done very little testing, so it could have been something specific to the model I was testing at the time.
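
If anyone wants to check this on their own data, here is a minimal sketch of an apples-to-apples comparison: the same labeled messages go through each variant and only the accuracy is compared. Everything named here is hypothetical. `Classifier` stands in for whatever wrapper you write around your own runtime (MLX, GGUF, or anything else), and no real LM Studio or MLX API is assumed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: takes a message, returns True if it is judged spam.
# In practice this would wrap a call to one specific model + quantization.
Classifier = Callable[[str], bool]
Sample = Tuple[str, bool]  # (message text, is_spam label)


def accuracy(classify: Classifier, samples: List[Sample]) -> float:
    """Fraction of samples where the classifier agrees with the label."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    correct = sum(1 for text, label in samples if classify(text) == label)
    return correct / len(samples)


def compare(variants: Dict[str, Classifier],
            samples: Iterable[Sample]) -> Dict[str, float]:
    """Score every variant on the exact same sample list."""
    frozen = list(samples)  # materialize once so all variants see identical data
    return {name: accuracy(fn, frozen) for name, fn in variants.items()}
```

Keying the variants by both format and quantization (for example "gguf-q4" and "mlx-8bit", names made up here) keeps the two factors from getting mixed up when reading the results.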

pmarreck 9 hours ago

I was not aware of this. I might not be willing to trade accuracy for speed in this case, then.