Hacker News

It isn't 100% clear, but what quantization were you using for each? I've had worse results with MLX 8-bit than with Q4 GGUF on the same model. It seems mxfp8 or bf16 is needed when running with MLX to get something worthwhile out of it, but I've done very little testing, so it could have been something specific to the model I was testing at the time.