Quants will push it below 256GB without completely lobotomizing it.
> without completely lobotomizing it
The question with quants is: will they lobotomize it beyond the point where it would be better to just switch to a smaller model like GPT-OSS 120B, which comes prequantized down to ~60GB.
In general, quantizing down to 6 bits gives no measurable loss in performance. Going down to 4 bits gives a small but measurable loss. Performance starts dropping faster at 3 bits, and at 1 bit it can fall below that of the next smaller model in the family (families tend to space their model sizes at roughly 4x steps in parameter count).
So in the same family, you can generally quantize all the way down to 2 bits before you want to drop down to the next smaller model size.
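To make the size arithmetic concrete, here's a rough back-of-the-envelope in Python. The 120B/30B parameter counts and the ~0.5 bits/weight of quantization overhead are just illustrative assumptions; real GGUF/MLX file sizes will differ a bit, but it shows roughly what footprint each bit width buys you.

```python
# Back-of-the-envelope weight memory at various quant levels.
# Assumes weights dominate memory and adds ~0.5 bits/weight of quantization
# overhead (scales/zero-points); actual quantized files will differ somewhat.

def weight_gb(params_billion: float, bits_per_weight: float, overhead_bits: float = 0.5) -> float:
    total_bits = params_billion * 1e9 * (bits_per_weight + overhead_bits)
    return total_bits / 8 / 1e9  # bytes -> GB

for bits in (16, 8, 6, 4, 3, 2):
    print(f"{bits:>2}-bit: ~{weight_gb(120, bits):6.1f} GB for a 120B model, "
          f"~{weight_gb(30, bits):5.1f} GB for a 30B model")
```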
Between families there will obviously be more variation. You really need evals specific to your use case if you want to compare them, since different model families can perform quite differently on different types of problems, and with everyone optimizing for public benchmarks it really helps to have your own tests.
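For what it's worth, such an eval doesn't have to be fancy. Here's a minimal sketch against a local OpenAI-compatible server (llama.cpp's llama-server, LM Studio, etc.); the URL, model names, and the check() criterion are placeholders you'd swap for your own tasks and pass/fail logic.

```python
# Minimal use-case-specific eval sketch against a local OpenAI-compatible server.
# Endpoint URLs, model names, and the grading logic below are placeholders.
import requests

CASES = [
    {"prompt": "Write a Python one-liner that reverses a string.", "expect": "[::-1]"},
    # ... your own real tasks go here
]

def ask(base_url: str, model: str, prompt: str) -> str:
    r = requests.post(
        f"{base_url}/v1/chat/completions",
        json={"model": model, "temperature": 0,
              "messages": [{"role": "user", "content": prompt}]},
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def check(answer: str, expect: str) -> bool:
    return expect in answer  # crude; replace with whatever "correct" means for you

def score(base_url: str, model: str) -> float:
    passed = sum(check(ask(base_url, model, c["prompt"]), c["expect"]) for c in CASES)
    return passed / len(CASES)

# Compare, say, a 2-bit quant of the big model vs. a 4-bit quant of a smaller one:
# print(score("http://localhost:8080", "big-model-q2_k"))
# print(score("http://localhost:8081", "small-model-q4_k_m"))
```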
> In general, quantizing down to 6 bits gives no measurable loss in performance.
...this can't be literally true or no one (including e.g. OpenAI) would use > 6 bits, right?
Did you run, say, SWE-bench Verified? Where does this claim come from? It's just an urban legend.
Most certainly not, but the Unsloth MLX quant fits in 256GB.
Curious what the prefill and token generation speeds are. Apple hardware already seems embarrassingly slow at the prefill step and OK at token generation, but that's with much smaller models (1/4 the size), so at this size? It might fit, but I'm guessing it will be all but unusable, sadly.
They're claiming 20+ tok/s inference on a MacBook with the Unsloth quant.
Yeah, I'm guessing Mac users still aren't very fond of sharing how long the prefill takes. They usually only share the output tok/s, never the input.
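If anyone wants to measure it themselves, here's a rough sketch that separates time-to-first-token (≈prefill) from the streaming decode rate, assuming a local OpenAI-compatible endpoint (llama-server, LM Studio, etc.). The URL and model name are placeholders, and counting streamed chunks as tokens is only approximate.

```python
# Sketch: separate prefill from decode speed using a streaming request to a
# local OpenAI-compatible server. Time-to-first-token approximates prefill;
# the chunk rate after that approximates the generation speed in tok/s.
import json, time, requests

def measure(base_url: str, model: str, prompt: str) -> None:
    t_start = time.time()
    t_first = None
    n_chunks = 0
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={"model": model, "stream": True, "max_tokens": 256,
              "messages": [{"role": "user", "content": prompt}]},
        stream=True, timeout=600,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE lines look like: data: {...}; the stream ends with data: [DONE]
        if not line.startswith(b"data: ") or line == b"data: [DONE]":
            continue
        delta = json.loads(line[6:])["choices"][0].get("delta", {})
        if delta.get("content"):
            n_chunks += 1
            if t_first is None:
                t_first = time.time()
    t_end = time.time()
    print(f"time to first token (≈prefill): {t_first - t_start:.1f}s")
    print(f"generation: {n_chunks / (t_end - t_first):.1f} chunks/s (≈tok/s)")

# measure("http://localhost:8080", "your-model", "some long prompt ... " * 500)
```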
My hope is that the Chinese will also soon release their own GPUs at a reasonable price.