70B dense models are way behind SOTA. Even the aforementioned Kimi 2.5 has fewer active parameters than that, and is quantized to int4 as well. We're at a point where some near-frontier models may run out of the box on Mac Mini-grade hardware, with perhaps no real need to even upgrade to the Mac Studio.

>may

I'm completely over these hypotheticals and 'testing grade'.

I know Nvidia VRAM works, not some marketing about 'integrated ram'. Heck, look at r/LocalLLaMA. There is a reason it's entirely Nvidia.

> Heck, look at r/LocalLLaMA. There is a reason it's entirely Nvidia.

That's simply not true. Nvidia may be relatively popular, but people use all sorts of hardware there. Here are a few recent self-reported setups from comments:

- https://www.reddit.com/r/LocalLLaMA/comments/1qw15gl/comment...

- https://www.reddit.com/r/LocalLLaMA/comments/1qw0ogw/analysi...

- https://www.reddit.com/r/LocalLLaMA/comments/1qvwi21/need_he...

- https://www.reddit.com/r/LocalLLaMA/comments/1qvvf8y/demysti...

I specifically mentioned "hypotheticals and 'testing grade'."

Then you sent over links describing such.

In real world use, Nvidia is probably over 90%.

r/LocalLLaMA is not entirely Nvidia.

You have a point that at scale everybody except maybe Google is using Nvidia. But r/LocalLLaMA is not your evidence of that, unless you apply your priors, filter out all the hardware that doesn't fit your so-called "hypotheticals and 'testing grade'" criteria, and engage in circular logic.

PS: In fact, r/LocalLLaMA does not even cover your "real world use". Most mentions of Nvidia are from people who have older GPUs (e.g. 3090s) lying around, or who are looking at the Chinese VRAM mods to let them run larger models. Nobody is discussing how to run a cluster of H200s there.

Mmmm, not really. I have both a 4x 3090 box and an M1 Mac with 64 GB. I find that the Mac performs about the same as 2x 3090s. That’s nothing stellar, but you can run 70B models at decent quants with moderate context windows. Definitely useful for a lot of stuff.

>quants

>moderate context windows

Really had to modify the problem to make it seem equal? Not that quants are that bad, but the context window is the difference between useful and not useful.
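
For what it's worth, the memory side of the quants-vs-context argument can be estimated on the back of an envelope. Below is a rough sketch in Python, not a benchmark. It assumes a Llama-2-70B-style layout (80 layers, 8 KV heads via grouped-query attention, head dimension 128) and an fp16 KV cache. The function names are made up for illustration, and 4.5 bits per weight is only a stand-in for a 4-bit-class quant.

```python
GIB = 1024 ** 3

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB.

    Each token stores one K and one V vector per layer, hence the factor 2.
    Defaults are an assumed Llama-2-70B-style config with an fp16 cache.
    """
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_tokens * per_token_bytes / GIB

if __name__ == "__main__":
    w = weights_gib(70e9, 4.5)  # assumed 4-bit-class quant
    for ctx in (4096, 8192, 32768):
        kv = kv_cache_gib(ctx)
        print(f"ctx={ctx:>6}: weights {w:.1f} GiB + KV {kv:.2f} GiB = {w + kv:.1f} GiB")
```

Under those assumptions the weights dominate: the KV cache costs 320 KiB per token, so 8k of context is about 2.5 GiB and 32k is about 10 GiB on top of the weights. This ignores activations, runtime overhead, and how much unified memory the OS actually lets the GPU use, and it says nothing about prompt-processing speed, which is arguably the bigger part of the context window complaint.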