Hacker News

Depends on which variant you pull down, but a single 5090 GPU (I know these are insanely expensive, but for context) could run either the Q8 or the Q4_K_M version. It will not fit the 52GB BF16 version, on the other hand. For that one, any modern Mac with a Pro or better processor and more than 52GB of RAM would suffice (don't forget that the context window needs VRAM too!). As someone else noted, a 128GB model would probably do the trick and give you enough wiggle room to max out the context window.

My Mac only has 16GB of VRAM (24GB total, 8 of which is reserved for the OS), so I have to leave room for the context window. I usually find a model that fits in 5 to 7 GB of VRAM and then push the context window as high as I can.

daemonologist 36 minutes ago [ - ]

The benefit of running the full precision version is negligible (probably not even measurable above the benchmark noise floor). Cost-conscious users most commonly run something around 4-6 bits per weight, which would fit on a 24 or 32 GB card (as you mentioned).
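As a rough back-of-the-envelope check on those numbers, here is a minimal sketch that scales the 52GB BF16 figure from upthread (16 bits per weight) down to other bit widths. The function name `weights_gb` is made up for illustration, and the estimate covers weights only: real quantized files carry per-block scales, so their effective bits per weight sit a bit above the nominal number, and the context window needs memory on top of this.

```python
def weights_gb(bf16_size_gb: float, bits_per_weight: float) -> float:
    """Scale a BF16 checkpoint size (16 bits per weight) to another bit width.

    Weights only: ignores quantization block overhead and context memory.
    """
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return bf16_size_gb * bits_per_weight / 16.0


BF16_SIZE_GB = 52.0  # the BF16 size quoted in the thread

if __name__ == "__main__":
    for bpw in (4, 5, 6, 8, 16):
        size = weights_gb(BF16_SIZE_GB, bpw)
        print(f"{bpw:>2} bits/weight: ~{size:.1f} GB of weights")
```

Under these assumptions, 4-6 bits per weight works out to roughly 13 to 19.5 GB of weights, and 8 bits to about 26 GB, which is consistent with the 24 or 32 GB cards mentioned above.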

pixelesque 2 hours ago [ - ]

Note you can change how much of the shared RAM the GPU is allowed to use as VRAM (and so how much stays reserved for the OS) with:

sudo sysctl iogpu.wired_limit_mb=18800

That will allow you to use more, but you do need to leave a bit for the OS, obviously!

giancarlostoro 2 hours ago [ - ]

Oh man! I had no idea I could do this at all! What do you usually tweak it to? I feel like 8 GB is probably still a reasonable amount to give the rest of the OS.

pixelesque an hour ago [ - ]

I've got a 32 GB MBPro, and I set it to 27700, which I haven't seen a problem with so far.
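For anyone wanting to pick a value for their own machine, here is a small sketch of the arithmetic, assuming 1 GB = 1024 MB and that you decide up front how much to leave for the OS. The helper names `wired_limit_mb` and `sysctl_command` are hypothetical; the only real thing here is the `iogpu.wired_limit_mb` setting quoted above, and the script just prints the command rather than running it.

```python
def wired_limit_mb(total_ram_gb: int, os_reserve_mb: int) -> int:
    """Return a GPU wired limit in MB that leaves os_reserve_mb for the OS."""
    total_mb = total_ram_gb * 1024  # assumption: 1 GB = 1024 MB
    if not 0 < os_reserve_mb < total_mb:
        raise ValueError("os_reserve_mb must be between 0 and total RAM")
    return total_mb - os_reserve_mb


def sysctl_command(limit_mb: int) -> str:
    """Build the command string from the thread; it is not executed here."""
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


if __name__ == "__main__":
    # 32 GB machine, keeping 5068 MB back for the OS
    limit = wired_limit_mb(32, 5068)
    print(sysctl_command(limit))
    # 24 GB machine, keeping a full 8 GB back for the OS
    print(sysctl_command(wired_limit_mb(24, 8 * 1024)))
```

By this arithmetic, 27700 on a 32 GB machine leaves about 5 GB (5068 MB) for the OS, and 18800 on a 24 GB machine would leave a little under 6 GB.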