> I do think Macs are phenomenal at running local LLMs if you get the right one.

What does prompt processing speed look like today? I think it was either an M3 or M4 with 128GB where running even slightly longer prompts took forever for the initial prompt processing, so whatever speed gain you got at inference basically didn't matter. Maybe it works better today?

I have only ever used the M4 (on my wife's MacBook Air) and the M4 Pro (on my MacBook Pro), and the speeds were reasonable. I was able to tie LM Studio into PyCharm and ask it questions about code, but my context window kept running out, so I don't think the 24GB model is the right choice. The key thing you also have to look out for is that I might have 24GB of RAM, but only 16GB of it can be used as VRAM. That still makes me more competitive than my 3080 in terms of VRAM, though my 3080 could probably run circles around my M4 Pro if it wanted to.
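
If anyone wants to sanity-check that RAM-to-VRAM split on their own machine, here's a rough Python sketch. The 2/3 vs 3/4 ratio is an assumption based on the commonly cited macOS defaults, not an official figure, and on recent macOS you can reportedly raise the cap with the `iogpu.wired_limit_mb` sysctl at your own risk.

```python
import subprocess

def sysctl_int(name: str) -> int:
    """Read an integer sysctl value on macOS."""
    out = subprocess.run(["sysctl", "-n", name],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

total_ram_gb = sysctl_int("hw.memsize") / 1024**3
# Assumed default: macOS caps GPU-wired memory around 2/3 of RAM on smaller
# configs and ~3/4 on larger ones, so 24GB ends up as roughly 16GB of VRAM.
default_vram_gb = total_ram_gb * (2 / 3 if total_ram_gb <= 36 else 3 / 4)
print(f"Unified RAM: {total_ram_gb:.0f} GB, "
      f"default VRAM budget: ~{default_vram_gb:.0f} GB")
```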