Now that I’m dipping further into this space, I’m gonna see what I can upgrade on the motherboard I have, but with RAM pricing as it is, I’ll need to be smart about when I upgrade.

I very much appreciate the frank response. It makes me feel less defeated to know that my understanding of how it should work is not the whole issue, hahaha

M-series Macs are often used for running these LLMs locally because the GPU and CPU share the same pool of RAM at high bandwidth, so the GPU can use most of the system memory for the model. If you upgrade the RAM on a different kind of platform without the Unified Memory Architecture, any part of the model that doesn't fit in the GPU's VRAM has to run from ordinary system RAM, and it'll be much slower to produce all the tokens you need. Just another data point to add to your upgrade equation.
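
If it helps to put rough numbers on that, here's a back-of-envelope sketch. It assumes the common rule of thumb that generating each token reads all the model weights once, so memory bandwidth divided by model size gives a ceiling on tokens per second. The function names, the 40 GB model size, and the 400 GB/s figure are illustrative assumptions on my part, not measurements, and real-world speeds will come in lower.

```python
def ddr_bandwidth_gb_s(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of a DDR setup in GB/s.

    Each channel is 64 bits (8 bytes) wide, so peak bandwidth is
    transfers per second x channels x bytes per transfer.
    """
    return mt_per_s * channels * bus_bytes / 1000


def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound, assuming every token reads all weights once."""
    return bandwidth_gb_s / model_size_gb


# Hypothetical 40 GB model (an assumption, pick your own size).
model_gb = 40.0

# Dual-channel DDR5-5600 desktop: theoretical peak, not a benchmark.
desktop = ddr_bandwidth_gb_s(5600)

# Assumed 400 GB/s unified memory, for comparison only.
unified = 400.0

print(f"desktop RAM: {desktop:.1f} GB/s, up to {est_tokens_per_sec(desktop, model_gb):.1f} tok/s")
print(f"unified mem: {unified:.1f} GB/s, up to {est_tokens_per_sec(unified, model_gb):.1f} tok/s")
```

Plug in your own RAM speed and the size of the model you want to run; the point is just that adding more system RAM lets a bigger model load, but the bandwidth sets how fast it can talk.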