And while it is stupid slow, you can run models of hard drive or swap space. You wouldn’t do it normally, but it can be done to check an answer in one model versus another.
Try a software called TG Pro lets you override fan settings, Apple likes to let your Mac burn in an inferno before the fans kick in. It gives me more consistent throughput. I have less RAM than you and I can run some smaller models just fine, with reasonable performance. GPT20b was one.
Tokens per second is abysmal no matter how much ram you have
Some models run worse than others but I have gotten reasonable performance on my M4 Pro with 24 GB of RAM
This is typically true.
And while it is stupid slow, you can run models of hard drive or swap space. You wouldn’t do it normally, but it can be done to check an answer in one model versus another.
48 GB MacBook Pro. All of the models I've tried have been slow and also offered terrible results.
Try a software called TG Pro lets you override fan settings, Apple likes to let your Mac burn in an inferno before the fans kick in. It gives me more consistent throughput. I have less RAM than you and I can run some smaller models just fine, with reasonable performance. GPT20b was one.