Local LLM inference is all about memory bandwidth, and an M4 pro only has about the same as a Strix Halo or DGX Spark. That's why the older ultras are popular with the local LLM crowd.
Local LLM inference is all about memory bandwidth, and an M4 pro only has about the same as a Strix Halo or DGX Spark. That's why the older ultras are popular with the local LLM crowd.