Apple absolutely has a massive opportunity here because they use a shared memory architecture.
So, as most people in or adjacent to the AI space know, NVidia gatekeeps its best GPUs with the most memory by making them eye-wateringly expensive. It's a form of market segmentation. So consumer GPUs top out at 16GB (5090 currently), while the best AI GPU (H200?) has 141GB (I just had to search). I think the previous gen was 80GB.
But these GPUs are north of $30k.
Now the Mac Studio currently tops out at 512GB of SHARED memory. That means you can potentially run a much larger model locally without distributing it across machines. That configuration currently retails at $9500, which is relatively cheap in comparison.
But, as it stands now, the best Apple chips have significantly lower memory bandwidth than NVidia GPUs, and that really limits tokens/second.
So I've been waiting to see if Apple will realize this and address it in the next generation of Mac Studios (and, to a lesser extent, MacBook Pros). The H200 seems to be 4.8TB/s. IIRC the 5090 is ~1.8TB/s. The best Apple chip is (IIRC) the M3 Ultra at 819GB/s.
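To put rough numbers on "a much larger model": weight memory is approximately parameters × bits per parameter / 8. The sketch below is only a back-of-envelope check. The model sizes are illustrative, and the 0.8 "usable fraction" is a hypothetical allowance for KV cache, activations, and the OS, not a measured figure.

```python
# Back-of-envelope: does a model's weights fit in a given amount of memory?
# Assumption: weights dominate memory use; usable_fraction is a hypothetical
# allowance for KV cache, activations, and the OS, not a measured value.

def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """GB (1e9 bytes) needed to store params_billions * 1e9 weights."""
    return params_billions * bits_per_param / 8

def fits(params_billions: float, bits_per_param: int,
         memory_gb: float, usable_fraction: float = 0.8) -> bool:
    """True if the weights fit in the assumed usable share of memory."""
    return weights_gb(params_billions, bits_per_param) <= memory_gb * usable_fraction

if __name__ == "__main__":
    devices = {"H200 (141GB)": 141, "Mac Studio (512GB)": 512}
    models = [(70, 16), (70, 8), (405, 8), (405, 4)]  # (billions of params, bits)
    for params, bits in models:
        for name, mem in devices.items():
            verdict = "fits" if fits(params, bits, mem) else "does not fit"
            print(f"{params}B @ {bits}-bit ({weights_gb(params, bits):.1f}GB): "
                  f"{verdict} on {name}")
```

Under these assumptions, a 405B model at 8-bit (~405GB) fits on a single 512GB Mac Studio but not on a single 141GB H200.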
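Why bandwidth matters so much for tokens/second: for single-stream generation with a dense model, each new token roughly requires reading all the weights from memory once, so bandwidth divided by model size gives a ceiling on decode speed. This is a simplified sketch that assumes batch size 1 and ignores KV-cache traffic and compute limits. The bandwidth figures are the ones quoted above, and the 16GB model (~8B parameters at 16-bit) is a hypothetical example small enough to fit on all three.

```python
# Rough ceiling on single-stream decode speed for a memory-bound dense model.
# Assumption: each generated token reads every weight from memory once
# (batch size 1); KV-cache reads, compute limits, and overlap are ignored.

def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s = bytes/s available / bytes read per token."""
    return bandwidth_gb_s / model_gb

if __name__ == "__main__":
    # Bandwidth figures (GB/s) as quoted in this thread, some from memory.
    bandwidth = {"H200": 4800, "RTX 5090": 1800, "M3 Ultra": 819}
    model_gb = 16  # hypothetical: ~8B parameters at 16-bit
    for name, bw in bandwidth.items():
        print(f"{name}: <= {max_tokens_per_sec(bw, model_gb):.0f} tokens/s")
```

The absolute numbers are optimistic, but the ratios are the point: on this estimate the M3 Ultra tops out at under a fifth of the H200's decode speed for the same model.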
Apple could really make a dent in NVidia's monopoly here if they address some of these technical limitations.
So I just checked the memory bandwidth of these new chips, and it seems like the M5 is 153GB/s, the M5 Pro is ~300GB/s, and the M5 Max is ~600GB/s. This isn't a big jump from the M4 generation. I suspect the new Studios will barely break 1TB/s. I had been hoping for higher.
>So consumer GPUs top out at 16GB (5090 currently)
5090 has 32GB, and the 4090 and 3090 both have 24GB.
It will be interesting to see the specs on an M5 Ultra. We'll probably have to wait until WWDC at the earliest to see it, though.
It's hard to get the bandwidth of a 6000+-bit HBM memory bus out of a 512- or 1024-bit memory bus tied to DDR... I think it's also just tough to physically place 512 gigs close enough to the GPU to run at those speeds. But yeah, I wish there were a very competitive local option, too, short of spending $50k+.
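The bus-width point can be made concrete: peak bandwidth is roughly (bus width in bytes) × (transfer rate). The figures below are assumptions for illustration: a 1024-bit LPDDR5 bus at 6.4 GT/s for the M3 Ultra, and a 6144-bit HBM bus (6 × 1024-bit stacks) at 6.25 GT/s for the H200, the latter back-computed from the 4.8TB/s figure rather than taken from a datasheet.

```python
# Peak memory bandwidth from bus width and per-pin transfer rate.
# Assumed inputs (not from spec sheets): M3 Ultra = 1024-bit @ 6.4 GT/s,
# H200 = 6144-bit @ 6.25 GT/s (back-computed from 4.8TB/s).

def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_gt_s: float) -> float:
    """Peak GB/s = (bus width in bytes) * (billions of transfers per second)."""
    return bus_width_bits / 8 * transfer_rate_gt_s

def required_rate_gt_s(target_gb_s: float, bus_width_bits: int) -> float:
    """Transfer rate a given bus width would need to reach target_gb_s."""
    return target_gb_s / (bus_width_bits / 8)

if __name__ == "__main__":
    print(f"M3 Ultra (assumed): {peak_bandwidth_gb_s(1024, 6.4):.1f} GB/s")
    print(f"H200 (assumed):     {peak_bandwidth_gb_s(6144, 6.25):.1f} GB/s")
    # How fast would a 1024-bit bus have to run to match 4.8TB/s?
    print(f"1024-bit bus needs {required_rate_gt_s(4800, 1024):.1f} GT/s for 4.8TB/s")
```

With a bus one-sixth as wide, matching HBM-class bandwidth would need roughly six times the per-pin rate (~37.5 GT/s under these assumptions), which is why simply widening the bus or moving to HBM is the usual route rather than clocking DDR higher.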