Hacker News

35B A3B runs at ~100 tokens a second on the best M5 Max GPU setup.

I got around 50-60 tokens a second on my M3 Max, so 100 tps seems very realistic for a chip two generations later with double the RAM.
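
Back-of-envelope on those numbers, as a rough sketch: going from 50-60 tps to 100 tps is a 1.7-2x jump overall. If you assume the gain is split evenly across the two generations (my assumption, not a measured thing), that works out to roughly 1.3-1.4x per generation. The function names here are just made up for illustration.

```python
def implied_speedup(old_tps: float, new_tps: float) -> float:
    """Overall throughput ratio between two setups."""
    return new_tps / old_tps


def per_generation_speedup(total: float, generations: int) -> float:
    """Per-generation factor, assuming equal compounding gains each generation."""
    return total ** (1 / generations)


# Figures from the comments above: 50-60 tps on M3 Max, ~100 tps on M5 Max.
low = implied_speedup(60, 100)   # starting from the faster M3 Max figure
high = implied_speedup(50, 100)  # starting from the slower M3 Max figure

print(f"overall: {low:.2f}x to {high:.2f}x")
print(
    "per generation: "
    f"{per_generation_speedup(low, 2):.2f}x to {per_generation_speedup(high, 2):.2f}x"
)
```

This only looks at the ratio between the two reported speeds; it doesn't say anything about where the gain comes from.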