Even a very small model is going to be, what, 8GB? That'll easily blow through the caches, so you'll end up bottlenecked on DRAM bandwidth either way.
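Quick sanity check on that claim. For a dense model, every weight gets streamed from memory once per generated token, so decode speed is roughly memory bandwidth divided by model size. The bandwidth figures below are illustrative, not specs for any particular chip:

```python
# Rough decode-speed estimate for a dense model: each token reads
# every weight once, so tokens/s ~= bandwidth / model size.
def tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Hypothetical numbers: an 8 GB model on ~100 GB/s CPU-class DRAM
# vs ~1000 GB/s GPU-class memory.
print(tokens_per_sec(8, 100))    # 12.5 tok/s
print(tokens_per_sec(8, 1000))   # 125.0 tok/s
```

Either way the arithmetic is bandwidth-bound, which is the point: a bigger CPU doesn't help much if DRAM is the limiter.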

So, I wonder if this is going to be any faster than the previous generation for edge AI.

Perhaps instead of posting erroneous assertions to HN, you could wander over to your LLM of choice and ask it something like: "What are some examples of edge AI applications that achieve good performance on a CPU whose memory bandwidth is severely limited compared to a GPU? Please link to publicly available models where possible."