Most ordinary home users don't care about GPU or local AI performance that much.

Right now, sure. There's a reason why chip manufacturers are adding AI pipelines, tensor processors, and 'neural cores', though. They believe that running small local models is going to be a popular feature in the future. They might be right.

It's mostly a marketing gimmick, though - they aren't adding anywhere near enough compute for that future. The tensor cores in an "AI ready" laptop from a year ago are already pretty much irrelevant for inferencing current-generation models.

NPU/tensor cores are actually very useful for prompt pre-processing, or really any ML inference task that isn't strictly bandwidth-limited. In the bandwidth-limited case you end up wasting a lot of bandwidth padding/dequantizing data into a format the NPU can natively work with, whereas a GPU can just do that in registers/local memory. The main issue is the limited support in current ML/AI inference frameworks.
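To make the compute-bound vs. bandwidth-bound distinction concrete, here's a rough back-of-envelope sketch. The numbers (a hypothetical 7B-parameter model at ~1 byte per weight, a 512-token prompt) are illustrative assumptions, not measurements from any particular chip or framework - the point is just that prefill reuses each weight across many tokens while decode touches every weight for a single token.

```python
# Rough roofline-style sketch (hypothetical numbers) of why prompt prefill is
# compute-bound (where an NPU can help) while token-by-token decode is
# bandwidth-bound (where it mostly can't).

def arithmetic_intensity(weight_bytes, batch_tokens, bytes_per_weight=1):
    # FLOPs per forward pass ~ 2 * n_params * tokens processed in parallel;
    # weight traffic is ~n_params * bytes_per_weight regardless of how many
    # tokens are batched together.
    n_params = weight_bytes / bytes_per_weight
    flops = 2 * n_params * batch_tokens
    bytes_moved = weight_bytes
    return flops / bytes_moved  # FLOPs per byte of weight traffic

# Hypothetical 7B-parameter model quantized to roughly 1 byte per weight.
weights = 7e9

prefill = arithmetic_intensity(weights, batch_tokens=512)  # whole prompt at once
decode = arithmetic_intensity(weights, batch_tokens=1)     # one token at a time

print(f"prefill: ~{prefill:.0f} FLOPs per byte -> compute-bound, tensor cores help")
print(f"decode:  ~{decode:.0f} FLOPs per byte  -> bandwidth-bound, memory is the wall")
```

With those assumed numbers, prefill does on the order of a thousand FLOPs per byte of weights read, while decode does only a couple, which is why extra matrix-multiply hardware helps the former far more than the latter.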