The latest NPUs are pretty fast; I think what is missing is better-optimised software support.

VRAM bandwidth is at least as much of a problem as compute on these chips: there is a lot of data to shuffle around.
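
To put rough numbers on the bandwidth point, here is a back-of-the-envelope roofline-style estimate in Python. The peak throughput and bandwidth figures are made up for illustration and are not the specs of any real NPU, and `bound_by` is just a hypothetical helper name.

```python
def bound_by(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Simple roofline estimate: the slower of compute and memory traffic sets the time."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    if memory_time >= compute_time:
        return "memory", memory_time
    return "compute", compute_time


# Hypothetical workload: one n x n matrix-vector product with fp16 weights,
# roughly what a single layer does when generating one token.
n = 4096
flops = 2 * n * n        # one multiply and one add per weight
bytes_moved = 2 * n * n  # each 2-byte weight read once; vector traffic ignored

# Hypothetical hardware figures, chosen only to illustrate the ratio.
peak_flops = 40e12       # 40 TFLOP/s
peak_bandwidth = 100e9   # 100 GB/s

limit, seconds = bound_by(flops, bytes_moved, peak_flops, peak_bandwidth)
print(f"{limit}-bound, about {seconds * 1e6:.0f} microseconds")
```

With these assumed figures the memory transfer takes a few hundred times longer than the arithmetic, so faster compute alone would not help; batching or smaller weight formats would.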