Can you do GPU -> NPU -> GPU for streaming workloads? The GPU can be more flexible than Tensor HW for preprocessing, light branching, etc.

Also, the Strix Halo NPU is rated at 50 TOPS. The desktop RDNA 4 chips are into the hundreds of TOPS.

As for consumer uses, like I mentioned, it's an open question. Blender? FFmpeg? Database queries? Audio?