The use cases that open up when inference stays on-device are genuinely different. Health apps, journaling, anything where users are (justifiably) paranoid about their data leaving the phone: that's a big surface area that cloud APIs can't really touch. I'm surprised this is happening as fast as it is on consumer hardware.