I've seen plenty of people on HN claim that running LLMs on your phone is the obvious future, not just for privacy but also for efficiency, i.e. better along every possible metric.

They don't usually go into much detail, but the impression I get is that they think data centers are energy monsters full of overheated GPUs that need to be constantly replaced, while your phone is full of mostly unused compute capacity and will barely break a sweat if it's only serving queries for a single user at a time.

They don't seem to give much thought to the energy usage per user (or what this would do to your phone battery), or to how different phone-sized and data center-sized models are in terms of capability.