I did an estimate of that if you're interested: https://x.com/pwnies/status/2028831699736637912
The TL;DR though is that a 10-15B param model baked into an ASIC on the latest fab tech would draw around 62W when active. At ~10k+ t/s, though, it would likely only be active in short bursts, so it'd fit perfectly fine within the thermal envelope of a laptop.
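To put a rough number on the "short bursts" point, here's a back-of-envelope sketch. The 62W and 10k t/s figures come from the estimate above; the workload (1M tokens per hour) and the zero idle draw are assumptions I picked purely for illustration, not measurements.

```python
ACTIVE_POWER_W = 62.0       # active draw, from the estimate above
TOKENS_PER_SECOND = 10_000  # generation speed, from the estimate above


def average_power_w(tokens_per_hour: float, idle_power_w: float = 0.0) -> float:
    """Average draw over one hour, assuming the chip is fully active only
    while generating and sits at idle_power_w the rest of the time."""
    active_seconds = tokens_per_hour / TOKENS_PER_SECOND
    duty_cycle = min(active_seconds / 3600.0, 1.0)  # fraction of the hour spent active
    return ACTIVE_POWER_W * duty_cycle + idle_power_w * (1.0 - duty_cycle)


if __name__ == "__main__":
    # Hypothetical workload: 1 million tokens generated per hour.
    print(f"{average_power_w(1_000_000):.2f} W average")
```

Under those assumptions the chip is active for about 100 seconds per hour, which averages out to roughly 1.7W. A real part would have nonzero idle power, so treat this as a lower bound on the shape of the argument rather than a prediction.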
The approach makes a lot of sense. Once you get to those speeds, network latency becomes one of the bigger bottlenecks, so running locally has a real advantage over a subscription.
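A quick sketch of why latency starts to matter at that speed. The 10k t/s figure is from the estimate above; the 500-token response and the round-trip times are hypothetical values chosen for illustration, and this ignores queuing, TLS setup, and any difference in speed between a remote service and the local chip.

```python
TOKENS_PER_SECOND = 10_000  # generation speed, from the estimate above


def network_share(response_tokens: int, round_trip_ms: float) -> float:
    """Fraction of total wait time spent on the network round trip,
    assuming one round trip per response and generation at TOKENS_PER_SECOND."""
    generation_ms = response_tokens * 1000.0 / TOKENS_PER_SECOND
    total_ms = round_trip_ms + generation_ms
    return round_trip_ms / total_ms if total_ms > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 500-token response at a few assumed round-trip times.
    for rtt_ms in (20, 100, 300):
        share = network_share(500, rtt_ms)
        print(f"RTT {rtt_ms:>3} ms -> network is {share:.0%} of the wait")
```

At 10k t/s a 500-token response takes about 50ms to generate, so even a 100ms round trip would be around two thirds of the total wait. Whether that difference is actually noticeable to a person is a separate question.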
You're not counting the capex, which could cost as much as 5-10 years of Claude.
Is latency of the network that noticeable? Aren’t we talking low hundreds of ms at worst here? Much lower for something close regionally.