The thing standing in their way is probably that nobody is willing to pay what this actually costs to run.

Doesn't look very expensive to me. An LLM capable of this level of summarization can run in ~12GB of GPU-attached RAM, and it only needs that memory while it's actively processing a prompt.

The cheapest small LLMs (GPT-4.1 Nano, Google Gemini 1.5 Flash 8B) cost less than 1/100th of a cent per prompt because they are cheap to run.
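To see why "less than 1/100th of a cent" is plausible, here's a back-of-envelope sketch. The per-token prices and token counts below are illustrative assumptions (roughly in line with published small-model rates), not quotes from any price sheet:

```python
# Back-of-envelope cost of one small-model summarization prompt.
# Prices and token counts are ASSUMED for illustration only.
PRICE_PER_MTOK_IN = 0.10   # USD per 1M input tokens (assumed)
PRICE_PER_MTOK_OUT = 0.40  # USD per 1M output tokens (assumed)

def prompt_cost_usd(tokens_in: int, tokens_out: int) -> float:
    """Cost in USD for a single prompt at the assumed rates."""
    return (tokens_in * PRICE_PER_MTOK_IN
            + tokens_out * PRICE_PER_MTOK_OUT) / 1_000_000

# A short summarization call: ~500 tokens in, ~100 tokens out.
cost = prompt_cost_usd(500, 100)
print(f"${cost:.6f} per prompt")  # well under $0.0001, i.e. < 1/100 of a cent
```

At those assumed rates a typical short prompt lands around nine-thousandths of a cent, so the claim holds with room to spare.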

Yes! And also, Apple loves selling expensive hardware and has zero shyness about asking people to pay a few thousand bucks to buy into part of their ecosystem.

They could easily offer an on-prem family 'AI' product that you plop in your house and plug into your router, that does all AI processing for the whole family, and that uses a secure VPN to reach any of your devices outside the LAN.

If such a product delivered JUST what this guy's cool hack provides, and made Siri not a stupid piece of sh*t for my family, I'd buy it for $1999 even if I knew it cost Apple $700 to make.