I don't believe Anthropic and OpenAI are any more fearful of local AI than Google or Microsoft are of people hosting their own email.
Local AI capabilities are growing at a rapid pace, but so is hosted AI. While you can do a surprising amount of useful work with a model occupying a few to a few hundred gigs of VRAM, the hosted models are going to be way ahead for a long time.
The fundamental difference is that email you host yourself requires ongoing maintenance and expertise to work at a basic level, and people would rather outsource it.
AI inference is different. You get the outcome by passing text through some weights at the time you need it. There's no ongoing work besides training and releasing new models. If I had something that rivalled Opus 4+ I could use locally, I would switch in a heartbeat.
If it's something like:
- v4.5: 1x cost, 100% quality, 100% speed but maybe sometimes 80% speed because of load - v4.6: 3x cost, 105% quality, 80% speed most of the time depends - v4.7: 9x cost, 115% quality, 90% speed most of the time
Then people will either stick with v4.5 for everything it can do and, if knowledgeable, use v4.7+ for critical or specific tasks.
But if we add the option of:
LocalLLM: one time hardware + electricity cost, good enough quality for 90% of work, good enough speed for 90% of work, no vendor lock in/sudden cost spikes...
Then there is an edge to running it yourself unless you can burn investor cash to get to the next level.
I think the recent headlines on org token spend plus my own experience just today (June 1) with the new Copilot Pro limits is going to push those with the compute to run locally.
As of about 1pm today I did something to hit 47% of my entire June premium requests (copilot Pro, not converted).
As of 2pm I'm using Gemma 4 E4B on a 12gb GPU (with large context window) off my desktop to power VS Code with Copilot on my laptop. I'm going to build an AMD Strix Halo system next week when parts arrive so I can queue up a few models in parallel or work with something I need that much RAM for.
I'm not lifting the earth with my LLM setup. Gemma 4 E4B is solid for accelerating my current projects. and it's costing me pennies more per hour vs blowing half my Copilot Pro plan in a distracted morning.
I'm at a vendor conference this weekend that is showing off their Agent/Agentic workflows. Nobody can tell me how they balance the cost long term. Hopefully whoever the vendor is paying for their cloud LLM token usage doesn't spike cost in a year (or the vendor themselves) after companies convert and are trapped VMware style with these agent processes. You can bring your own (cloud) model subscription. I need to find out if we can point it back to our own local LLM endpoint and try local models for the same processes. Even if it takes 5x longer, it could be cheaper and more secure.
I fear the same thing, but still am unsure why or how :)
Google/Microsoft and hosting your own email is a byproduct of how difficult (socially, not technically) hosting your own email has become - mostly because SMTP protocol is inherently broken by spam and patched by social construct (trusted nodes, abuse@, 3+ DNS entries and counting, etc). Purely technical solutions, such HashCash etc, got discontinued in exchange for social ones. Central providers made (sometimes in exchange for, sometimes as excuse of, spam protection) self-hosting socially hard.
Now, I wonder if, and how, once Anthropic and OpenAI need to demonstrate profitability, could hamstring local AI. Which has been /so far/ very valuable for me in doing things that hosted providers don't want liability for, and align against (even if totally lawful and fair use!).