We need LLM query routing at the OS level like Mobile data. I know it will sound crazy but hear me out. I think about this AI inference as infrastructure. I do not want to pay for it on every app I use it on. I do not think "I have to pay the mobile data of youtube, and the mobile data of whatsapp etc.". I pay Mobile data infrastructure and let my device route it appropiately. In fact, if we ever go the local llm route, you could have LLM capabilities without having access to the internet (or local LAN), and your OS/computer is the only one capable of doing that routing for you.

It doesn't sound crazy at all, this seems almost obvious. The OS should provide a chat completions server and the user should be able to select the underlying LLM's server. This should be just like selecting a default search engine or browser.

Hopefully the EU forces US tech giants to do this. God knows Apple and Google won't do this on their own. They gotta get that sweet default provider revenue.

Apple told EU Citizens thats why they cant have Siri on their Phone as AI; they would have to provide an Interface where you can plug in your own LLM of your chosing.

> It doesn't sound crazy at all, this seems almost obvious. The OS should provide a chat completions server and the user should be able to select the underlying LLM's server. This should be just like selecting a default search engine or browser.

I wonder why this hasn’t happened yet. If Microsoft wants to have a Copilot button and AI investments are all the rage now, surely anything to make integrating with them would be good for keeping the hype cycle alive for longer?

Because it’s still pretty early in this journey and there’s some exclusivity deals to be made.

Honestly I don't get the point but if you want to explore that, both on desktop, mobile or headless server Linux allows you to try it.

You can run ollama with whatever you want on a Debian in literally minutes. You can even do that within a virtual machine using e.g. QEMU, so that you can do all the tests you need risk free.

Again I don't understand what that would enable that can't be done today but it's perfectly fine, you can try today anyway, no need to ask permission to anyone.

No, what I am saying simply does not exist yet.

I am saying I want my OS to expose APIs like it does for the disk or the network for AI. And I want my apps to be able to use those APIs.

I want my backend LLMs to be able to change on a whim. Imagine an Android app consuming from these LLMs. Maybe I am outside and it is making queries to Gemini. And maybe I get home and now it makes queries to my local llm, almost like connecting to local Wifi.

What I am saying does not exist on many levels:

- Agreed upon APIs for this don't think exist (in text maybe, but not in image/sound/video).

- OSs do not expose this (I am not talking manually configured user space stuff here).

- I see a world where your Network provider bundles "calls + data plan + AI tokens". But not only are the offerings for these not standardized, in order to even reach that point we would need to standardize the offerings. How do you compare intelligence among models? How do you compare cost?

- The apps need to start adopting this model

The tech is here, the ecosystem is not.

Well… it doesn’t exist FROM APPLE or MICROSOFT or GOOGLE at their shipped OS Level, but… fundamentally this isn’t a “true OS” level feature you’re asking for, it’s something you think the OS products should bake in, and you might be right! But I think the parents post is suggesting YOU CAN BUILD a prototype of what you want, how it should work, on Linux…

I have a project somewhat close to this I’ve put on pause the last month or so, partly because I’m not sure how useful it is or where to take next, but I may incorporate Wayfinder into it as a next step to improve its capabilities, as part of what it is a model gateway/router that this feels like could make more powerful/flexible in its decision making. I can’t decide if what I’m building is mostly a model recipe cookbook/platform, or a debugging tool, or both or something else at the moment, but, it can do most of that… maybe it’s part of what you want, if you figure that out better? feedback welcome! https://wardwright.dev/ https://github.com/bglusman/wardwright

> Well… it doesn’t exist FROM APPLE or MICROSOFT or GOOGLE at their shipped OS

What I am saying does not exist period. What I am saying is that there isn't a proper abstraction that helps the ecosystem build upon it.

> But I think the parents post is suggesting YOU CAN BUILD a prototype of what you want, how it should work, on Linux

I mean, yes. But me saying "this does not exist" and someone saying "but you can build it" does not take away from the fact that... Yeah, it doesn't exit :).

And also, no, I cannot build it, at least not alone. Because I want apps to eventually build upon my abstractions. This would require a good set of millions, of which the technical development would be a small part. The coordination, contracts, API definitions, even marketing, etc would be the majority.

I am saying something that Google, Telefonica, Microsoft etc could do.

Exactly. That's why I built the role-model protocol, the pi-role-model extension so that Pi can tell the router where its requests should go, and the reference router implementation: https://news.ycombinator.com/item?id=48706181

Why do we need API endpoints? We have the best API there is - the CLI

Querying a CLI is also querying an API. I never said API endpoint. An API can be a Java Interface, a CLI, an endpoint etc.

I mean, the reason mobile data is part of the OS is because the antenna is hardware that must be shared across processes. Chat completions is just a network call like anything else—it’s already available to every app; they don’t need to pay separately (they can use the same account), they just pass their API key over the network to the completions server. What am I missing?

> Chat completions is just a network call like anything else

But what if Chat completion was resolved locally with hardware? Or what if I want my OS to coordinate Chat completions locally and, if my hardware is overwhelmed, send some to network?

You do have a valid point, yes, that what I am saying, without support for local hardware could be done with a sort of Open Router equivalent.

> they don’t need to pay separately (they can use the same account), they just pass their API key over the network to the completions server

That I would be conformable putting what I am saying on my parents phone. I do no trust my parents to manage API keys. What I am saying is an ecosystem thing, not only a low level thing