I agree with you in certain circumstances, but not really for internal users' inference. OpenRouter is great if you need to maintain uptime, but for basic usage (chat/coding/self-agents) you can do all of what you mentioned and more with a LiteLLM instance. The number of companies that send a bill is rarely a concern when it comes to “is work getting done”, but I agree with you that minimizing user friction is best.
For general use, I personally don’t see much justification for paying a per-token fee just to avoid creating a few accounts with my trusted providers and adding them to an instance for users. It is transparent to users beyond them having a single internal API key (or multiple, if you want to track per-app usage) for all the models they have access to, with limits and logging. They wouldn’t even need to know which provider is hosting a model, and the underlying provider could be swapped without users noticing.
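From the user's side, this is what the setup looks like: one OpenAI-style endpoint and one internal key, with the model name just an alias the proxy maps to whatever provider is configured. A minimal sketch; the proxy URL, virtual key, and model alias below are all hypothetical, not real infrastructure:

```python
import json

# Hypothetical internal LiteLLM proxy endpoint and per-user virtual key.
PROXY_URL = "http://litellm.internal:4000/v1/chat/completions"
VIRTUAL_KEY = "sk-internal-user-key"

def build_request(model_alias: str, prompt: str) -> tuple[dict, bytes]:
    """Build the OpenAI-style request a user would send to the proxy.

    The user only ever sees the model alias; the proxy's config maps it to a
    concrete provider, and that mapping can change without users knowing.
    """
    headers = {
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model_alias,  # an internal alias, e.g. "chat-large"
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return headers, body

headers, body = build_request("chat-large", "hello")
```

The point is that nothing provider-specific leaks into user-facing code: swapping the backing provider is a proxy-config change, not a client change.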
It is certainly easier to pay a per-token fee at a small scale and not have to run an instance, so less technical users could definitely see an advantage in just sticking with OpenRouter.
The two things I like about OpenRouter:
1. The LLM provider doesn't know it's you (unless you have personally identifiable information in your queries). If N people are accessing GPT-5.x using OpenRouter, OpenAI can't distinguish the people. It doesn't know if 1 person made all those requests, or N.
2. The ability to ensure your traffic is routed only to providers that claim not to log your inputs (not even for security purposes): https://openrouter.ai/docs/guides/routing/provider-selection...
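Point 2 is expressed in the request body itself, via OpenRouter's provider-preferences object. A rough sketch, assuming the documented `provider.data_collection` field; the model slug is just an example:

```python
import json

# Restrict routing to providers whose data policy claims no prompt
# logging/training, per OpenRouter's provider-preferences object.
body = {
    "model": "openai/gpt-4o",  # example slug
    "messages": [{"role": "user", "content": "hello"}],
    "provider": {
        "data_collection": "deny",  # skip providers that may retain inputs
    },
}
payload = json.dumps(body).encode()
```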
It's been forever since I played with LiteLLM. Can I get these with it?
> It doesn't know if 1 person made all those requests, or N.
FWIW this is highly unlikely to be true.
It's true that the upstream provider won't know it's _you_ per se, but most LLM providers strongly encourage proxies like OpenRouter to distinguish between downstream clients for security and performance reasons.
For example:
- https://developers.openai.com/api/docs/guides/safety-best-pr...
- https://developers.openai.com/api/docs/guides/prompt-caching...
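The safety-best-practices angle doesn't require revealing raw identity: a proxy can distinguish downstream users to the upstream provider by sending a stable opaque identifier instead. A sketch of the idea, assuming OpenAI's `safety_identifier` request field; hashing an internal user ID into it is my illustration, not anything OpenRouter has confirmed doing:

```python
import hashlib
import json

def build_request(internal_user_id: str, prompt: str) -> bytes:
    # Derive a stable, opaque per-user identifier: same user -> same value,
    # but the raw internal ID never reaches the upstream provider.
    ident = hashlib.sha256(internal_user_id.encode()).hexdigest()[:32]
    return json.dumps({
        "model": "gpt-5",  # example model name
        "messages": [{"role": "user", "content": prompt}],
        "safety_identifier": ident,
    }).encode()
```

Because the identifier is sticky, the provider can do abuse detection and per-user cache routing without learning who the user actually is.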
Fair point. Would be good to hear from OpenRouter folks on how they handle the safety identifier.
For prompt caching, they already say they permit it, and do not consider it "logging" (i.e. if you have zero retention turned on, it will still go to providers who do prompt caching).
If you hover over one of the icons on a provider, OpenRouter tells you whether it submits with your user ID or anonymously. E.g. OpenAI shows "OpenRouter submits API requests to this provider with an anonymous user ID.", while Azure OpenAI shows "OpenRouter submits API requests to this provider anonymously.".
But does "anonymous user ID" mean that they make a user ID for you, and it's sticky? If I make a request today and another tomorrow, the same anonymous user ID is sent each time? Or do they keep changing it?
One additional major benefit of OpenRouter is that there is no rate limiting. This was the primary reason we went with OpenRouter: the tight rate limits imposed by the native providers.
I think it's more accurate to say that they switch providers when there is rate limiting.
The underlying provider can still rate-limit you. What OpenRouter provides is automatic switching between providers for the same model.
(I could be wrong.)
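That switching is also something you can control per request. A sketch, assuming OpenRouter's documented `models` fallback list and `provider.allow_fallbacks` flag; the model slugs are examples only:

```python
import json

# Fallback controls in an OpenRouter request: "models" is an ordered
# fallback list tried if the primary errors or is rate-limited, and
# "allow_fallbacks" lets OpenRouter reroute to another provider.
body = {
    "model": "meta-llama/llama-3.1-70b-instruct",  # example slug
    "messages": [{"role": "user", "content": "hello"}],
    "models": [
        "meta-llama/llama-3.1-70b-instruct",
        "qwen/qwen-2.5-72b-instruct",  # example fallback slug
    ],
    "provider": {
        "allow_fallbacks": True,
    },
}
payload = json.dumps(body).encode()
```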
> The number of companies that send a bill is rarely a concern
Not true in any non-startup with an actual finance department.
A lot of inference providers for open models only accept prepaid payments, and managing multiple of those accounts is kind of cumbersome. I could limit myself to a smaller set of providers, but then I'm probably overpaying by more than the 5.5% fee
If you're only using flagship model providers then openrouter's value add is a lot more limited
The main thing about OpenRouter is also that they take 100% of the risk in case of overcharges from the model providers: you have an actual hard cap.
The downside is that context caching works only moderately well at best, which renders the savings nearly useless.
Is there any risk? Don't the model providers also bill by the token?
The accounting could be asynchronous, so you could overshoot your budget by a few requests before you're blocked.
Does OpenRouter actually perform better than LiteLLM on integration, though? I found using Anthropic's models through a LiteLLM-laundered OpenAI-style API to perform noticeably worse than using Anthropic's API directly, so I've stopped considering LiteLLM as an option. It's also just a buggy mess, judging by their MCP server: the errors it emits are meaningless, and the UI behaves oddly even on the happy path (error messages colored green with "Success:" prepended).
But if OpenRouter does better (even though it's the same sort of API layer) maybe it's worth it?
LiteLLM had a major security incident recently, and often isn't actually that useful an abstraction...