> When I use OpenAI, OpenRouter, etc., I can put $10 on my API key, and if the key leaks, someone can use those $10 and that's it.
On that note, I'll just mention that I've discovered recently that when you prepay $10 into your Anthropic account, either directly or via the newer "Extra usage" option in subscription plans, and then use Claude Code, they will repeatedly overbill you, putting you into a negative balance. I actually complained, and they told me that they allow the "final query" to complete rather than cutting it off mid-process. That is of course silly, because Claude Code is typically used for long sessions, where being cut off 52% of the way into a task rather than 51% makes essentially no difference.
I've ended up paying these so far, but I hope someone with more free time sues them over it.
I'm spitballing here, but I suspect that Google (same with AWS) does billing as post-processing: they run a job that scrapes the usage stats and THEN bills you for them, instead of checking billing on every incoming API request like the major AI companies.
Yes, you are on the money. A cloud service provider needs to maintain reliability first and foremost, which means they won't have a runtime dependency on their billing system.
This means that billing happens asynchronously. You may use queues, you may do batching, etc., but you won't have a real-time view of the costs.
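To make the asynchronous model concrete, here is a toy simulation, assuming a hypothetical `AsyncBiller` where serving a request only reads the current balance, while usage is queued and debited later by a batch job. It is not any provider's actual pipeline; it just shows how a $10 prepaid balance can go negative when several requests land between billing runs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Account:
    balance_cents: int  # what the billing system currently believes is left


@dataclass
class AsyncBiller:
    """Hypothetical metering pipeline: serving never waits on billing."""
    account: Account
    queue: deque = field(default_factory=deque)

    def serve_request(self, cost_cents: int) -> bool:
        # The only gate is a read of the (possibly stale) balance.
        if self.account.balance_cents <= 0:
            return False
        # Usage is recorded as an event to be billed later, not debited now.
        self.queue.append(cost_cents)
        return True

    def run_billing_batch(self) -> None:
        # Periodic job: drain the queued usage events and debit them in one go.
        while self.queue:
            self.account.balance_cents -= self.queue.popleft()


biller = AsyncBiller(Account(balance_cents=1000))  # $10 prepaid
# Five $3 requests arrive before the next billing batch runs; all pass the stale check.
served = sum(biller.serve_request(300) for _ in range(5))
biller.run_billing_batch()
print(served, biller.account.balance_cents)  # 5 -500
```

Only after the batch runs does the balance reflect reality, and only then does `serve_request` start refusing.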
>they won't have a runtime dependency on their billing system
Well, that makes sense in principle, but they obviously do have some billing check that prevents me from making additional requests after that "final query". And they definitely have some check to prevent me from overutilizing my quota when I have an active monthly subscription. So whatever it is that they need to do, when I prepay $x, I'm not ok with them charging me more than that (or I would have prepaid more). It's up to them to figure this out and/or absorb the costs.
> they obviously do have some billing check that prevents me from making additional requests after that "final query"
No, they don't, actually! They try to get close, but it's not guaranteed (for example, send that "final query" to two different regions concurrently).
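A minimal sketch of that race, assuming two hypothetical regions that each check a shared balance and only debit after the query finishes. The `threading.Barrier` forces the worst-case interleaving (both checks happen before either debit) so the result is deterministic.

```python
import threading

balance_cents = 100  # $1 left on the account
lock = threading.Lock()  # protects the debit itself, but not the check-then-debit sequence
both_checked = threading.Barrier(2)  # makes both regions check before either one debits


def region_handles_final_query(cost_cents: int) -> None:
    global balance_cents
    # Step 1: each region checks the balance independently.
    allowed = balance_cents >= cost_cents
    # Simulate both checks landing before any debit has propagated.
    both_checked.wait()
    # Step 2: the query runs to completion and is then debited.
    if allowed:
        with lock:
            balance_cents -= cost_cents


threads = [threading.Thread(target=region_handles_final_query, args=(80,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance_cents)  # -60
```

Wrapping the check and the debit in a single lock would close the gap, but only by funneling every region through one serialization point, which is exactly the kind of runtime dependency on billing that providers avoid.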
Now, they could stand up a separate system that guarantees a hard spending cap, but few people want that and it would cost more to run, so it wouldn't pay for itself.
You can do it on your end, though: route every request sequentially through a service that tracks your own usage, and stop when you reach your limit.
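One way to sketch such a client-side guard, assuming you can bound each request's worst-case cost up front (for example from a max-output-token setting and the published per-token price) and learn its actual cost afterwards. `BudgetGuard`, `BudgetExceeded`, and the `send` callable are hypothetical; wiring `send` to a real API client and reading the cost from its response depends on the provider and is left out.

```python
import threading
from typing import Callable


class BudgetExceeded(Exception):
    pass


class BudgetGuard:
    """Hypothetical client-side proxy: every request goes through run(), one at a time."""

    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self._lock = threading.Lock()  # serializes requests so checks and debits can't interleave

    def run(self, max_cost_cents: int, send: Callable[[], int]) -> int:
        with self._lock:
            # Refuse up front if this request's worst case could cross the limit.
            if self.spent_cents + max_cost_cents > self.limit_cents:
                raise BudgetExceeded(f"{self.spent_cents} spent, limit {self.limit_cents}")
            actual = send()  # stub for the real request; returns its actual cost in cents
            self.spent_cents += actual
            return actual


guard = BudgetGuard(limit_cents=1000)  # $10 self-imposed cap
for _ in range(10):
    try:
        # Each request is bounded at $3 worst case and actually costs $2.50.
        guard.run(max_cost_cents=300, send=lambda: 250)
    except BudgetExceeded:
        break
print(guard.spent_cents)  # 750
```

Reserving the worst case before each request is what makes this a hard cap: it stops short of the limit (750 of 1000 cents here) rather than overshooting it, and the lock keeps two concurrent callers from both passing the check.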
They do have a billing check, but that check looks at "eventually consistent" billing data, which can have arbitrary delays or be processed out of order relative to when the usage actually happened. This strategy is typically fine when the margin of over-billing is small, maybe 1% or less. I take it from your description that the actual over-billing is more like dozens of dollars, potentially more than single-digit percentages on top of the subscription price. Here's hoping they tighten up metering <> billing.
Then the right thing to do from a consumer standpoint is to factor that overbilling into their upfront pricing, rather than surprising people with bills that they were led not to expect.