Many people have wanted to set a spending limit on their Google Cloud account for years, but Google never implemented anything, instead always suggesting a workaround: host a Cloud Run function that removes billing from the project via the API https://docs.cloud.google.com/billing/docs/how-to/disable-bi...
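Here is a minimal sketch of that workaround's decision logic. It assumes the budget notification arrives as a Pub/Sub message whose base64-encoded JSON payload carries `costAmount` and `budgetAmount` fields, which is how Cloud Billing budget notifications are described, but verify against the notification format docs. `handle_budget_notification` and the injected `disable_billing` callback are hypothetical names. In the documented approach, the callback would call the Cloud Billing API to detach the project's billing account.

```python
import base64
import json
from typing import Callable


def handle_budget_notification(
    message_data: str,
    project_id: str,
    disable_billing: Callable[[str], None],
    cap_ratio: float = 1.0,
) -> bool:
    """Decide whether to cut off billing for `project_id`.

    `message_data` is the base64-encoded JSON body of a budget notification
    (field names assumed from the budget notification format).
    `disable_billing` is a hypothetical stand-in for the Cloud Billing API
    call that removes the project's billing account.
    Returns True if billing was disabled.
    """
    payload = json.loads(base64.b64decode(message_data).decode("utf-8"))
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])

    # Only act once reported spend reaches the chosen fraction of the budget.
    if cost < budget * cap_ratio:
        return False

    disable_billing(project_id)
    return True
```

This can only react as fast as billing data reaches the budget, and commenters here put that lag at 12 to 48 hours. It therefore cuts spending off after the cap is crossed, not exactly at it.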

As someone who is new to the whole Google Cloud ecosystem, I find the number of dark patterns they employ absolutely shocking. Just off the top of my head:

1. For the Gemini API, you never know how much a single request will cost, or did cost.

2. It takes anywhere from 12 to 24 hours to find out what they will charge you for past aggregate requests.

3. There is no simple way to set a spending limit anywhere in Google Cloud.

4. Either they are charging for the Batch API before it even returns a result, or their "minimal" thinking mode is burning through 15k tokens on a simple image-description task with under 200 output tokens. I have no way of knowing which it is. The token counts in the UI don't add up to the costs, so I can only assume it's the first.

5. Incomplete batch requests can't be retrieved once they expire, even though you are charged for them.

6. A truly labyrinthine UI that would make modern gacha game developers blush.
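Point 4 comes down to reconciling token counts against the bill. Below is a rough sketch of that check. The per-million-token prices are placeholder parameters, not Gemini's actual rates, and billing thinking tokens at the output rate is an assumption you should confirm on the pricing page. `Usage`, `expected_cost` and `unexplained_output_tokens` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    thinking_tokens: int = 0  # may not be shown in the UI at all


def expected_cost(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Expected cost in dollars, given prices per million tokens.

    Assumes thinking tokens are billed at the output rate.
    """
    billable_output = usage.output_tokens + usage.thinking_tokens
    return (usage.input_tokens * input_price_per_m
            + billable_output * output_price_per_m) / 1_000_000


def unexplained_output_tokens(
    billed: float, usage: Usage, input_price_per_m: float, output_price_per_m: float
) -> float:
    """How many extra output-priced tokens would account for the billed-vs-expected gap."""
    gap = billed - expected_cost(usage, input_price_per_m, output_price_per_m)
    return gap * 1_000_000 / output_price_per_m
```

Suppose the gap works out to a roughly constant token count per request. Then hidden thinking is the likelier explanation. A gap that instead tracks the number of unreturned batch results points at the billing side.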

All I have learned here is to never, ever use a Google product.

At scale, distributed API routing shouldn't make accounting calls in the request path. That expands the availability risk surface and adds latency to every valid request for no reason (other than helping the minority of companies/users who want their product to stop working when it is popular).

Distributed “shared nothing” API handling should publish usage to accounting asynchronously, and the API handling orchestrator should expose a hook that lets accounting revoke or flag a key.

This gets the accounting transactions and key availability management out of the request handling.
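Here is a minimal single-process sketch of that split. A queue stands in for the usage stream, and an in-memory set stands in for the orchestrator's revoke hook. `Gateway`, `Accounting` and `UsageEvent` are hypothetical names, not any Google API. The request handler never waits on accounting: it checks a local revocation set and enqueues a usage event.

```python
import queue
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageEvent:
    api_key: str
    cost: float


class Gateway:
    """Request path: a local revocation check plus a non-blocking enqueue.

    No accounting transaction happens here. In a real deployment `revoked`
    would be replicated to every handler; here it is in-process.
    """

    def __init__(self, events: "queue.Queue[UsageEvent]") -> None:
        self.events = events
        self.revoked: set = set()

    def revoke(self, api_key: str) -> None:
        # The hook accounting calls; handlers only ever read this set.
        self.revoked.add(api_key)

    def handle(self, api_key: str, cost: float) -> bool:
        if api_key in self.revoked:
            return False  # key was revoked or flagged by accounting
        # ... serve the request here ...
        self.events.put_nowait(UsageEvent(api_key, cost))
        return True


class Accounting:
    """Off the request path: aggregates usage and revokes keys that reach their cap."""

    def __init__(self, events: "queue.Queue[UsageEvent]", gateway: Gateway, caps: dict) -> None:
        self.events = events
        self.gateway = gateway
        self.caps = caps
        self.spent = defaultdict(float)

    def drain(self) -> None:
        # Process whatever usage has accumulated since the last run.
        while True:
            try:
                event = self.events.get_nowait()
            except queue.Empty:
                return
            self.spent[event.api_key] += event.cost
            cap = self.caps.get(event.api_key)
            if cap is not None and self.spent[event.api_key] >= cap:
                self.gateway.revoke(event.api_key)


if __name__ == "__main__":
    events = queue.Queue()
    gateway = Gateway(events)
    accounting = Accounting(events, gateway, caps={"key-1": 10.0})
    served = sum(gateway.handle("key-1", 3.0) for _ in range(5))  # accounting hasn't run yet
    accounting.drain()
    print(served, accounting.spent["key-1"], gateway.handle("key-1", 3.0))  # prints: 5 15.0 False
```

The demo serves five $3 requests against a $10 cap before accounting runs, so the key reaches $15 before it is cut off. Enforcement stays out of the request path, but it is approximate: the overshoot depends on how far accounting lags behind.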

That’s a nice excuse; do you work at Google? :) I get the idea of not slowing down requests or risking availability, but don’t tell me a company as big as Google can’t design an asynchronous accounting system robust enough to handle this. We’re not talking about penny-perfect precision: blocking at 110% or even 150% of the set cap would be enough. Right now, though, nothing prevents a surprise bill of $5k, $20k or more due to leaked API keys, misuse or misconfiguration. To me this is unacceptable, and it’s one of the reasons I try to avoid Google Cloud (the other is the unbearably slow Google Cloud console web app).
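A back-of-the-envelope bound shows what a 110% or 150% tolerance demands. Assume accounting, including revocation propagation, lags actual usage by at most L. Assume spend on a key never exceeds a rate r, and that the cutoff fires when recorded spend reaches the cap C. At the moment of cutoff, actual spend can exceed recorded spend by at most rL, so the final bill B satisfies the first line below. The worked numbers use an illustrative $1,000 cap. The 24-hour lag falls within the range reported in this thread.

```latex
B \le C + rL, \qquad \frac{B - C}{C} \le \frac{rL}{C}

\text{Within 110\% of the cap:}\quad rL \le 0.1\,C
C = 1000\ \text{USD},\ L = 1\ \text{h} \;\Rightarrow\; r \le 100\ \text{USD/h}
C = 1000\ \text{USD},\ L = 24\ \text{h} \;\Rightarrow\; r \le 100/24 \approx 4.17\ \text{USD/h}
```

The cap logic itself is trivial; the accounting lag is what determines how loose the cap has to be.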

That’s exactly what the Cloud Run function does.

Yes, but each admin has to use their product (Cloud Run functions), configure IAM, and repeat that for every project. This is clearly just a workaround.

I haven't used these budget alerts; maybe they are a pain to set up?

https://docs.cloud.google.com/billing/docs/how-to/budgets

They are still not a spending cap of course.

Reminds me: ever used the Gemini API through Google Cloud's Vertex AI? The usage shows up in the dashboard something like 24-48 hours later. So when I use Gemini's API on their cloud, I, as a Workspace admin, can't even track my own usage in near real time. That makes me think even Google can't track it in real time.