Windsurf made a similar change in March: https://docs.windsurf.com/windsurf/accounts/quota

> In March 2026, Windsurf replaced the credit-based system with a quota-based usage system. Instead of buying and spending credits, your plan now includes a daily and weekly usage allowance that refreshes automatically.

With hindsight, per-request pricing makes no sense at all if an agent can burn a widely varying amount of tokens satisfying that request. These pricing plans were designed before coding agents changed the dynamics of token usage.

I wouldn't call it hindsight - I don't think anyone, at any stage, thought running a 10+ minute Sonnet session for 1 premium credit was ever profitable. We all knew it was a loss leader to get people using it.

It would have been profitable if that premium credit cost more than a negotiated discounted rate with Anthropic. We have no way of knowing if there were negotiated rates though!

There is no way to make that cost model profitable consistently. If 1 prompt can mean 100s or 1000s of requests over hours, and you only pay for that 1 premium prompt, that can never be profitable.
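To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is a made-up assumption (the price per premium prompt, the tokens per request, the provider's cost per million tokens); none of them come from any vendor's actual pricing, and the function names are just for illustration.

```python
import math


def prompt_margin(price_per_prompt: float, requests: int,
                  tokens_per_request: int, cost_per_million_tokens: float) -> float:
    """Revenue minus token cost for ONE premium prompt (all inputs hypothetical)."""
    token_cost = requests * tokens_per_request * cost_per_million_tokens / 1_000_000
    return price_per_prompt - token_cost


def break_even_requests(price_per_prompt: float, tokens_per_request: int,
                        cost_per_million_tokens: float) -> int:
    """Largest number of requests a single prompt can trigger before it loses money."""
    cost_per_request = tokens_per_request * cost_per_million_tokens / 1_000_000
    return math.floor(price_per_prompt / cost_per_request)


if __name__ == "__main__":
    # Assumed numbers: $0.04 per premium prompt, 10k tokens per request,
    # $3 per million tokens paid to the model provider.
    for n in (1, 100, 1000):
        print(n, "requests ->", round(prompt_margin(0.04, n, 10_000, 3.0), 2))
    print("break-even at", break_even_requests(0.04, 10_000, 3.0), "request(s)")
```

Under those assumed numbers the margin goes negative almost immediately, and a negotiated discount only moves the break-even point; it does not change the shape of the problem as long as one prompt can fan out into an unbounded number of requests.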

They can engineer the harness to limit the amount it does. When pressing enter, it'd be nice to have a "budget" per prompt, much like the model multiplier. When the harness has used up the budget, it cleans up and cuts off the work.
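A minimal sketch of what a per-prompt budget could look like, assuming the harness can observe how many tokens each agent step consumed. `Step`, `BudgetResult`, `run_with_budget`, and the `cleanup` callback are all hypothetical names for illustration; this is not how any real harness is known to work.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Step:
    tokens: int   # tokens this agent step consumed (assumed to be observable)
    output: str   # whatever the step produced


@dataclass
class BudgetResult:
    outputs: List[str] = field(default_factory=list)
    spent: int = 0
    cut_off: bool = False


def run_with_budget(steps: Iterable[Step], token_budget: int,
                    cleanup: Callable[[], None]) -> BudgetResult:
    """Consume agent steps until the per-prompt token budget is used up."""
    result = BudgetResult()
    for step in steps:  # steps may be a lazy generator; later steps never run
        result.spent += step.tokens
        result.outputs.append(step.output)
        if result.spent >= token_budget:
            # Budget exhausted: tidy up and stop instead of running on.
            result.cut_off = True
            cleanup()
            break
    return result


if __name__ == "__main__":
    fake_steps = (Step(400, f"edit {i}") for i in range(10))
    res = run_with_budget(fake_steps, token_budget=1000,
                          cleanup=lambda: print("budget hit, cleaning up"))
    print(res.spent, res.outputs, res.cut_off)
```

Because the check happens after each step, the last step can overshoot the budget; a stricter version would need an estimate of a step's cost before running it.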

But that would entail actual work and effort... and care for users' time and money.

Guys, you're discussing a house of cards to begin with: no matter how you're paying for the $CURRENTSOTA, you're not guaranteed that what you pay for will be the same next month.

So, let's do some honest evaluations:

1. The model itself is a non-deterministic engine of work with an unknown value; its real value is just magic.

2. The business model itself is a non-deterministic engine of profit with a known value; whatever the VCs have put into it _must_ be pulled out. If Ed Zitron's numbers are correct, circa 2030, it's several trillion dollars.

So do some matrix multiplication of non-determinism vs determinism, and realize that the value proposition for _you_ is only going to decrease because #1 can never outpace #2, ensuring enshittification captures a smaller and smaller whale.

We know this. This has been the last 2 decades of money extraction from software. It was ok when it was some 12-year-old's parents' CC. But now it's you, or your business, that's going to either be squeezed for value or squeezed out of the market.

And everyone's squabbling about the color of the cost. ok

The problem with assuming that tokens can only get more expensive is that the Chinese open weight LLM firms have dropped models which have a known, fixed price that can never get more expensive (since we can run them on hardware we own).

Well, I guess we're not discussing the same thing. The cost of cloud tokens is going to go up. They won't ever be cheaper. They're generating far more tokens than my AMD 395+ w/128GB at a much cheaper rate.

I agree though, it can't get cheaper than the cost of hardware. It's just that without sufficient documentation of the actual costs to run the cloud models, we can't really know what the "true" cost of each token is. I assume there's an economist out there somewhere that could figure it out though. Certainly, the cost should approach, at a minimum, that of an open-weights model running on a local machine.

I've successfully got Qwen3-coder-next to loop and generate sufficiently competent code, and from what I can tell, the difference between this and the cloud is how quickly the gen happens and perhaps how interactive it has to be.

per-request was broken, yeah. but $10 of monthly credits is basically just a prepaid wallet with a reset timer.