> And yes, if you want the absolute best, Opus 4.8 exists. It also costs more per 20 minutes of heavy use than I paid for this entire GPU and adapter setup combined. But the gap is shockingly small.

I don't think this is a fair characterization of the situation. I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month. The fact that we figured out how to burn double this in 20 minutes is impressive, but I don't think it reflects the reality that many are experiencing right now. There are some exceptionally gluttonous approaches to harnessing LLMs that I think are serving as convenient straw men in these discussions.

Paying for the API will almost always be more economical than self-hosting equivalent infrastructure. I am not against self-hosting, but the article suggests a primarily economic motivation for this effort. If you are consuming fewer than 10^9 tokens per month, I really don't think it's worth your time to try to compete with the hyperscalers. Most of the money is to be found in the integration of this technology with existing businesses.

I use hosted providers myself, but I can easily churn through $100 worth of tokens in half a day, even with cheap models like DeepSeek. If someone's use is as light as yours, then sure - grab a subscription and you'll save far more. For higher use, whether it is worth offloading at least some of it will come down to how cheap your electricity is (for me it's not, FWIW).
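To make the electricity point concrete, here is a minimal sketch of the comparison. The function names and every number in the usage note are hypothetical placeholders, not measurements from this thread, and the sketch deliberately ignores hardware cost, maintenance, and model quality differences.

```python
def local_cost_per_mtok(power_watts: float, usd_per_kwh: float,
                        tokens_per_second: float) -> float:
    """Electricity cost in USD to generate one million tokens locally."""
    seconds_per_mtok = 1_000_000 / tokens_per_second
    hours_per_mtok = seconds_per_mtok / 3600
    kwh_per_mtok = (power_watts / 1000) * hours_per_mtok
    return kwh_per_mtok * usd_per_kwh


def local_is_cheaper(power_watts: float, usd_per_kwh: float,
                     tokens_per_second: float, api_usd_per_mtok: float) -> bool:
    """True if electricity alone undercuts the hosted price per million tokens."""
    local = local_cost_per_mtok(power_watts, usd_per_kwh, tokens_per_second)
    return local < api_usd_per_mtok
```

As an illustration with made-up inputs, `local_cost_per_mtok(400, 0.30, 30)` models a 400 W box generating 30 tokens per second at $0.30 per kWh. Plugging in your own wall-socket draw, throughput, and tariff shows quickly whether offloading is even in the right ballpark.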

Same, very surprised when people on HN are shocked by high token burn - it's really not hard if you've figured out how to use LLMs!

Could you share a bit about what you’re working on or what type of projects require that much usage? Is it hobby, production, revenue generating?

A mix. I have hobby projects that churn through that much when I don't need the tokens for other things. I also have projects for clients that easily consume those levels, as well as a stealth-ish potential startup. Currently I'm at 4 different subscriptions + more than I'd like in spend via OpenRouter...

What multiplies it very quickly is when you start feeding them with test suites and "Ralph loops" that run until the test suites pass, or complex chains with lots of sub-agents being triggered.

If you're sitting there watching everything, it will be hard to burn all that much even if you're running multiple things in parallel.
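For readers who haven't seen one, a loop of the kind described above can be sketched roughly as follows. This is only an illustration of the pattern: `run_agent` and `run_tests` are hypothetical callables you would supply yourself (for example, wrappers around your agent CLI and your test runner), and the iteration cap is an assumption added here to keep the token bill bounded.

```python
from typing import Callable, Tuple


def run_until_green(run_agent: Callable[[str], None],
                    run_tests: Callable[[], Tuple[bool, str]],
                    task: str,
                    max_iterations: int = 10) -> int:
    """Re-invoke the agent until the test suite passes.

    Returns the number of iterations used, or raises RuntimeError
    once the iteration budget is exhausted.
    """
    prompt = task
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)                # each call burns tokens
        passed, output = run_tests()
        if passed:
            return iteration
        # Feed the failure output back in for the next attempt.
        prompt = f"{task}\n\nThe test suite still fails:\n{output}"
    raise RuntimeError(f"tests still failing after {max_iterations} iterations")
```

Every pass through the loop resends the task plus the latest test output, which is why unattended runs multiply usage so quickly compared with a session you watch step by step.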

I'm skeptical of letting agents run free like this. Even Opus makes decisions I don't always agree with. And I quickly lose my mental model of how the code is evolving.

I get more enjoyment and better results when the coding process is me and the agent working through a plan, at each step sparring over what to do next and how. Then I also catch the bad decisions before they manifest in the code.

Claude is something like $35 per million tokens. If I were using API pricing I could trivially spend $100 in a single hour-long coding session, or in about 10 minutes with /fast turned on. Not sure how you guys are using it.

Opus is normally $5 per mtok, no idea why anyone would use /fast if they were at all concerned about price. ($5 is still pricy though tbh)

Opus is $5 per mtok of input tokens, but $25 for output.

Coding is the easy part of using Claude.

> I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month.

According to ccusage (https://github.com/ryoppippi/ccusage) if I didn’t have the 100 USD Max subscription, I’d have to pay Anthropic around 4173 USD for the month of May.

  Input     │ Output     │ Cache Create │ Cache Read    │ Total Tokens  │ Cost (USD)
  1,948,016 │ 19,435,081 │ 103,626,350  │ 6,244,194,278 │ 6,369,203,725 │ $4173.09

Edit: pulled the latest numbers, not using Fast mode at all, but still Opus for most tasks.
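As a rough sanity check on that table, here is a minimal sketch that reprices the four token buckets. The $5 input and $25 output rates are the Opus figures quoted elsewhere in this thread; the two cache rates are assumptions (cache writes at 1.25x the input rate, cache reads at 0.1x), and pricing everything as Opus is also an assumption, since the poster says Opus was used for most tasks, not all.

```python
# USD per million tokens.
USD_PER_MTOK = {
    "input": 5.00,         # quoted in the thread
    "output": 25.00,       # quoted in the thread
    "cache_create": 6.25,  # ASSUMPTION: 1.25x the input rate
    "cache_read": 0.50,    # ASSUMPTION: 0.1x the input rate
}

# Token counts copied from the ccusage table above.
MAY_USAGE = {
    "input": 1_948_016,
    "output": 19_435_081,
    "cache_create": 103_626_350,
    "cache_read": 6_244_194_278,
}


def cost_breakdown(usage: dict, rates: dict) -> dict:
    """USD cost per token bucket."""
    return {kind: count * rates[kind] / 1_000_000 for kind, count in usage.items()}


def estimate_cost(usage: dict, rates: dict) -> float:
    """Total USD cost across all buckets."""
    return sum(cost_breakdown(usage, rates).values())


if __name__ == "__main__":
    for kind, usd in cost_breakdown(MAY_USAGE, USD_PER_MTOK).items():
        print(f"{kind:>12}: ${usd:,.2f}")
    print(f"{'total':>12}: ${estimate_cost(MAY_USAGE, USD_PER_MTOK):,.2f}")
```

Under these assumptions the estimate comes out within a few percent of the reported $4173.09, and cache reads account for most of it. In other words, the bill is dominated by long-running sessions repeatedly rereading a large cached context, not by the output tokens themselves.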

Nothing too egregious with my usage patterns, typically Claude Code just churning through tasks in 1-2 projects at a time, sometimes while I’m asleep - and I hit around 60-80% of the weekly caps most of the time.

How do you orchestrate this? I’m on Max and would love to be hitting my caps when I’m not actively working on a project.

In my case: the Claude Code desktop app (https://claude.com/download) makes having a bunch of parallel sessions easy, at least compared to when I just had a bunch of terminal windows open. You can also couple that with Remote Control (https://code.claude.com/docs/en/remote-control).

Previously I still had the issue of it occasionally stopping, let's say after Stage 2/7 of some plan was done, and asking me to continue while I was asleep. The options there were either looping it (like a RALPH loop), or more recently they also released their dynamic workflows alongside Opus 4.8: https://claude.com/blog/introducing-dynamic-workflows-in-cla... and now I just use that.

So essentially you come up with a plan and just ask it to create a dynamic workflow for you, and it's gonna go through everything step by step, sometimes parallelizing (as it normally would with sub-agents) as necessary. Can also use worktrees if needed.

Here's an example of the UI: https://imgur.com/a/4Gr3Z2T (note that I'm using DeepSeek there for a small local utility, with a tool I'm using for managing various providers with Claude Code, but works the same with subscription)

I looked at the stuff Cline was doing with their Kanban boards too, but in the end realized that I don't really need those (for now) and that Claude Code is enough.