Hacker News

> I used billions of tokens last month alone.

I use Claude Code (Opus 4.6 at max effort) all day long, and I genuinely don't understand how this is possible. Is that usage paying off?

This is very likely due to my lack of understanding, but... how?

letitgo12345 a day ago

Long Codex sessions lead to a lot of cached token hits, especially when you resume them after a few hours.

consumer451 20 hours ago

I personally don't count cache hits as $ used, neither in my harnesses nor in the LLM-enabled apps I create. A cached token cannot be counted 1:1 against a non-cached token; that would be silly.

Wait... when some Claude 5x/20x users say they are getting "$2000 of tokens for $100," does the 2k value include cached tokens, counted at the same $/token either way?

We cannot be this dumb as a community, can we? I must be wrong or misunderstanding something.

SatvikBeri 18 hours ago

I'm a fairly moderate user and have never hit any kind of usage limit, but I used 44 million cache-create tokens and 1.5 billion cache-read tokens. ccusage estimates that would have cost $990, and it prices the different categories separately.
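To make the "categories priced separately" point concrete, here is a rough sketch of how a per-category estimate differs from counting every token at the full input rate. The `Usage` and `Rates` types and both function names are made up for illustration, and the rates are placeholders chosen only so that cache reads are much cheaper than fresh input. They are not any provider's price list and this is not how ccusage is implemented.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class Usage:
    """Token counts by category (hypothetical structure)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_create_tokens: int = 0
    cache_read_tokens: int = 0


@dataclass
class Rates:
    """USD per million tokens. Every value passed in here is an assumption."""
    input: float
    output: float
    cache_create: float
    cache_read: float


def cost_by_category(usage: Usage, rates: Rates) -> dict:
    """Price each token category at its own rate."""
    return {
        "input": usage.input_tokens / PER_MILLION * rates.input,
        "output": usage.output_tokens / PER_MILLION * rates.output,
        "cache_create": usage.cache_create_tokens / PER_MILLION * rates.cache_create,
        "cache_read": usage.cache_read_tokens / PER_MILLION * rates.cache_read,
    }


def naive_cost(usage: Usage, rates: Rates) -> float:
    """Price every input-side token at the full input rate (the 1:1 counting)."""
    input_side = (
        usage.input_tokens + usage.cache_create_tokens + usage.cache_read_tokens
    )
    return (
        input_side / PER_MILLION * rates.input
        + usage.output_tokens / PER_MILLION * rates.output
    )


# Token counts from the comment above; rates are placeholders.
usage = Usage(cache_create_tokens=44_000_000, cache_read_tokens=1_500_000_000)
rates = Rates(input=10.0, output=50.0, cache_create=12.5, cache_read=1.0)

print(f"per-category: ${sum(cost_by_category(usage, rates).values()):,.2f}")
print(f"naive 1:1:    ${naive_cost(usage, rates):,.2f}")
```

With these placeholder rates the naive figure comes out several times larger than the per-category one, which is the gap the "$2000 of tokens for $100" question is asking about.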

andai 21 hours ago

Vibe coded a simple game (10,000 tokens of source code) with two popular coding agents. (Once each, to compare.)

One spent 200,000 tokens to produce 10,000.

The other spent 1.9 million.

It could have been a single LLM call (10k tokens). lmao

(I note that the latter was designed by a company whose main source of revenue is token spend...)
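For scale, the two runs above can be put as an overhead ratio: tokens spent per token of final source code. This is just arithmetic on the numbers in the comment, and `overhead_ratio` is a made-up helper name.

```python
def overhead_ratio(tokens_spent: int, tokens_produced: int) -> float:
    """Tokens consumed by the agent per token of final output."""
    return tokens_spent / tokens_produced


SOURCE_TOKENS = 10_000  # size of the finished game, per the comment

print(overhead_ratio(200_000, SOURCE_TOKENS))    # 20.0
print(overhead_ratio(1_900_000, SOURCE_TOKENS))  # 190.0
```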

crab_galaxy 21 hours ago

What about the other 998 million tokens?

andai 5 hours ago

Ya got me there. Maybe he's running OpenClaw?

stronglikedan 20 hours ago

lots and lots of simple games

skeptic_ai 21 hours ago

Don’t forget context. Basically I have 2 billion input and 1 million output. Every prompt you send resends the whole context, again and again. Let’s say you have 500k of context in use: 10 messages is 5 million tokens, 100 messages is 50 million. Run 5 threads and that's 250 million.
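The arithmetic above, as a quick sketch. Both function names are made up for illustration. The first assumes a fixed context size on every call, as in the comment; the second is a hypothetical variant where the context grows by a constant amount each turn.

```python
def resent_input_tokens(context_tokens: int, messages: int, threads: int = 1) -> int:
    """Total input tokens when every message resends a fixed-size context."""
    return context_tokens * messages * threads


def growing_input_tokens(start_tokens: int, added_per_turn: int, messages: int) -> int:
    """Total input tokens when the context grows by a fixed amount each turn.

    Turn i (0-based) sends start_tokens + i * added_per_turn tokens.
    """
    return sum(start_tokens + i * added_per_turn for i in range(messages))


print(resent_input_tokens(500_000, 10))                # 5000000
print(resent_input_tokens(500_000, 100))               # 50000000
print(resent_input_tokens(500_000, 100, threads=5))    # 250000000
```

Note this counts tokens sent, not tokens billed at full price; how much of it lands as cache reads depends on the harness and the provider.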

consumer451 21 hours ago

But how is it even possible (bad harness?), or wise, to send 500k or 1M tokens per call? Regarding cache, how are you not hitting the 1hr cache? Also, start new chats early and often!

I have been "agentic coding" since Sonnet 3.5, and after this paper came out it became my bible:

https://github.com/adobe-research/NoLiMa

Last I checked, all models suck as you fill the context window. "Context engineering" is how you do this whole thing.
