> I used billions of tokens last month alone.
I use Claude Code (Opus 4.6 at max effort) all day long, and I genuinely don't understand how this is possible. Is that usage paying off?
This is very likely due to my lack of understanding, but... how?
Long Codex sessions lead to a lot of cached token hits, especially when you resume them after a few hours.
I personally don't count cached hits as $ used, neither in my harnesses nor in the LLM-enabled apps I create. A cached token can't be counted 1:1 against a non-cached token; that would be silly.
Wait... when some Claude 5x/20x users say they are getting "$2000 of tokens for $100," does the 2k value include cached tokens, counted at the same $/token either way?
We cannot be this dumb as a community, can we? I must be wrong/misunderstanding..
I'm a fairly moderate user, never hit any kind of usage limits, but I used 44 million cache create tokens and 1.5 billion cache read tokens, which ccusage estimates would have cost $990, and calculates the different categories separately.
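To make the "count the categories separately" point concrete, here is a rough sketch in Python. The token counts are the ones quoted above; every rate is a hypothetical placeholder (not real pricing, and not what ccusage uses), so the totals it prints won't match the $990 estimate. Substitute your provider's published prices.

```python
# Sketch: price each token category at its own rate instead of 1:1.
# All rates below are HYPOTHETICAL placeholders, in dollars per million tokens.
HYPOTHETICAL_BASE_INPUT_RATE = 5.0   # assumed, not a real price
HYPOTHETICAL_CACHE_WRITE_MULT = 1.25  # assumed: cache writes cost a bit more
HYPOTHETICAL_CACHE_READ_MULT = 0.1    # assumed: cache reads cost much less

# Token counts taken from the comment above.
cache_create_tokens = 44_000_000
cache_read_tokens = 1_500_000_000


def cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of `tokens` at `rate_per_million` dollars per 1M tokens."""
    return tokens / 1_000_000 * rate_per_million


# Each category priced at its own (assumed) rate.
create_cost = cost(cache_create_tokens,
                   HYPOTHETICAL_BASE_INPUT_RATE * HYPOTHETICAL_CACHE_WRITE_MULT)
read_cost = cost(cache_read_tokens,
                 HYPOTHETICAL_BASE_INPUT_RATE * HYPOTHETICAL_CACHE_READ_MULT)
separate_total = create_cost + read_cost

# Naive view: every token billed as if it were fresh, uncached input.
naive_total = cost(cache_create_tokens + cache_read_tokens,
                   HYPOTHETICAL_BASE_INPUT_RATE)

print(f"priced per category: ${separate_total:,.2f}")
print(f"priced 1:1 as input: ${naive_total:,.2f}")
```

Whatever the real rates are, the gap between the two totals is the thing to look at: if cached reads dominate the token count, a 1:1 valuation inflates the "value received" figure by a large factor.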
Vibe coded a simple game (10,000 tokens of source code) with two popular coding agents. (Once each, to compare.)
One spent 200,000 tokens, to produce 10,000.
The other spent 1.9 million.
It could have been a single LLM call (10k tokens). lmao
(I note that the latter was designed by a company whose main source of revenue is token spend...)
What about the other 998 million tokens?
Ya got me there. Maybe he's running OpenClaw?
lots and lots of simple games
Don’t forget context. Basically I have 2 billion input tokens and 1 million output tokens. Every prompt you send re-sends the whole conversation, again and again. Let’s say you have 500k of context in use: 10 messages is 5 million tokens, 100 messages is 50 million. Do that across 5 threads and it's 250 million.
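The arithmetic in that comment can be sketched in a few lines of Python. This assumes the context stays fixed at 500k tokens for every message, which is a simplification (in a real session it grows as you go), and the function name is made up for illustration.

```python
# Sketch: total input tokens when every message re-sends the full context.
# Assumes a constant context size per message, which is a simplification.

def resent_input_tokens(context_tokens: int, messages: int, threads: int = 1) -> int:
    """Input tokens billed if each message re-sends `context_tokens`."""
    return context_tokens * messages * threads


ten_messages = resent_input_tokens(500_000, 10)
hundred_messages = resent_input_tokens(500_000, 100)
five_threads = resent_input_tokens(500_000, 100, threads=5)

print(f"10 messages:            {ten_messages:,}")
print(f"100 messages:           {hundred_messages:,}")
print(f"100 messages x 5 threads: {five_threads:,}")
```

At that rate, a handful of long-running threads gets into the billions without producing much output at all.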
But how is it even possible (bad harness?), or wise, to send 500k or 1M tokens per call? Regarding cache, how are you not hitting the 1hr cache? Also, start new chats early and often!
I have been "agentic coding" since Sonnet 3.5 and after this paper came out, it became my bible:
https://github.com/adobe-research/NoLiMa
Last I checked, all models suck as you fill the context window. "Context engineering" is how you do this whole thing.