Hacker News

killerstorm a day ago

The way coding agents work is fantastically wasteful. All the megabytes of code are processed over and over and over, sometimes within just one session.

There are papers describing KV cache precomputation for commonly used documents (e.g. KVLink), but of course it's not a priority for model providers: they'd rather sell you more tokens, and they'd rather get to AGI/ASI first than optimize usage of existing models...

brookst a day ago

Claude Code gets >98% KV cache hits. It’s not reprocessing unless you let the cache go cold (after 5 minutes, which is annoyingly short).

killerstorm a day ago

I meant caching at a larger scale. If you're an organization with 100 developers each doing 10 sessions a day, you're paying 1000x for the tokens in frequently used documents, even if you had 100% KV cache hits within each session. Apparently that's too costly even for companies with trillion-dollar market caps...
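
To put rough numbers on that, here is a minimal sketch. It assumes every session prefills the same shared documents once and that nothing is reused across sessions. The 50,000-token document size is a made-up figure for illustration, and `daily_prefill_tokens` is a hypothetical helper, not part of any provider's API.

```python
def daily_prefill_tokens(developers: int, sessions_per_dev: int,
                         shared_doc_tokens: int) -> int:
    """Tokens of shared documents processed per day when each session
    starts cold and prefills the shared documents exactly once."""
    sessions = developers * sessions_per_dev
    return sessions * shared_doc_tokens


# Figures from the comment: 100 developers, 10 sessions each per day.
# ASSUMPTION: 50,000 tokens of commonly used documents per session.
per_session = 50_000
total = daily_prefill_tokens(100, 10, per_session)

# Ratio of what is processed today to a hypothetical "process once, share" setup.
redundancy = total // per_session
```

Under these assumptions `redundancy` is simply the number of sessions per day, so per-session cache hits do nothing to reduce it.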

Normally the KV cache works only if your context prefix is identical, but there are papers demonstrating that documents can be cached across different contexts.
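
A toy sketch of the prefix restriction, assuming a cache that can only reuse the longest run of identical leading tokens. The token lists and the helper `reusable_prefix_len` are hypothetical and only illustrate the idea; real serving stacks typically match on token blocks and hashes rather than single tokens.

```python
def reusable_prefix_len(cached: list[str], new: list[str]) -> int:
    """Number of leading tokens of `new` that match `cached` exactly.
    A prefix-based KV cache can reuse at most this many positions."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


# The same "document" appears in all three contexts.
doc = ["def", "main", "(", ")", ":"]

session_a = ["sys", "prompt"] + doc + ["fix", "bug"]
session_b = ["sys", "prompt"] + doc + ["add", "test"]    # same prefix as A
session_c = ["other", "prompt"] + doc + ["fix", "bug"]   # different first token

same_prefix = reusable_prefix_len(session_a, session_b)   # preamble + doc reused
diff_prefix = reusable_prefix_len(session_a, session_c)   # nothing reused
```

In `session_c` the document is token-for-token identical, yet nothing is reusable because the very first token differs. That is the gap the cross-context caching papers are trying to close.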

brookst a day ago

Ah, understood, and thanks for the clarification!

beoberha a day ago

I believe OP is talking about new sessions or after compaction. He’s getting at the fact that LLMs are stateless and have to rediscover your codebase on every new session.

iainmerrick 14 hours ago

To be fair, on the Monday morning after a holiday, that’s exactly what I’m like too.

dgellow 15 hours ago

Are you sure that hitting the cache means you’re not paying for those tokens?

brookst 7 hours ago

You do pay, but at 10% of the non-cached price (in quota or dollars). See https://platform.claude.com/docs/en/about-claude/pricing
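
As a back-of-the-envelope check, here is a small sketch of what that discount does to the input bill. It assumes cached input tokens cost 10% of the normal input price, as stated above, and it ignores output tokens and any surcharge for writing to the cache. The 98% hit rate is the figure quoted earlier in the thread, and `effective_input_cost_fraction` is a hypothetical helper.

```python
def effective_input_cost_fraction(hit_rate: float,
                                  cached_price_ratio: float = 0.10) -> float:
    """Input cost as a fraction of the fully uncached cost.

    hit_rate: share of input tokens served from the cache (0.0 to 1.0).
    cached_price_ratio: price of a cached token relative to an uncached one.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_part = hit_rate * cached_price_ratio
    uncached_part = (1.0 - hit_rate) * 1.0
    return cached_part + uncached_part


# ASSUMPTION: 98% of input tokens hit the cache, none of the rest do.
fraction = effective_input_cost_fraction(0.98)   # roughly 0.118
```

So even at a 98% hit rate the input cost only falls to about 12% of the uncached price, not to zero, because cached tokens are still billed.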

dgellow 7 hours ago

Thanks, I should have checked. Their pricing table is pretty clear; I was just lazy.
