A developer can blast through millions of tokens in minutes. With a context size of 250k, a million tokens is just 4 full-context queries. And with tool usage and the follow-up calls that come with it, a single request can easily burn many millions.
But if you just ask a question or something it’ll take a while to spend a million tokens…
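To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every call resends the full 250k context with no caching or trimming, and the count of 20 tool calls is purely illustrative; the function names are made up for this example.

```python
def queries_to_spend(context_size: int, target_tokens: int) -> int:
    """Number of full-context queries needed to reach target_tokens."""
    # Ceiling division without importing math.
    return -(-target_tokens // context_size)


def agent_request_tokens(context_size: int, tool_calls: int) -> int:
    """Total input tokens for one agent request.

    Assumption: each tool call triggers a follow-up model call that
    resends the whole context (no caching, no context trimming).
    """
    return context_size * tool_calls


if __name__ == "__main__":
    print(queries_to_spend(250_000, 1_000_000))   # full-context queries per million tokens
    print(agent_request_tokens(250_000, 20))      # one request with 20 tool calls (illustrative)
```

Under those assumptions, a single agentic request with a couple dozen tool calls lands in the millions, while a one-off question only costs what you actually put in the prompt.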
Seems like an opportunity to condense the context down to a 'documentation' level summary and only load the full text/code for files that are expected to be edited?
Yeah, that’s what the latest coding agents try to do with sub-agents, which only get the context they need, etc. But atm it’s too much work to manage contexts at that level.
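A minimal sketch of the 'documentation level' idea, not how any particular coding agent does it: files expected to be edited go in as full text, everything else goes in as a short summary. The names `build_context` and `estimate_tokens` are hypothetical, the summaries are assumed to already exist (producing and maintaining them is the hard part), and the word count is only a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def build_context(files: dict, summaries: dict, edit_targets: set) -> str:
    """Full text for files we expect to edit, summaries for the rest.

    files:        path -> full file contents
    summaries:    path -> short 'documentation level' description (assumed to exist)
    edit_targets: paths expected to be edited
    """
    parts = []
    for path in sorted(files):
        if path in edit_targets:
            body = files[path]
        else:
            # Fall back to a placeholder if nobody wrote a summary yet.
            body = summaries.get(path, "(no summary available)")
        parts.append(f"## {path}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    files = {
        "a.py": "def f():\n    return 1",
        "b.py": "x = 1\n" * 100,
    }
    summaries = {"b.py": "Defines x."}
    context = build_context(files, summaries, edit_targets={"a.py"})
    full = "\n\n".join(files.values())
    print(estimate_tokens(context), "vs", estimate_tokens(full))
```

The saving depends entirely on how many files you can leave as summaries, and on guessing the edit targets correctly up front.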