All of this speedrun hits a wall at the context window. As long as the project fits into 200k tokens, you're flying. The moment it outgrows that, productivity doesn't drop by 20% - it drops to zero. You start spending hours explaining to the agent what you changed in another file that it has already forgotten. Large organizations win in the long run precisely because they rely on processes that don't depend on the memory of a single brain - even an electronic one.

This reads as if written by someone who has never used these tools before. No one ever tries to "fit" the entire project into a single context window. Successfully using coding LLMs involves context management (some of which is now done by the models themselves) so that you can isolate the issues you're currently working on and pull in enough context to work effectively. Working on enormous codebases over the past two months, I have never had to remind the model what it changed in another file, because 1) it has access to git and can easily see what has changed, and 2) I work with the model to break down projects into pieces that can be worked on sequentially. And keep in mind, this is the worst this technology will ever be - it will only get larger context windows and better memory from here.
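To make point 1) concrete, here's a minimal sketch (Python, with a hypothetical `recent_changes` helper and placeholder repo path) of the kind of tool call that turns "what did I change earlier?" into a lookup rather than a memory test:

```python
import subprocess

def recent_changes(repo_path: str, base: str = "HEAD~5") -> str:
    """Hypothetical tool: instead of remembering what it edited earlier,
    the agent re-reads the diff straight from git."""
    result = subprocess.run(
        ["git", "-C", repo_path, "diff", "--stat", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# The agent calls this as a tool and drops the summary into its context,
# so it never has to "remember" what changed in another file.
```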

What are the SOTA methods for context management, assuming the agent runs its tool calls without any break? Do you flush tokens from the GPU / adjust the KV cache when you need to compress context by summarizing or logging part of it?

Everyone I know who is using AI effectively has solved the context-window problem in their process. You use design, planning, and task documents to bootstrap fresh contexts as the agents move through the task. Using these approaches you can have the agents address bigger and bigger problems. And you can get them to split the work into easily reviewable chunks, which is where the bottleneck is these days.
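A minimal sketch of that bootstrapping step, assuming a plain-Python harness and hypothetical plan/task file paths:

```python
from pathlib import Path

def bootstrap_context(plan_path: str, task_path: str, max_chars: int = 20_000) -> str:
    """Assemble a fresh prompt for the next agent run from standing documents,
    rather than dragging the whole previous conversation along."""
    plan = Path(plan_path).read_text()
    task = Path(task_path).read_text()
    prompt = (
        "## Project plan\n" + plan
        + "\n\n## Current task\n" + task
        + "\n\nWork only on the current task. Update the task document when you finish."
    )
    return prompt[:max_chars]  # crude guard so the bootstrap itself stays small

# Paths are placeholders; each new agent session starts from these documents,
# not from the previous session's transcript.
fresh_prompt = bootstrap_context("docs/PLAN.md", "docs/tasks/step-03.md")
```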

Plus, the highest-end models don't go so brain-dead at compaction anymore. I suspect that passing context well through compaction will be part of the next wave of model improvements.
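For anyone unfamiliar with the term, compaction just means folding older turns into a summary once the transcript gets long. A toy sketch, with `summarize` standing in for whatever LLM call the harness uses:

```python
def compact(history: list[dict], summarize, max_messages: int = 40,
            keep_recent: int = 10) -> list[dict]:
    """Toy compaction: once the transcript is long, replace the older turns
    with a single summary message and keep only the recent turns verbatim."""
    if len(history) <= max_messages:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize("\n".join(m["content"] for m in old))
    return [{"role": "system", "content": "Summary of earlier work:\n" + summary}] + recent
```

How well the model picks up the thread from that summary is exactly the "passing context well through compaction" problem.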