Serious question: I often hear about this "let the LLM cook for hours" approach, but how do you do that in practice, and how does it manage its own context? How does it not get lost after so many tokens?
From what I've seen, it's a process of compacting the session once it reaches some limit, which basically means summarizing all the previous work and feeding that summary in as the initial prompt for the next session.
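To make that concrete, here's a rough sketch of a compaction check. Everything in it is an assumption for illustration: `estimate_tokens` uses a crude 4-characters-per-token guess, and `fake_summarize` stands in for what would really be a call to the model. It's not how any particular tool implements it.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def compact_if_needed(
    history: List[str],
    limit: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return history unchanged if under the limit, else a one-message summary."""
    total = sum(estimate_tokens(message) for message in history)
    if total <= limit:
        return history
    # Over the limit: condense everything into a single seed message
    # that becomes the initial prompt of the next session.
    return ["Summary of previous work: " + summarize(history)]


def fake_summarize(messages: List[str]) -> str:
    # Hypothetical stand-in; a real setup would ask the LLM for the summary.
    return f"{len(messages)} earlier messages condensed"


history = ["x" * 400] * 5  # about 100 estimated tokens each, 500 total
compacted = compact_if_needed(history, limit=300, summarize=fake_summarize)
untouched = compact_if_needed(history, limit=1000, summarize=fake_summarize)
```

The key point is that the next session starts from `compacted`, not from the full history, so whatever the summary drops is gone.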
I’m guessing here, and would love for someone with first-hand knowledge to comment. My guess is that it’s some combination of two things: trying many different approaches in parallel (each in a fresh context) and picking the one that works, and splitting the task into sequential steps, where the output of one step is condensed and used as the input to the next step (possibly with human steering between steps).
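Since this is a guess, the sketch below is a guess too. It only shows the shape of the two ideas: `best_of_n` runs independent attempts in parallel and returns the first one that passes a check, and `run_pipeline` chains steps while condensing each output before handing it on. The function names and the lambdas standing in for LLM calls are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def best_of_n(
    attempts: List[Callable[[], str]],
    passes: Callable[[str], bool],
) -> Optional[str]:
    """Run each attempt independently, return the first result that passes."""
    with ThreadPoolExecutor() as pool:
        # Each attempt takes no shared state, mimicking a fresh context.
        results = list(pool.map(lambda attempt: attempt(), attempts))
    for result in results:
        if passes(result):
            return result
    return None


def run_pipeline(
    steps: List[Callable[[str], str]],
    condense: Callable[[str], str],
    initial: str,
) -> str:
    """Feed the condensed output of each step into the next step."""
    carried = initial
    for step in steps:
        carried = condense(step(carried))
    return carried


# Hypothetical stand-ins for LLM calls and for a test suite or other check.
winner = best_of_n(
    [lambda: "fail", lambda: "ok: approach A", lambda: "ok: approach B"],
    passes=lambda result: result.startswith("ok"),
)
final = run_pipeline(
    [lambda s: s + " -> designed", lambda s: s + " -> implemented"],
    condense=lambda s: s[-40:],  # keep only the last 40 characters
    initial="spec",
)
```

Human steering would slot in between iterations of the loop in `run_pipeline`, for example by reviewing `carried` before the next step runs.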
the annoying part is that with tool calls, a lot of those hours are spent on network round trips.
over long periods of time, checklists are the biggest thing, so the LLM can track what's already done and what's left. after a compact, it can pull the relevant stuff back up and keep making progress.
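a minimal sketch of the checklist idea, assuming the list is kept as JSON text outside the model's context (e.g. written to a file in the working directory). the task names and helper functions here are made up for illustration, not taken from any specific tool.

```python
import json
from typing import Dict, List


def dump_checklist(items: List[Dict]) -> str:
    # Serialize so the checklist can live outside the context window.
    return json.dumps(items, indent=2)


def load_checklist(text: str) -> List[Dict]:
    # Re-read the checklist, e.g. right after a compact.
    return json.loads(text)


def mark_done(items: List[Dict], task: str) -> None:
    for item in items:
        if item["task"] == task:
            item["done"] = True


def remaining(items: List[Dict]) -> List[str]:
    return [item["task"] for item in items if not item["done"]]


checklist = [
    {"task": "write requirements", "done": False},
    {"task": "high level design", "done": False},
    {"task": "implement parser", "done": False},
]
mark_done(checklist, "write requirements")

saved = dump_checklist(checklist)   # persisted before the compact
restored = load_checklist(saved)    # reloaded in the next session
todo = remaining(restored)
```

because the state lives in `saved` rather than in the conversation, a summary that forgets details doesn't lose track of what's finished.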
having some level of hierarchy is also useful: requirements, high-level designs, low-level designs, etc.