I have my own LLM-wrapping harness, which does this and has a few more tricks. For example, it doesn't load a lot of MCP tools up front, but it does have search_mcp and load_mcp tools (and search_skills), so the LLM can find what it needs when it needs it without bloating the baseline context. The LLMs have proved really good at using them. There is also a waypoint tool they can use to record their thinking in the context without it becoming the final output. I'm thinking about adding a search_expert tool to find colleagues it can bring into conversations, too. And a lot of other stuff.
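A minimal Python sketch of the on-demand loading idea. The tool names search_mcp and load_mcp come from the comment above, but their signatures, the keyword matching, and the LazyToolRegistry and ToolSpec types are hypothetical, assumed here for illustration. Only the two meta-tools are advertised at first; a tool joins the advertised list after the model searches for it and asks to load it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolSpec:
    # Hypothetical description of one tool that some MCP server exposes.
    name: str
    description: str


@dataclass
class LazyToolRegistry:
    # Every known tool; none of these are sent to the model by default.
    catalog: Dict[str, ToolSpec]
    # Tools the model has explicitly loaded and may now call.
    active: Dict[str, ToolSpec] = field(default_factory=dict)

    def search_mcp(self, query: str, limit: int = 5) -> List[str]:
        """Names of catalog tools whose name or description contains every query word."""
        words = query.lower().split()
        hits = []
        for spec in self.catalog.values():
            text = f"{spec.name} {spec.description}".lower()
            if all(word in text for word in words):
                hits.append(spec.name)
        return hits[:limit]

    def load_mcp(self, name: str) -> str:
        """Activate one catalog tool so it is advertised on the next request."""
        if name not in self.catalog:
            return f"unknown tool: {name}"
        self.active[name] = self.catalog[name]
        return f"loaded {name}"

    def advertised(self) -> List[str]:
        """Tool names for the next request: the two meta-tools plus anything loaded."""
        return ["search_mcp", "load_mcp", *sorted(self.active)]


registry = LazyToolRegistry({
    "github_issues": ToolSpec("github_issues", "List and search GitHub issues"),
    "postgres_query": ToolSpec("postgres_query", "Run a read-only SQL query on Postgres"),
})
print(registry.advertised())               # ['search_mcp', 'load_mcp']
print(registry.search_mcp("github"))       # ['github_issues']
print(registry.load_mcp("github_issues"))  # loaded github_issues
print(registry.advertised())               # ['search_mcp', 'load_mcp', 'github_issues']
```

Plain substring matching just keeps the sketch self-contained; a real harness might rank matches by embedding similarity and pull actual tool schemas from each MCP server's tool listing.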
Pro tip that worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/wherever.txt. That way, the LLM can query and read more using its built-in tools without reissuing the big tool call.
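A minimal Python sketch of that tip. The function name truncate_tool_output, the 2,000-character default, and the exact wording of the note are assumptions; instead of the placeholder /tmp path, it uses tempfile.mkstemp to create a uniquely named file in the system temp directory. The model sees the head of the output plus the path, and can read the rest with whatever file-reading tool the harness already exposes.

```python
import os
import tempfile
from typing import Optional


def truncate_tool_output(text: str, max_chars: int = 2000,
                         spill_dir: Optional[str] = None) -> str:
    """Return short output unchanged; otherwise save the full text to a file and
    return the first max_chars characters plus a note pointing at that file."""
    if len(text) <= max_chars:
        return text
    # mkstemp creates a uniquely named file (in the system temp dir when
    # spill_dir is None), so concurrent tool calls don't overwrite each other.
    fd, path = tempfile.mkstemp(prefix="tool_output_", suffix=".txt", dir=spill_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    omitted = len(text) - max_chars
    return (
        text[:max_chars]
        + f"\n\n[truncated {omitted} characters; full output saved to {path}. "
        + "Read that file with your file tools instead of re-running this tool call.]"
    )


print(truncate_tool_output("short result"))  # short result
print(truncate_tool_output("x" * 5000, max_chars=100).splitlines()[-1])
```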
Great approach. I did that with my opencode-based setup as well; it's neat and fun to tune skills, MCP loaders, and so on. Then I got fed up with opencode's design limitations. My own harness work is now on hold in favor of a harness-puppeteer paradigm, but that one has also been on hold! Currently I'm mostly pulling on the thread of making it easier just to review the voluminous conversation turns!
Interesting approach. Thanks for sharing.