Yes, we had the same issue with our coding agent. We found that instead of replacing large tool results in the context, it was sometimes better to have two agents: a long-lived one that only sees small tool results, and a short-lived one that produces those results and is the one that actually reads and edits large chunks. The downside is that you always have to manage the balance of which agent gets what context, and you also increase latency and cost a bit (slightly less reuse of the prompt cache).
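
A rough sketch of that split, in case it helps picture it. Everything here is hypothetical: `Model` is a stand-in for whatever LLM call you use, and `LongLivedAgent`, `run_short_lived_worker`, and `fake_model` are made-up names for illustration, not anyone's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Model = Callable[[str], str]


def run_short_lived_worker(model: Model, task: str, large_input: str) -> str:
    """Worker sees the full large input; its prompt is thrown away afterwards."""
    prompt = (
        f"Task: {task}\n\nInput:\n{large_input}\n\n"
        "Reply with a brief summary of what you did."
    )
    return model(prompt)


@dataclass
class LongLivedAgent:
    """Keeps only short summaries in its context, never the raw tool output."""
    model: Model
    context: List[str] = field(default_factory=list)

    def delegate(self, task: str, large_input: str, max_summary_chars: int = 200) -> str:
        # A fresh worker handles the big chunk; only a capped summary comes back.
        summary = run_short_lived_worker(self.model, task, large_input)
        summary = summary[:max_summary_chars]
        self.context.append(f"{task}: {summary}")
        return summary


def fake_model(prompt: str) -> str:
    # Assumption: a dummy model so the sketch runs without any API.
    return f"processed {len(prompt)} chars"
```

The balancing act mentioned above lives in `max_summary_chars` and in what the worker is asked to report back: too little and the long-lived agent loses track, too much and you are back to a bloated context.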

I found that having sub-agents just for running and writing unit tests got rid of over 90% of my context woes.

Seems like that could be a job local LLMs will do fairly well soon: not a ton of reasoning, just a basic ability to understand functions and write fairly boilerplate code. But it involves a ton of tokens, especially if you have lots of verbose output from a test run, so doing it locally could end up being a huge cost savings as well.
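
To give a sense of how much of a verbose test run is noise, here is a minimal sketch of a plain filter (no model involved) that keeps only failing lines plus the final summary line. The `FAILED`/`ERROR` markers and the "summary is the last non-empty line" rule are assumptions about the runner's output format, and `condense_test_output` is a made-up helper name.

```python
import re

# Assumption: failing tests are marked with an upper-case FAILED or ERROR token.
FAILURE_PATTERN = re.compile(r"\b(FAILED|ERROR)\b")


def condense_test_output(output: str, max_lines: int = 20) -> str:
    """Keep failing lines (capped at max_lines) plus the last non-empty line."""
    lines = output.splitlines()
    failures = [ln for ln in lines if FAILURE_PATTERN.search(ln)]
    # Assumption: the runner prints its summary as the last non-empty line.
    summary = next((ln for ln in reversed(lines) if ln.strip()), "")
    kept = failures[:max_lines]
    if summary and summary not in kept:
        kept.append(summary)
    return "\n".join(kept)
```

A sub-agent (local or not) would still have to read the full output to do anything smarter than this, which is where the token volume comes from.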

Maybe, but you still need to pass some context to the sub-agent to describe the system under test.

This sounds like a good approach; I need to try it. I had good results using context7 in a specialized docs agent. I couldn't figure out how to limit an MCP server to a subagent, though; likely it's not supported.