This matches what I've found running persistent agents. The compounding context is the whole game.
The pattern that works: treat your agent's workspace like infrastructure, not a scratch pad. ADRs (architecture decision records), skill files, structured memory of past decisions - all of it becomes the equivalent of the institutional knowledge a senior engineer carries in their head. Except it survives session restarts.
The article's TDD framing gets at something important too. The acceptance criteria aren't just verification - they're context. When you write "after 5 failed attempts, login blocked for 60 seconds" before the agent touches code, you've constrained the solution space dramatically. The agent isn't guessing what you want anymore.
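Concretely, that lockout criterion can exist as a failing test before any implementation is written. A minimal sketch - `LoginGuard` and its interface are invented here, not from the article, and the clock is injected so the test doesn't actually sleep:

```python
import time

class LoginGuard:
    """Hypothetical lockout policy the agent is asked to implement."""
    MAX_ATTEMPTS = 5
    LOCKOUT_SECONDS = 60

    def __init__(self, now=time.monotonic):
        self.now = now            # injectable clock for testing
        self.failures = 0
        self.locked_until = 0.0

    def attempt(self, ok: bool) -> bool:
        """Return True if the attempt was processed, False if locked out."""
        t = self.now()
        if t < self.locked_until:
            return False
        if ok:
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= self.MAX_ATTEMPTS:
            self.locked_until = t + self.LOCKOUT_SECONDS
            self.failures = 0
        return True

def test_lockout_after_five_failures():
    clock = [0.0]
    guard = LoginGuard(now=lambda: clock[0])
    for _ in range(5):
        assert guard.attempt(ok=False)   # first five failures are processed
    assert not guard.attempt(ok=False)   # sixth attempt: locked
    clock[0] += 59
    assert not guard.attempt(ok=False)   # still locked just before 60s
    clock[0] += 2
    assert guard.attempt(ok=True)        # window elapsed, attempts accepted again
```

The test pins down edge cases ("is the 5th failure processed?", "what happens at exactly 60s?") that a prose spec leaves open - which is exactly the solution-space constraint described above.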
Where I think the article undersells the problem: spec misunderstandings compound too. If your architectural context has a wrong assumption baked in, every agent session inherits that assumption. You need periodic human review of the context itself, not just the outputs. The ADRs need auditing the same way code does.
I've been exploring ways to track agent decisions at https://github.com/safety-quotient-lab/psychology-agent - some interesting findings so far, at homelab scale at least.
The LLM's cognitive architecture, so to speak, can make a huge difference: triggers and skills go a long way when combined with shell scripts that dual-write.
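The dual-write idea - every decision lands in both a prose file the agent loads as context and an append-only log that survives edits - can be sketched in a few lines. This is Python rather than shell for readability; the paths and schema are invented for illustration, not the repo's actual layout:

```python
import json
import pathlib
import time

# Hypothetical workspace layout: DECISIONS.md is agent-readable context,
# decisions.jsonl is an append-only audit trail.
WORKSPACE = pathlib.Path("workspace")

def record_decision(title: str, rationale: str) -> None:
    """Dual-write one decision: structured log + prose context."""
    WORKSPACE.mkdir(exist_ok=True)
    entry = {"ts": time.time(), "title": title, "rationale": rationale}
    # Write 1: append-only JSONL - the auditable history that never gets rewritten.
    with open(WORKSPACE / "decisions.jsonl", "a") as log:
        log.write(json.dumps(entry) + "\n")
    # Write 2: markdown the agent reads at session start - compact, human-editable.
    with open(WORKSPACE / "DECISIONS.md", "a") as md:
        md.write(f"- **{title}** - {rationale}\n")
```

The split matters for the auditing point above: humans review and prune the markdown, while the JSONL preserves what was actually decided and when, so a bad assumption can be traced back to its origin.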