The bottleneck isn't agent capability. It's captured institutional knowledge.

I structure system prompts (CLAUDE.md) with verification gates: pre-task checkpoints, approach selection, post-completion rescans. When an agent writes an ADR during a refactor, future agents reference it before touching the same code. The context compounds across sessions.

Commit messages capture what. ADRs capture why. Skill files capture how the team works. That last layer is what most setups miss. The gap isn't level 6 vs 8, it's whether architectural reasoning is machine-readable or trapped in someone's head.