> This again feels outdated. I think we're moving towards humans no longer needing to understand a codebase, and letting AI drive it.

Seems so, but that doesn't mean it's a good or correct direction. As of today, none of the existing models can meaningfully handle mid-size tasks on five services with 10k+ LOC each, plus infra (I'm really not interested in greenfield projects done over a weekend and never touched by actual users). That doesn't make them useless, but it significantly narrows the scope of operations you can trust models with (unless you don't care about outcomes).

The moment your spec, plan, and results of related codebase exploration go beyond 100k tokens (roughly 50% of available context), quality degradation becomes real. Threads/subagents can help, and you can argue that code reviews mitigate some issues, but without human oversight that's moving from reliable automation to gambling. Say you want to mitigate the risks of failures (correctly listed by others) - how would you do that if you don't understand your codebase? In my practice, the answer is: you start to learn what your agents created, discover the shit in it, and steer them toward better, desired outcomes.

> As of today, none of the existing models can meaningfully handle mid-size tasks on five services with 10k+ LOC each

My FAANG's codebase is a few orders of magnitude larger, and agents do an excellent job of handling mid-sized tasks completely autonomously.