I think agents have a curve where they're kinda bad at bootstrapping a project, very good if used in a small-to-medium-sized existing project and then it goes downhill from there as size increases, slowly.

Something about a brand-new project often makes LLMs drop to "example grade" code, the kind you'd never put in production. (An example: claude implemented per-task file logging in my prototype project by pushing to an array of log lines, serializing the entire thing to JSON and rewriting the entire file, for every logged event)