I commented this yesterday, and I’ll repeat it: what do you guys think organizations that have heavily leaned into AI are shipping nowadays?
Most devs aren’t working on cutting-edge, low-level, mission-critical systems, and AI is great for that kind of work. Every company I personally know has been rapidly shipping features for the past 7 months, features that are being used daily by millions of people.
We have the same thing on my team, and we also understand the limitations of AI-generated code. If you’re more or less experienced, you can easily see the “good” and “bad” sides of it. So you kinda plan it out in a way that you can “evolve AI generated software”. I wouldn’t have said the same thing in January 2025, but times are much different now. Things are already working.
> If you’re more or less experienced, you can easily see the “good” and “bad” sides of it. So you kinda plan it out in a way that you can “evolve AI generated software”.
If you're truly "managing fleets of agents" there's no way you're able to sift through the good and the bad in the output. If your AI-generated code is evolvable (which is hard to tell right now) then you're not writing it with "fleets of agents". If you are writing it with fleets of agents, I would bet it's not evolvable; you just haven't reached the breaking point yet.
We’re not managing fleets of agents. They’re not productive for our workflows yet. It’s usually a couple of CC CLIs running and going back and forth on specific tasks we closely control.
My point is that they're not productive for any workflow, because they don't produce sustainable software, yet that's exactly what Armstrong is calling for. They don't work, and people experienced with AI workflows already know that.
If you review the code and tell the agent to revert when it gets things wrong (not functionally but architecturally) you're fine. That's not what I was responding to.
You're just wrong on this though, and I don't know why you don't realize it's a skill issue on your part.
Nah, it's a skill issue on the part of those who believe in "agent swarms" (in fact, that's how I recognise AI noobs; they think swarms work). Studies (like this [1]) and Anthropic's experiments have told us they don't. We do experiments with software correctness and formal methods experts who actually dive deep into "swarm outputs" and try to put evolutionary pressure on them. Swarms simply cannot (yet) produce viable software. They do, however, produce software that passes tests for a while.

What I think is happening is that people who believe swarms work just look at test results. But obviously, every software engineer has known for decades that tests can only tell you if your software works today; they can't tell you that it will work tomorrow. And the people who say that unreviewed agent output will work tomorrow are those who didn't review it closely enough, so they have no idea, either.
[1]: https://arxiv.org/abs/2603.03823
Most of the people making this argument vastly overestimate the quality of engineering and discipline behind the software powering most corporations. CRUD apps are likely the most prominent type of application across industries, and most of them are crud.
If the code is really simple, it's cheap to read. When people don't read it (and when they need to use "fleets of agents"), it's because it's not so simple, and then the people who trust the outcome are those who don't know what they've committed into the codebase. Their logic is no more than: the system hasn't collapsed under the load of 50 (or 500) changes, so it probably won't collapse under the load of the next 500 (or 5000). Because that's how engineered systems work, right? If they're fine under light stress, they're fine under heavier stress.
> Because that's how engineered systems work, right? If they're fine under light stress, they're fine under heavier stress.
Isn't this wrong? I thought engineered systems were designed with known limits.
I was being sarcastic.