In a recent interview with The Pragmatic Engineer, Steve Yegge said he feels "sorry for people" who merely "use Cursor, ask it questions sometimes, review its code really carefully, and then check it in."
Instead, he recommends engineers integrate LLMs into their workflow more and more, until they are managing multiple agents at one time. The final level in his AI Coding chart reads: "Level 8: you build your own orchestrator to coordinate more agents."
At my work, this wouldn't fly -- we're still doing things the sorry way. Are you using orchestrators to manage multiple agents at work? I'm particularly interested in non-greenfield applications and how that's changed your SDLC.
> Steve Yegge said he feels "sorry for people" who merely "use Cursor, ask it questions sometimes, review its code really carefully, and then check it in."
Steve Yegge is building a multi-agent orchestration system. This is him trying to FOMO listeners into using his project.
From what I've observed, the people trying to use herds of agents to work on different things at the same time are just using tokens as fast as possible because they think more tokens means more progress. As you scale up the sub-agents you spend so much time managing the herd and trying to backtrack when things go wrong that you would have been better off handling it serially with yourself in the loop.
If you don't have someone else paying the bill for unlimited token usage it's going to be a very expensive experiment.
I think people who run 15 agents to write a piece of software could probably use 1 or 2 and a better multi-page prompt and have the same results for a fraction of the cost.
Especially with the latest models, which pack quite a long and meaningful horizon into a single session if you prompt diligently for exactly what you want. Modern agentic coding tools spin up their own sub-agents when it makes sense to parallelize.
It's just not as sexy as typing a sentence and letting your AI bill go BRR (and then talking about it).
I'd like to see actual results on a meaningful benchmark of software output showing that agent orchestrators deliver any real improvement in the state of the art of software engineering, beyond spending more tokens.
Maybe it's time to dredge up the Mythical Man-Month?
Having gone through his interview just now, his advice and experience seem centered on vibe-coding new applications and not really reflective of the reality of the industry.
> But I feel sorry for people who are good engineers – or who used to be – and they use Cursor, ask it questions sometimes, review its code really carefully, and then check it in. And I’m like: ‘dude, you’re going to get fired [because you are not keeping up with modern tools] and you’re one of the best engineers I know!’”
I would certainly take a careful person over the likes of Yegge, who seems to be neither pragmatic nor an engineer.
Yegge became famous from his blog recounting his hiring as a software engineer at Google in the early 2010s. He has been an engineer for a long time.
However, the implication that someone failing to use an experimental technology is falling behind is hyperbole.
I don't know what kind of work he's doing that doesn't require actually reading the code to ensure it's appropriately maintainable, but more power to him. I actually like knowing what the hell my code is doing and that it conforms to my standards before committing it. I'll accept his condolences.
Same, seems completely irresponsible.
We don't have time for safety, or security, or accuracy, or even understandability anymore. We need to move fast! /s
No point. Claude Code with skills and subagents is plenty. If they would stop breaking it constantly it would be fine.
The bottleneck has not been how quickly you can generate reasonable code for a good while now. It’s how quickly you can integrate and deploy it and how much operational toil it causes. On any team > 1, that’s going to rely on getting a lot of people to work together effectively too, and it turns out that’s a completely different problem with different solutions.
What if you could remove that toil?
When orchestrating, you need a damn good plan and requirements, which means I'm typing or thinking a lot beforehand. And at the end it's never 100% what you want.
That is why I'm going back to per-function, small-scope AI questions.
Not the best way to do it, but I use XFCE with multiple workspaces, each with its own instance of AWS Kiro, and each Kiro has its own project I'm working on. This lets me "switch context" more easily between projects to check how the agents are getting on. Kiro also notifies me when an agent wants something. Usually I keep it to about four projects at a time, just to keep the context switching down.
I did when just starting on a new project, it was working well when I had many new components to implement. But as the project matured and stabilized every new feature is cross-cutting and it's impossible to parallelize the work without running into conflicts (design conflicts, where two agents would add similar overlapping mechanisms, and also the usual code conflicts, touching the same files). Also, with project maturity I'm much more concerned about keeping it stable and correct, which is hard to do with parallel agents running amok.
I find if you just ask the agents to resolve the conflicts they do a pretty great job. It's even better if you can feed them all the context while resolving the conflict.
The harder problem is conflicting design choices, or duplicating similar infra. It means I need to be much more involved in steering individual agents and planning up front (waterfall style), which limits the parallelism further.
I think Yegge needs to keep up with the tech a bit more. Cursor has gotten quite powerful: its plan mode now seems about on par with Claude Code, producing Mermaid charts and detailed multi-phase plans that pretty much just work. I also noticed their debug mode will now come up with several hypotheses, create some sort of debugging harness and logging system, test each hypothesis, tear down the debugging logic, and present a solution. I have no idea when that happened, but it helped solve a tricky frontend race condition for me a day or two ago.
I still like Claude, but man does it suck down tokens.
Sometimes I tell the AI to change something, sometimes I just do it myself. Sometimes I start to do it and then the magic tab-complete guesses well enough that I can just tab through the rest of it.
Sometimes the magic tab-complete insists on something silly and repeatedly gets in the way.
Sometimes I tell the AI to do something, and then have to back out the whole thing and do it right myself. Sometimes it's only a little wrong, and I can accept the result and then tweak it a bit. Sometimes it's a little wrong in a way that's easy to tell it to fix.
I am unfortunately in level 8. God help me. But honestly building an agent orchestrator is a really fun problem. It's like building an IDE and then using that IDE to build itself. Or building a programming language and then coding in that language! But with an entirely new host of different and interesting problems.
I would love to experience this, but I'm only at the level where I occasionally open ChatGPT or Claude, ask it a question, and then get frustrated because it can't even give me a straight answer, or makes incorrect assumptions.
I can't even imagine having multiple agents write code that somehow works.
No, I don't even use agents to generate code most of the time. I mainly use the inline assistant to modify or fill out blocks of code, and agents sometimes for refactors, asking questions, search, debugging, generating documentation etc.
I feel bad for Yegge.
I have been helping people get onboarded with Claude Code and the orchestrator I wrote called Metaswarm [1], and the response has been way beyond my expectations.
But don't take my word for it: try it out for yourself. It's MIT licensed, and you can create new projects with it or add it to an existing one.
[1] https://github.com/dsifry/metaswarm
I think people should figure out what works for them rather than letting people on the internet gate-keep what is good. Everything is about personal choices and refining your own taste. I would not be happy being unable to understand everything deeply so having a million agents all doing stuff would just cause me a load of stress even if I could churn stuff out more quickly.
> At my work, this wouldn't fly
How does one even review the code from multiple agents? The quality, IMO, is still too low to just let them run on their own.
The stumbling block we have is spinning up separate environments for every agent so they have isolation for their branches. I think this is solvable, but we aren't trying to solve it ourselves. In practice it means we aren't doing a lot of agent supervision.
That sounds like an excellent match for containers.
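Roughly something like this: one git worktree plus one throwaway container per agent, so branches never collide and each agent gets an isolated filesystem. This is only a sketch; the repo path, image name, and branch names are placeholders.

```python
# Sketch only: one git worktree plus one throwaway container per agent.
# The repo path, image name, and branch names below are placeholders.
import subprocess

REPO = "/home/me/project"
IMAGE = "agent-sandbox:latest"  # an image with your toolchain and agent CLI baked in

def spawn_agent_env(branch: str) -> None:
    worktree = f"/tmp/agents/{branch}"
    # One worktree per agent so branches never collide on disk.
    subprocess.run(["git", "-C", REPO, "worktree", "add", "-b", branch, worktree],
                   check=True)
    # Mount only that worktree into an isolated, network-less container.
    subprocess.run(["docker", "run", "--rm", "-d",
                    "--name", f"agent-{branch}",
                    "--network", "none",
                    "-v", f"{worktree}:/workspace",
                    "-w", "/workspace",
                    IMAGE, "sleep", "infinity"],
                   check=True)

for branch in ["feature-auth", "feature-billing"]:
    spawn_agent_env(branch)
```

The agent then runs entirely inside its own container, and you merge the branch back only once you're happy with it.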
I stopped manually writing code 6-9 months ago, and am generating high-quality code on the dimensions we care about, like GPU perf benchmarks, internal and industry conformance test suites, evals benchmarks, and lint/type checkers. It's not perfect code - there are clear AI-slop telltales that review cycles still let linger - but it's doing more ambitious things than we'd do ourselves on most dimensions, like capability, quality, and volume. We're solving years-old GPU bugs that we had given up on as mere mortals.
And yes, we build our own orchestrator tech, both as our product (not vibe coding but vibe investigating) and, more relevant here, as our internal tooling. For example, OTel and evals increasingly drive our AI coding loops rather than people. Codex and Claude Code are great agentic coding harnesses, so our "custom orchestration" work is more about using them more intelligently in richer pipelines, like the eval-driven loop above. They've been steadily adding features like parallel subagents that work in teams, and they're hookable enough to do most tricks, so I don't feel the need to use others. We're busy enough adapting on our own!
I tried, but it didn't work for me. Now I use agents as editors for fully formed solutions - so, a slightly better editor than typing.
People lie. Let's see a video of them doing this, or logs of the sessions, and the generated code, so we can judge for ourselves.
There's important stuff to review, maybe 10-20% (e.g. overall architecture, use of existing utilities/patterns), and then there are the specifics of the client code.
My reviews pick out the former and gloss over the latter. They take a few minutes. So I run multiple distinct tasks across agents in Antigravity, so there's less chance of conflict. This is on a 500k+ line codebase. I'm amazed by the complexity of changes it can handle.
But I agree with his take. Old fashioned programming is dead. Now I do the work of a team of 3 or 4 people each day: AI speed but also no meetings, no discussions, no friction.
"Claude writes, Codex reviews" has shown huge promise as a pattern for me, so I wrote a Dockerfile and some instructions on how to make that happen for agents, and ended up with https://github.com/pjlsergeant/moarcode
I am spending most of my day in this harness. It has rough edges for sure, but it means I trust the code coming out much more than I did just Claude.
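The core loop is nothing fancy; here's a minimal sketch of the idea, where the headless invocations and the task string are illustrative rather than exact:

```python
# Sketch of the "Claude writes, Codex reviews" loop; CLI calls are illustrative.
import subprocess

def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

task = "Add retry logic to the HTTP client and cover it with tests"

for _ in range(3):  # cap the rounds so it can't loop forever
    # Writer pass: Claude implements the task (assumes a headless/print mode).
    run(["claude", "-p", f"Implement this, committing as you go: {task}"])
    # Reviewer pass: a different model family reviews the resulting commits.
    review = run(["codex", "exec",
                  "Review the latest commits on this branch. List concrete "
                  "problems, or reply APPROVED if there are none."])
    if "APPROVED" in review:
        break
    # Otherwise feed the review back to the writer for another round.
    task = f"Address this review feedback:\n{review}"
```

Using two different model families for writing and reviewing is the whole point: each one catches mistakes the other tends to make.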
I don't think you need two separate models for this - I get similarly good results re-prompting with Claude. Well, not re-prompting, I just have a skill that wipes the context then gets Claude to review the current PR and make improvements before I review it.
I tried the opposite, because Claude wasn't coding some additional modules for my codebase as well as Codex could. Then I tried to get Claude to read and critique, and it got so many fundamentals wrong that I wondered if I was using the wrong model.
VS Code agent mode is pretty slick.
We're not there yet, but it's going to happen. Given the nature of the application I'm working on, I wouldn't be surprised if the entire headcount of the engineering department were reduced to around five or so in a year or two.