I don't think this is so much a problem with the tools as with your approach.
We have successfully used Claude on huge projects, thousands of PRs long.
But this meant that:
1. Solid architectural and design decisions had already been made, after much trial and error
2. They were further refined and refactored
3. Countless hours had been spent documenting, writing proper skills, and producing architectural and best-practice documents
Only then did Claude start paying off, and even then it's an iterative process where you need to understand why it tries to hack its way out, what to check, and what to supervise.
Seriously, if you think you can just have Claude create some project from scratch...
Just fork an existing one that does a large % of what you need and spend most of the initial time scaffolding it to be AI-friendly.
Also, you need to invest in harnessing: giving the LLM tools and ways to not go off the rails.
Strongly typed languages, plenty of compilation and diagnostics tools, access to debuggers or browser MCPs, etc.
It's not impossible, but you need to approach it with an experimental mindset, not by drinking the Kool-Aid.
See, that's the thing. A human is slower but doesn't need all this handholding.
The idea of AI being able to "code" is that it can do all this planning and architectural work itself. It can't. But it's sold as though it can. That's where the bubble is.
Because when a human joins the team, they already come with an internal repository of skills. They may need to update those skills on the job or create new ones, but they never start fresh. An LLM, on the other hand, starts clean: they are literally blank slates, and it's your job to equip them with the right skills and knowledge. As programmers we must transition from being coders to being trainers/managers if we want to still have premium-paid jobs in this brave new world.
My counterargument is that that manual training, while beneficial, won't lead to the scaling factors being thrown around. It won't lead to the single-person unicorn that keeps being talked about so excitedly.
For that, the model needs to learn all this architecture and structure itself from the huge repositories of human knowledge, like the internet.
Until then, reality will fall below expectations, and the bubble will head towards popping.
There are no premium-paid jobs for prompting in a brave new world.
AI can plan and do architectural work, just not amazingly well. Treat it as an intern or a new grad at best. Though this capability has been increasing pretty rapidly, so who knows where we'll be in a few years.
I guess I'd rather just complete one tiny part at a time with Claude and understand the output than do all that. It seems like less effort and infrastructure, and a lot more certain in outcome.
Sounds like the amount of work you put into that wasn't worth the payoff.
In my opinion it was well worth it, for many reasons.
Not only can the agents complete trivial tasks on their own, leaving us just with reviewing (and often just focusing on the harnessing), but the new setup is also very good for onboarding technical and non-technical staff: you can ask any question about the product, its architecture, or its decisions.
Everything's documented/harnessed/E2E-tested, etc.
Doing all of this work has much improved the codebase in general: proper tests, documentation, and design documents make a difference on their own, and the benefit further compounds with LLMs.
Which is my point in any case: if you start a new project just by prompting trivialities, it will go off the rails and create soup. But if you work on an established and well-scaffolded project, the chances of it going off the rails and creating soup are very small.
And thus my conclusion: just fork an existing project that already does many of the things you need (plenty of them exist, from compilers to native applications to anything really), focus on the scaffolding and on understanding the project, then start iterating by adding features and examples and keeping the hygiene high.