I would argue that it's going to be the opposite. At re:Invent, one of the popular sessions was about creating a trio of SRE agents: one that did nothing but read logs and report errors, one that analyzed and triaged the errors and proposed fixes, and one that did the work and submitted PRs to your repo.
Then, as part of the session, you'd artificially introduce a bug into the system and trigger it in your browser. You'd see the failure happen in the browser, and looking at the CloudWatch logs you'd see the error get logged.
Two minutes later, the SRE agents had the bug fixed and ready to be merged.
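To make the architecture concrete, a trio like that might be wired together roughly like this; a minimal sketch with hypothetical stand-in functions (not the actual session code), just to show the read-logs / triage / open-PR loop:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """An error surfaced by the log-reading agent."""
    service: str
    message: str

def log_reader_agent(raw_logs: list[str]) -> list[Finding]:
    # Stand-in: scan the logs and report anything that looks like an error.
    return [Finding("checkout", line) for line in raw_logs if "ERROR" in line]

def triage_agent(finding: Finding) -> str:
    # Stand-in: analyze the error and propose a fix.
    return f"handle '{finding.message}' gracefully in {finding.service}"

def pr_agent(proposal: str) -> str:
    # Stand-in: turn the proposal into a branch/PR for a human to review.
    return f"Opened PR: {proposal}"

if __name__ == "__main__":
    logs = ["INFO request ok", "ERROR NullPointerException in CartService"]
    for finding in log_reader_agent(logs):
        print(pr_agent(triage_agent(finding)))
```

In the real setup each of those functions is an LLM agent with access to logs, metrics, and the repo, but the pipeline shape is the same.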
"understand how these systems actually function" isn't incompatible with "I didn't write most of this code". Unless you are only ever a single engineer, your career is filled with "I need to debug code I didn't write". What we have seen over the past few months is a gigantic leap in output quality, such that re-prompting happens less and less. Additionally, "after you've written this, document the logic within this markdown file" is extremely useful for your own reference and for future LLM sessions.
AWS is making a huge, huge bet on this being the future of software engineering, and even though they have their weird AWS-ish lock-in for some of the LLM-adjacent practices, it is an extremely compelling vision. As these nondeterministic tools get more deterministic supporting functions to help their work, the quality is going to approach and probably exceed human coding quality.
I agree with both you and the GP. Yes, coding is being totally revolutionized by AI, and we don't really know where the ceiling will be (though I'm skeptical we'll reach true AGI any time soon), but I believe there is still an essential element of understanding how computer systems work that is required to leverage AI in a sustainable way.
There is some combination of curiosity about inner workings and precision of thought that has always been essential to becoming a successful engineer. In my very first CS 101 class I remember the professor alluding to two hurdles (pointers and recursion) that a significant portion of the class would not be able to surpass, and they would change majors. Throughout the subsequent decades I saw this pattern again and again with junior engineers, bootcamp grads, etc. There are some people who, no matter how hard they work, can't grok abstraction and unlock a general understanding of what computing makes possible.
With AI you don't need to know syntax anymore, but to write the right prompts to maintain a system and (crucially) the integrity of its data over time, you still need this understanding. I'm not sure how the AI-native generation of software engineers will develop this without writing code hands-on, but I am confident they will figure it out, because I believe it to be an innate, often pedantic, thirst for understanding that some people have and some don't. This is the essential quality for succeeding in software, both in the past and in the future. Although vibe coding lowers the barrier to entry dramatically, there is a brick wall looming just beyond the toy app/prototype phase for anyone without a technical mindset.
I can see why people are skeptical devs can be 10x as productive.
But something I'd bet money on is that devs are 10x more productive at using these tools.
I get that it's necessary for investment, but I'd be a lot happier with these tools if we didn't keep making these wild claims, because I'm certainly not seeing 10x the output. When I ask for examples, 90% of the time it's Claude Code (not a beacon of good software anyway, but if nearly everyone is pointing to one example, that tells you it's the best you can probably expect) and 10% weekend projects, which are cool, but not 10x cool. Opus 4.5 was released in Dec 2025; by this point people should be churning out year-long projects in a month, and I certainly haven't seen that.
I've used them a few times, and they're pretty cool. If it was just sold as that (again, couldn't be, see: trillion dollar investments) I wouldn't have nearly as much of a leg to stand on
Have you seen Moltbook? One dude coded a Reddit clone for bots in less than a week. How is that not at least 10x what was achievable in the pre-AI world?
Granted, he left the db open to the public, but some meat-powered startups did exactly the same a few years ago.
I mean, as has already been pointed out, the fact that it's a clone is a big reason why, but I also think I could probably churn out a simple clone of Reddit in less than a week. We've been through this before with Twitter: the value isn't the tech (which is relatively straightforward), it's the user base. Of course Reddit has some more advanced features which would be more difficult, but I think the public db probably tells you that wasn't much of a concern to Moltbook either, so yeah, I reckon I could do that.
Because its a clone.
I'd wager my life savings that devs aren't even 1.5x more productive using these tools.
Even if I am only slightly more productive, it feels like I am flying. The mental toll is severely reduced and the feel good factor of getting stuff done easily (rather than as a slog) is immense. That's got to be worth something in terms of the mental wellbeing of our profession.
FWIW I generally treat the AI as a pair programmer. It does most of the typing and I ask it things like: why did you do it this way? Is that the most idiomatic way of doing it? That seems hacky. Did you consider edge case foo? Oh wait, let's call it a BarWidget, not a FooWidget; rename everything in all the other code/tests/make/doc files. Etc., etc.
I save a lot of time typing boilerplate, and I find myself more willing (and a lot less grumpy!!!) to bin a load of things I've been working on when I realise it's the wrong approach or the requirements change (in the past I might try to modify something I'd been working on for a week rather than start from scratch again; with AI there is zero activation energy to start again the right way). That's super valuable in my mind.
It probably depends on the developer, and how much slop and how many bugs they're willing to tolerate.
Dead wrong.
Because the world is still filled with problems that would once have been on the wrong side of the "Is It Worth the Time?" matrix ( https://xkcd.com/1205/ )
There are all sorts of things that I, personally, should have automated long ago that I threw at Claude to do for me. What was the cost to me? A prompt and a code review.
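For reference, the math behind that comic is just time saved per run times how often you run it over five years; a quick sketch (the function name is mine, the five-year horizon is the comic's):

```python
# xkcd 1205's break-even math: over five years, how long can you spend
# automating a task before the automation costs more time than it saves?
def break_even_minutes(minutes_saved_per_run: float, runs_per_day: float,
                       horizon_days: int = 5 * 365) -> float:
    return minutes_saved_per_run * runs_per_day * horizon_days

# A 5-minute chore done once a week: roughly 21 hours of effort breaks even.
print(break_even_minutes(5, 1 / 7) / 60)  # ~21.7 hours
```

When the "effort" side collapses to a prompt and a code review, a lot of chores jump to the profitable side of that line.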
Meanwhile, on larger tasks an LLM deeply integrated into my IDE has been a boon. Having an internal debate about how to solve a problem? Try both, write a test, prove out which is going to be better. Pair program, function by function, with your LLM; treat it like a jr dev who can type faster than you if you give it clear instructions. I think you will be shocked at how quickly you can massively scale up your productivity.
Yup, I've already gotten like 6 of my personal projects running, including 1 for my wife, that I had lost interest in. For a few dollars, these are now actually running and being used by my family. These tools are a great enabler for people like me. lol
I used to complain when my friends and family gave me ideas for something they wanted or needed help with because I was just too tired to do it after a day's work. Now I can sit next to them and we can pair program an entire idea in an evening.
The matrix framing is a very nice way to put it. This morning I asked my assistant to code up a nice debugger for a particular flow in my application. It's much better than I would have had the time/patience to build myself for a nice-to-have.
I sort of have a different view of that time matrix. If AI is only able to help me with tasks that are of low value, ones where I previously wouldn't have bothered, is it really saving me anything? Where before I'd simply ignore auxiliary tasks and focus on what matters, I'm now constantly detoured by them, thinking "it'll only take ten minutes."
I also primarily write Elixir, and I have found most agents are only capable of writing small pieces well. More complicated asks tend to produce unnecessarily complicated solutions, ones that may "work" on the surface but don't hold up in practice. I've seen a large increase in small bugs with more AI coding assistance.
When I write code, I want to write it and forget about it. As a result, I’ve written a LOT of code which has gone on to work for years without touching it. The amount of time I spent writing it is inconsequential in every sense. I personally have not found AI capable of producing code like that (yet, as all things, that could change).
Does AI help with some stuff? Sure. I always forget common patterns in Terraform because I don't often have to use it. Writing some initial resources and asking it to "make it normal" is helpful. That does save time. Asking it to write a GenServer correctly is an act of self-harm, because it fundamentally does not understand concurrency in Erlang/BEAM/OTP. It very much looks like it does, but it 100% does not.
tl;dr: I think the ease of use of AI can cause us to over-produce, and as a result we miss the forest for the trees.
> are only capable of writing small pieces well.
It excels at this, and if you have it deeply integrated into your workflow and IDE/dev env, the loop should feel more like pair programming, like tennis, than like it's doing everything for you.
> I also primarily write Elixir,
I would also venture that it has less to do with the language (it is a factor) and more to do with what you are working on. Domain will matter in terms of sample size (code) and understanding (language to support it). There could be 1000s of examples in its training data of what you want, but if no one wrote a comment that accurately describes what that code does...
> I think the ease of use of AI can cause us to over produce and as a result we miss the forest for the trees.
This is spot on. I stopped thinking of it as "AI" and started thinking of it as "power tools". Useful, and like a power tool you should be cautious, because there is danger there... It isn't smart, and it's not doing anything that isn't in its training data, but there is a lot there, basically everything, and it can do some basic synthesis.
Like others are saying, AI will accelerate the gap between competent devs and mediocre devs. It is a multiplier. AI cannot replace fundamentals, and it certainly cannot replace a good helmsman with a rational, detail-oriented mind. Having fundamentals (skill & knowledge) + using AI will be the cheat code in the next 10 years.
The only historical analogue of this is perhaps differentiating a good project manager from an excellent one. No matter how advanced, technology will not substitute for competence.
I view the current tools as more of a multiplier of base skill.
A 1x engineer may become a 5x engineer, but a -1x will also produce 5x more bad code.
Several experiments have shown that output quality drops at every skill level.
In many cases the quantity of output is good enough to compensate, but quality is extremely difficult to improve at scale. Beefing up QA to handle significantly more code of noticeably lower quality only goes so far.
Speaking as someone who has been in SRE/DevOps at every level, from IC to global head of a team:
- I 100% believe this is happening and is probably going to be the case in the next 6 months. I've seen Claude and Grok debug issues when they only had half of the relevant evidence (e.g. Given A and B, it's most likely X). It can even debug complex issues between systems using logs, metrics etc. In other words, everything a human would do (and sometimes better).
- The situation described is actually not that different from being a SRE manager. e.g. as you get more senior, you aren't doing the investigations yourself. It's usually your direct reports that are actually looking at the logs etc. You may occasionally get involved for more complex issues or big outages but the direct reports are doing a lot of the heavy lifting.
- All of the above being said, I can imagine errors so weird/complex etc that the LLMs either can't figure it out, don't have the MCP or skill to resolve it or there is some giant technology issue that breaks a lot of stuff. Facebook engineers using angle grinders to get into the data center due to DNS issues comes to mind for the last one.
Which probably means we are all going to start to be more like airline pilots:
- highly trained in debugging AND managing fleets of LLMs
- managing autonomous systems
- kept around "just in case" the LLMs fall over
P.S. I've been very well paid over the years and being a SRE is how I feed my family. I do worry, like many, about how all of this is going to affect that. Sobering stuff.
Now run that loop 1000 times.
What does the code/system look like?
It is going to be more like evolution (fit to environment) than engineering (fit to purpose).
It will be fascinating to watch nonetheless.
It'll probably look like the code version of this, an image run through an LLM 101 times with the directive to create a replica of the input image: https://www.reddit.com/r/ChatGPT/comments/1kbj71z/i_tried_th... Despite being provided with explicit instructions, well...
People are still wrongly attributing a mind to something that is essentially mindless.
I mean, if you tell a chain of 100 humans to redraw a picture, I would expect it to go similarly, just much faster
"evolution (fit to environment) than engineering (fit to purpose)."
Oh, I absolutely love this lens.
Sure, if all you ask it to do is fix bugs. You can also ask it to work on code health things like better organization, better testing, finding interesting invariants and enforcing them, and so on.
It's up to you what you want to prioritize.
I have some healthy skepticism on this claim though. Maybe, but there will be a point of diminishing returns where these refactors introduce more problems than they solve and just cause more AI spending.
Code is always a liability. More code just means more problems. There has never been a code generating tool that was any good. If you can have a tool generate the code, it means you can write something on a higher level of abstraction that would not need that code to begin with.
AI can be used to write this better quality / higher level code. That's the interesting part to me. Not churning out massive amounts of code, that's a mistake.
"What can we do to reduce the size of the codebase" seems like an interesting prompt to try.
There's an interesting phenomenon I noticed with the "skeptics". They're constantly using what-ifs (aka goalpost moving), but the interesting thing is that those exact same what-ifs were "solved" earlier, but dismissed as "not good enough".
This exact thing about optimisation was shown years ago. "Here's a function, make it faster", with "glue" to test the function, and it kinda worked even with GPT-4-era models. Then came AlphaEvolve, where Google found improvements in real algorithms (both theoretical, e.g. packing squares, and practical, e.g. ML kernels). And yet these were dismissed as "yeah, but that's just optimisation, that's easyyyy. Wake me up when they write software from 0 to 1 and it works".
Well, here we are. We now have a compiler that can compile and boot Linux! And people are complaining that the code is unmaintainable and that it's slow/unoptimised. We've gone full circle, but forgot that optimisation was easyyyy. Now it's something to complain about. Oh well...
I use LLMs daily and agents occasionally. They are useful, but there is no need to move any goalposts; they still easily do shit work in 2026.
All my coworkers use agents extensively in the backend and the amount of shit code, bad tests and bugs has skyrocketed.
Couple that with a domain (medicine) where our customer in some cases needs to validate the application's behaviour extensively, and it's a fucking disaster: very expensive iteration instead of doing it well upfront.
I think we have some pretty good power tools now, but using them appropriately is a skill issue, and some people are learning to use them in a very expensive way.
Microsoft will be an excellent real-world experiment on whether this is any good. We so easily forget that giant platform owners are staking everything on all this working exactly as advertised.
Some of my calculations going forward will continue to be along the lines of 'what do I do in the event that EVERYTHING breaks and cannot be fixed'. Some of my day job includes retro coding for retro platforms, though it's cumbersome. That means I'll be able to supply useful things for survivors of an informational apocalypse, though I'm hoping we don't all experience one.
I agree, but want to interject that "code organization" won't matter for long.
Programming languages were made for people. I'm old enough to have programmed in Z80 and 8086 assembler. I've been through plenty of programming languages over my career.
But once building systems becomes prompting an agent to build a flow that reads these two types of Excel files, cleans them, filters them, merges them, and outputs the result for the web (oh, and make it interactive and highly available)...
...code won't matter. You'll have other agents that check that the system is built right, agents that test the functionality, and agents that ask for and propose functionality and ideas.
Most likely, programming languages will become similar to old telegraph texts (telegrams), which were heavily optimized for word/token count. They will be optimized to be LLM-grokkable instead of human-grokkable.
It's going to be amazing.
What you’re describing is that we’d turn deterministic engineering into the same march of 9s that FSD and robotics are going through now - but for every single workflow. If you can’t check the code for correctness, and debug it, then your test system must be absolutely perfect and cover every possible outcome. Since that’s not possible for nontrivial software, you’re starting a march of 9s towards 100% correctness of each solution.
That accounting software will need 100M unit tests before you can be certain it covers all your legal requirements. (Hyperbole but you get the idea) Who’s going to verify all those tests? Do you need a reference implementation to compare against?
Making LLM work opaque to inspection is kind of like pasting the outcome of a mathematical proof without any context (which is almost worthless AFAIK).
Will you trust code like this to run airplanes?
Remember, even Waymo has a ton of non-AI code it is built upon. We will still have PyTorch, embedded systems software, etc.
There are certainly people working on making this happen. As a hobbyist, maybe I'll still have some retro fun polishing the source code for certain projects I care about? (Using our new power tools, of course.)
You're assuming that scrum/agile/management won't take this over?
What stakeholder is prioritizing any of those things and paying for it out of their budget?
Code improvement projects are the white whale of software engineering: obsessed over, but rarely worth it from a business point of view.
The costs for code improvement projects have gone down dramatically now that we have power tools. So, perhaps it will be considered more worthwhile now? But how this actually plays out for professional programming is going to depend on company culture and management.
In my case, I'm an early-retired hobbyist programmer, so I control the budget. The same is true for any open source project.
And what happens when these different objectives conflict or diverge? Will it be able to figure out the appropriate trade-offs, live with the results, and go meta to rethink the approach, or simply delude itself? We would definitely lose these skills if it continues like this.
> Unless you are only ever a single engineer, your career is filled with "I need to debug code I didn't write".
That's the vast majority of my job and I've yet to find a way to have LLMs not be almost but not entirely useless at helping me with it.
(also, it's filled with that even when you are a single engineer)
I hope you realize that means your position is in danger.
It would be in danger if LLMs could actually do that for me, but they're still very far from it and they progress slowly. One day I could start worrying, but it's not today.
And even if you are the single engineer, I'll be honest, it might as well have been somebody else that wrote the code if I have to go back to something I did seven years ago and unearth wtf.
It's nice that AI can fix bugs fast, but it's better to not even have bugs in the first place. By using someone else's battle tested code (like a framework) you can at least avoid the bugs they've already encountered and fixed.
I spent Dry January working on a new coding project, and since all my nerd friends have been telling me to try coding with LLMs, I gave it a shot and signed up for Google Gemini...
All I can say is "holy shit, I'm a believer." I've probably got close to a year's worth of coding done in a month and a half.
Busy work that would have taken me a day to look up, figure out, and write -- boring shit like matplotlib illustrations -- is trivial now.
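(For a sense of what that busywork looks like, here's a toy matplotlib heatmap of the kind being described; assumed random data, not the actual project code:)

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy grid of values standing in for whatever metric is being visualized.
rng = np.random.default_rng(0)
values = rng.random((50, 80))

fig, ax = plt.subplots(figsize=(8, 5))
im = ax.imshow(values, origin="lower", cmap="viridis")
fig.colorbar(im, ax=ax, label="metric")
ax.set_title("Example heatmap")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("heatmap.png", dpi=150)
```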
Ideas that I'm not sure how to implement ("what are some different ways to do this weird thing?"), where I would have spent a week trying to figure out a reasonable approach: no, it's basically got two or three decent ideas right away, even if they're not perfect. There was one vectorization approach I would never have thought of that I'm now using.
Is the LLM wrong? Yes, all the damn time! Do I need to, you know, actually do a code review when I'm implementing its ideas? Very much yes! Do I get into a back-and-forth battle with the LLM when it starts spitting out nonsense, shut the chat down, and start over with a newly primed window? Yes, about once every couple of days.
It's still absolutely incredible. I've been a skeptic for a very long time. I studied philosophy, and the conceptions people have of language and Truth get completely garbled by an LLM that isn't really a mind that can think in the way we do. That said, holy shit it can do an absolute ton of busy work.
What kind of project / prompts - what's working for you?
I spent a good 20 years in the software world but have been away doing other things professionally for a couple of years. Recently I was in the same place as you, with a new project and wanting to try it out. So I start with a generic Django project in VSCode, use the agent mode, and… what a waste of time. The autocomplete suggestions it makes are frequently wrong, and the actions it takes in response to my prompts tend to make a mess on the order of a junior developer. I keep trying to figure out what I'm doing wrong, as I'm prompting pretty simple concepts at it - if you know Django, imagine concepts like "add the foo module to settings.py" or "run the check command and diagnose why the foo app isn't registered correctly". Before you know it, it's spiraling out of control with changes it thinks it is making, all of which are hallucinations.
I'm just using Gemini in the browser. I'm not ready to let it touch my code. Here are my last two prompts; for context, the project is about golf course architecture:
Me, including the architecture_diff.py file: I would like to add another map to architecture_diff. I want the map to show the level of divergence of the angle of the two shots to the two different holes from each point. That is, when you are right in between the two holes, it should be a 180 degree difference and should be very dark, but when you're on the tee and the shot is almost identical, it should be very light. Does this make sense? I realize this might require more calculations, but I think it's important.
Gemini output was some garbage about a simple naive angle to two hole locations, rather than using the sophisticated expected value formula I'm using to calculate strokes-to-hole... thus worthless.
Follow up from me, including the course.py and the player.py files: I don't just want the angle, I want the angle between the optimal shot, given the dispersion pattern. We may need to update get_smart_aim in the player to return the vector it uses, and we may need to cache that info. We may need to update generate_strokes_gained_map in course to also return the vectors used. I'm really not sure. Take as much time as you need. I'd like a good idea to consider before actually implementing this.
Gemini output now has a helpful response about saving the vector field as we generate the different maps I'm trying to create as they are created. This is exactly the type of code I was looking for.
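For what it's worth, the core of that divergence shading is a small calculation; a minimal sketch (the `aim_divergence` name and the assumption that each grid point already has an optimal aim vector per hole are mine, not from the project or Gemini's output):

```python
import numpy as np

def aim_divergence(aim_a: np.ndarray, aim_b: np.ndarray) -> float:
    """Angle between two aim vectors, scaled so 0.0 means identical aim
    and 1.0 means directly opposite (a 180 degree difference)."""
    a = aim_a / np.linalg.norm(aim_a)
    b = aim_b / np.linalg.norm(aim_b)
    angle = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return float(angle / np.pi)

# On the tee, the optimal shots to both holes point almost the same way -> light.
print(aim_divergence(np.array([1.0, 0.05]), np.array([1.0, -0.05])))  # ~0.03
# Standing between the two holes, they point in opposite directions -> dark.
print(aim_divergence(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))    # 1.0
```

The hard part, as the exchange shows, is getting the right aim vectors out of the existing dispersion/strokes-gained machinery, not the angle math itself.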
Also >20 years in software. The VSCode autocomplete, regardless of the model, never worked well for me. But Claude Code is something else - it doesn't do autocomplete per se - it will make modifications, test, debug if it fails, and iterate until it gets it right.
I recently started building a POC for an app idea. As the framework I chose Django, and I did not once write code myself. The whole thing was done in a GitHub Codespace with Copilot in agentic mode, using mostly Sonnet and Opus models. For prompting, I did not give it specific instructions like "add x to settings". I told it "We are now working on feature X. X should be able to do a, b and c. B has the following constraints. C should work like this." I also have some instructions in the agents.md file which tell the model to, before starting to code, ask me all unclear questions and then make a comprehensive plan of what to implement. I would then go over this plan, clarify or change it if needed - and then let it run for 5-15 minutes. And every time it just did it. The whole thing, with debugging, with tests. Sure, sometimes there were minor bugs when I tested - but then I prompted the problem directly, and sure enough it got fixed in seconds...
Not sure why we had such different experiences. Maybe you are using other models? Maybe you're missing something in your prompts? Letting it start with a plan which I can then check definitely helped a lot. Also a summary of the app's workings and technical decisions (also produced by the model) maybe helped in the long run.
I'm (mostly) a believer too, and I think AI makes using and improving these existing frameworks and libraries even easier.
You mentioned matplotlib: why does it make sense to pay for a bunch of AI agents to re-invent what matplotlib does and fix bugs that matplotlib has already fixed, instead of just having AI agents write code that uses it?
I mean, the thesis of the post is odd. I'll grant you that.
I work mostly with python (the vast majority is pure python), flask, and htmx, with a bit of vanilla js thrown in.
In a sense, I can understand the thesis. On the one hand, Flask is a fantastic tool, with a reasonable abstraction given the high complexity. I wouldn't want to replace Flask. On the other hand, HTMX is a great tool, but often imperfect for exactly what I'm trying to do. Most people would say "well, just use React!" except that I honestly loathe working with JS, and unless someone is paying me, I'll do it in Python. I could see working with an LLM to build a custom tool to make a version of HTMX that better interacts with Flask in the way I want it to.
In fact, in the project I'm working on now I'm building complex heatmap illustrations that require a ton of data processing, so I've been building a model to reduce the NP-hard aspects of that process. However, the illustrations are the point, and I've already had a back and forth with the LLM about porting the project to HTML, or at least some web-based form of illustration, simply because I'd have much more control over the illustrations. Right now, matplotlib still suits me just fine, but if I had to port it, I could see just building my own tool instead of finding an existing framework and learning it.
Frameworks are mostly useful because of group knowledge. I learn Flask because I don't want to build all these tools from scratch, and because it makes me literate in a very common language. The author is suggesting that these barriers -- at least for your own code -- functionally don't exist anymore. Learning a new framework is about as labor-intensive as learning one you're creating as you go. I think it's short-sighted, yes, but depending on the project, yeah, when it's trivial to build the tool you want, it's tempting to do that instead of learning to use a similar tool that needs two adapters attached to it to work well on the job you're trying to do.
At the same time, this is about scope. Anyone throwing out React because they want to just "invent their own entire web framework" is just being an idiot.
Because frameworks don’t have bugs? Or unpredictable dependency interactions?
This is generous, to say the least.
In practice using someone else’s framework means you’re accepting the risk of the thousands of bugs in the framework that have no relevance to your business use case and will never be fixed.
> better to not have bugs in the first place
you must have never worked on any software project ever
Have you? Then you know that the amount of defects scales linearly with the amount of code. As things stand models write a lot more code than a skilled human for a given requirement.
> Unless you are only ever a single engineer, your career is filled with "I need to debug code I didn't write".
True, but there's usually at least one person who knows that particular part of the system that you need to touch, and if there isn't, you'll spend a lot of time fixing that bug and become that person.
The bet you're describing is that the AI will be the expert, and if it can be that, why couldn't it also be the expert at understanding the users' needs so that no one is needed anywhere in the loop?
What I don't understand about a vision where AI is able to replace humans at some (complicated) part of the entire industrial stack is why does it stop at a particular point? What makes us think that it can replace programmers and architects - jobs that require a rather sophisticated combination of inductive and deductive reasoning - but not the PMs, managers, and even the users?
Steve Yegge recently wrote about an exponential growth in AI capabilities. But every exponential growth has to plateau at some point, and the problem with exponential growth is that if your prediction about when that plateau happens is off by a little, the value at that point could be different from your prediction by a lot (in either direction). That means that it's very hard to predict where we'll "end up" (i.e. where the plateau will be). The prediction that AI will be able to automate nearly all of the technical aspects of programming yet little beyond them seems as unlikely to me as any arbitrary point. It's at least as likely that we'll end up well below or well above that point.
Believe the re:Invent session is this one but correct me if I'm wrong: https://www.youtube.com/watch?v=rMPe622eGY0
I think back on the ten+ years I spent doing SRE consulting and the thing is, finding the problems and identifying solutions — the technical part of the work — was such a small part of the actual work. So often I would go to work with a client and discover that they often already knew the problem, they just didn't believe it - my job was often about the psychology of the organization more than the technical knowledge. So you might say “Great, so the agent will automatically fix the problem that the organization previously misidentified.” That sounds great right up until it starts dreaming… it's not to say there aren't places for these agents, but I suspect ultimately it will be like any other technology we use, where it becomes part of the toolkit, not the whole.
Was that session published online somewhere? I’d love to watch that.
That doesn't change the fact that the submission is basically repeating the LISP curse. Best case scenario: you end up with a one-off framework and only you know how it works. The post you're replying to points out why this is a bad idea.
It doesn't matter if you don't use 90% of a framework, as the submission bemoans. When everyone uses an identical API, but in different situations, you find lots of different problems that way. Your framework and its users become a sort of Borg. When one of the framework's users discovers a problem, it's fixed and propagated out before it can even be a problem for the rest of the Borg.
That's not true for your Lisp-curse, one-off, custom bespoke framework. You will repeat all the problems that all the other custom bespoke frameworks encountered. When they fixed their problem, they didn't fix it for you. You will find those problems over and over again. This is why free software dominates over proprietary software. The biggest problem in software is not writing the software, it's maintaining it. Free software shares the maintenance burden, so everyone can benefit. You bear the whole maintenance burden with your custom, one-off vibe-coded solutions.
> one to do the work and submit PRs to your repo
Have we not seen loads of examples of terrible AI-generated PRs every week on this site?
Because nobody posts the good ones. They're boring, correct, you merge them and move on to the next one. It's like there's a murder in the news every day but generally we're still all fine.
Don't assume that when people make fun of some examples that there aren't thousands more that nobody cares to write about.
Automatically solving software application bugs is one thing; recovering stateful business process disasters and data corruption is entirely another.
Customer A is in a totally unknown database state due to a vibe-coded bug. Great, the bug is fixed now, but you're still f-ed.