> One thing I’ve noticed is that different people get wildly different results with LLMs, so I suspect there’s some element of how you’re talking to them that affects the results.
It's always easier to blame the prompt and convince yourself that you have some sort of talent in how you talk to LLMs that others don't.
In my experience the differences are mostly in how the code produced by the LLM is reviewed. Developers who have experience reviewing code are more likely to find problems immediately and complain they aren't getting great results without a lot of hand-holding. And those who rarely or never review code from other developers are invariably going to miss stuff and rate the output they get higher.
This definitely is the case. I was talking to someone complaining about how LLMs don't work well.
They said it couldn't fix an issue it made.
I asked if they gave it any way to validate what it did.
They did not. Some people really are saying "fix this" instead of saying "x fn is doing y when someone makes a request to it. Please attempt to fix x, then validate it by accessing the endpoint afterwards and writing tests."
It's shocking some people don't give it any real instruction or way to check itself.
In addition, I get great results doing voice-to-text with very specific workflows: asking it to add a new feature, describing which functions I want changed, then reviewing as I go rather than waiting until the end.
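For what it's worth, "give it a way to validate" can be as small as a smoke test the agent runs after its change. A minimal sketch in Python (the `/status` endpoint and `{"ok": true}` response shape are invented for illustration, not from any real project):

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the service being fixed.
class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *_):  # keep request logging quiet
        pass

def check_endpoint(port, path="/status"):
    """The validation step the agent can run after its fix:
    hit the endpoint and fail loudly if it misbehaves."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    resp = conn.getresponse()
    assert resp.status == 200, f"expected 200, got {resp.status}"
    return json.loads(resp.read())

# Spin up the stand-in server on an ephemeral port and validate.
server = HTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = check_endpoint(server.server_port)
server.shutdown()
```

Hand the agent something like `check_endpoint` and "run this after your change" and you've replaced "fix this" with a closed loop it can actually use.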
> It's shocking some people don't give it any real instruction or way to check itself.
It's not shocking. The tech world is telling them that "Claude will write all of their app easily" with zero instructions/guidelines so of course they're going to send prompts like that.
I think the implications of limited-to-no instructions vary quite a bit depending on what you're doing... CRUD APIs, sure... especially if you have a well-defined DB schema and API surface/approach. Anything that might get complex, less so.
Two areas where I've really appreciated LLMs so far... one is being able to make web components that do one thing well in encapsulation... I can bring one into my project and just use it... AI can scaffold a test/demo app that exercises the component with ease, and testing becomes pretty straightforward.
The other for me has been in bridging rust to wasm and even FFI interfaces so I can use underlying systems from Deno/Bun/Node with relative ease... it's been pretty nice all around to say the least.
That said, this all takes work... lots of design work up front for how things should function... whether it's a UI component or an API backend library. From there, you have to add in testing, and some iteration to discover and ensure there aren't behavioral bugs in place. Actually reviewing code, and especially the written test logic. LLMs tend to over-test in ways that are excessive or redundant a lot of the time, especially when a longer test function effectively also tests underlying functionality that already has its own tests... cut those out.
There's nothing "free" and it's not all that "easy" either, assuming you actually care about the final product. It's definitely work, but it's more about the outcome and creation than the grunt work. As a developer, you'll be expected to think a lot more, plan and oversee what's getting done as opposed to being able to just bang out your own simple boilerplate for weeks at a time.
It's surprising they don't learn better after their first hour or two of use. Or maybe they do know better but don't like the thing, so they deliberately give it rope to hang itself with, then blame overzealous marketing.
If you tell a human junior developer just "fix this" then they will spend a week on a wild-goose chase with nothing to show for it.
At least the LLM will only take 5 minutes to tell you they don't know what to do.
Do they? I’ve never gotten a response that something was impossible, or stupid. LLMs are happy to verify that a no-op does nothing if they don’t know how to fix something. They'd rather make something useless than really tackle a problem, if they can make the tests green that way, or claim that something “works”.
And I’ve never asked Claude Code for something which is really impossible, or even really difficult.
Claude code will happily tell me my ideas are stupid, but I think that's because I nest my ideas in between other alternative ideas and ask for an evaluation of all of them. This effectively combats the sycophantic tendencies.
Still, sometimes claude will tell me off even when I don't give it alternatives. Last night I told it to use luasocket from an mpv userscript to connect to a zeromq Unix socket (and also implement zmq in pure lua) connected to an ffmpeg zmq filter to change filter parameters on the fly. Claude code all but called me stupid and told me to just reload the filter graph through normal mpv means when I make a change. Which was a good call, but I told it to do the thing anyway and it ended up working well, so what does it really know... Anyway, I like that it pushes back, but agrees to commit when I insist.
After such hard-won wins, ask the AI to save what it learned during the session to a Markdown file.
To be fair, that happening feels more like poor management and mentorship than "juniors are scatterbrained".
Over time, you build up the right reflexes that avoid a one-week goose chase with them. Heck, since we're working with people, you don't just say "fix this"; you earmark time to make sure everyone is aligned on what needs to be done and what the plan is.
> At least the LLM will only take 5 minutes to tell you they don't know what to do.
In my experience, the LLM will happily try the wrong thing over and over for hours. It rarely will say it doesn’t know.
Don’t ask it to make changes off the bat, then - ask it to make a plan. Then inspect the plan, change it if necessary, and go from there.
I do. I tend to follow a strict Research, Plan, Implement workflow. It does greatly help, but it doesn’t eliminate all problems.
An LLM might take 5 minutes, or 20 minutes, and still do the wrong thing. Rarely have I seen an LLM not "know what to do." A coworker told it to fix some unit tests; it churned away for a while, then changed a bunch of assert status == 200 to 500. Good news, tests pass now!
There are subtler versions of this too. I've been working on a TUI app for a couple of weeks, and having great success getting it to interactively test by sending tmux commands, but every once in a while it would just deliver code that didn't work. I finally realized it was because the capture tools I gave it didn't capture the cursor location, so it would, understandably, get confused about where it was and what was selected.
I promptly went and fixed this before doing any more work, because I know if I was put in that situation I would refuse to do any more work until I could actually use the app properly. In general, if you wouldn't be able to solve a problem with the tools you give an LLM, it will probably do a bad job too.
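For anyone trying a similar setup: the fix amounts to giving the agent the cursor position alongside the pane text. A rough Python sketch of the snapshot tooling (the pane name and output format are my own invention; tmux does expose the cursor via its `#{cursor_x}`/`#{cursor_y}` format variables):

```python
import subprocess

def capture_cmd(pane: str) -> list[str]:
    """tmux command that dumps the pane's visible text."""
    return ["tmux", "capture-pane", "-p", "-t", pane]

def cursor_cmd(pane: str) -> list[str]:
    """tmux command that reports the cursor position -- the piece
    a plain capture-pane is missing."""
    return ["tmux", "display-message", "-p", "-t", pane,
            "#{cursor_x},#{cursor_y}"]

def snapshot(pane: str) -> str:
    """Combined snapshot the agent can read: pane text plus cursor.
    (Requires a running tmux server, so not exercised here.)"""
    text = subprocess.run(capture_cmd(pane), capture_output=True,
                          text=True, check=True).stdout
    cursor = subprocess.run(cursor_cmd(pane), capture_output=True,
                            text=True, check=True).stdout.strip()
    return f"{text}\n[cursor at {cursor}]"
```

With that, the agent sees roughly what a human at the terminal sees, which was the whole point of the feedback loop.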
Yeah, the more time I spend in planning and working through design/API documentation for how I want something to work, the better it does... Similarly for testing against your specifications, not the code... once you have a defined API surface and functional/unit tests for what you're trying to do, it's that much harder for the AI to actually mess things up. Even more interesting, IMO, is how much better the agents work with Rust vs other languages the more well-defined your specifications are.
> some people really are saying "fix this" instead of saying "x fn is doing y when someone makes a request to it. Please attempt to fix x and validate it by accessing the endpoint after and writing tests"
This works about 85% of the time IME, in Claude Code. My normal workflow on most bugs is to just say “fix this” and paste the logs. The key is that I do it in plan mode, then thoroughly inspect and refine the plan before allowing it to proceed.
Untested hypothesis: LLM instruction is usually an intelligence-plus-communication skill. I find, in my non-authoritative experience, that users who give short-form instructions are generally ill-prepared for technical motivation (whether they're motivating LLMs or humans).
lol that is still “how you’re talking to them that affects the results” just more specific
Feeding the LLM a "copy as cURL" for its feedback loop instead of letting it manage the dev server was an unlock for me.
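A sketch of what that unlock can look like: turn the browser's "copy as cURL" string into something the agent can replay on every iteration. This toy Python parser is an assumption about the workflow, handles only the common flags, and is nowhere near a full curl implementation:

```python
import shlex

def parse_curl(command: str) -> dict:
    """Parse a browser 'copy as cURL' string into the pieces an
    agent needs to replay the request (method, URL, headers, body)."""
    tokens = shlex.split(command)
    assert tokens and tokens[0] == "curl", "expected a curl command"
    req = {"method": "GET", "url": None, "headers": {}, "body": None}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            req["headers"][name.strip()] = value.strip()
            i += 2
        elif tok in ("-X", "--request"):
            req["method"] = tokens[i + 1]
            i += 2
        elif tok in ("-d", "--data", "--data-raw"):
            req["body"] = tokens[i + 1]
            if req["method"] == "GET":
                req["method"] = "POST"  # curl's default when a body is present
            i += 2
        elif tok.startswith("-"):
            i += 1  # skip flags this sketch doesn't model (e.g. --compressed)
        else:
            req["url"] = tok  # assume the bare token is the URL
            i += 1
    return req
```

The agent then replays the exact request the browser made instead of guessing at the dev server's state.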
I have 30 years of experience delivering code and 10 years of leading architecture. My argument is that the only thing that matters is: does the entire implementation - code + architecture (your database, networking, the runtime that determines scaling, etc.) - meet the functional and non-functional requirements? Functional = does it meet the business requirements and UX; non-functional = scalability, security, performance, concurrency, etc.
I only carefully review the parts of the implementation that I know “work on my machine but will break once I put in a real world scenario”. Even before AI I wasn’t one of the people who got into geek wars worrying about which GOF pattern you should have used.
Except for concurrency, where it’s hard to have automated tests, I care more about the unit (or honestly integration) tests and testing for scalability than about the code itself. Your login isn’t slow because you chose a for loop instead of a while loop. I will have my agents run the appropriate tests after code changes.
I didn’t look at a line of code for my vibe coded admin UI authenticated with AWS cognito that at most will be used by less than a dozen people and whoever maintains it will probably also use a coding agent. I did review the functionality and UX.
Code before AI was always the grind between my architectural vision and the implementation.
How does fragility of implementation (spaghetti code, high coupling, low cohesion) fit into your worldview?
As human developers, I think we're struggling with "letting go" of the code. The code we write (or agents write) is really just an intermediate representation (IR) of the solution.
For instance, GCC will inline functions, unroll loops, and apply myriad other optimizations whose details we don't care about (and actually want!). But when we review the ASM that GCC generates, we are not concerned with the "spaghetti" and the "high coupling" and "low cohesion". We care that it works, and is correct for what it is supposed to do.
Source code in a higher-level language is not really different anymore. Agents write the code, maybe we guide them on patterns and correct them when they are obviously wrong, but the code is just the work-item artifact that comes out of extensive specification, discussion, proposal review, and more review of the reviews.
A well-guided, iterative process and problem/solution description should be able to generate an equivalent implementation whether a human is writing the code or an agent.
A compiler uses rigorous modeling and testing to ensure that generated code is semantically equivalent. It can do this because it is translating from one formal language to another.
Translating a natural-language prompt, on the other hand, requires the LLM to make thousands of small decisions that will be different each time you regenerate the artifact. Even ignoring non-determinism, prompt instability means that any small change to the spec will result in a vastly different program.
A natural language spec and test suite cannot be complete enough to encode all of these differences without being at least as complex as the code.
Therefore each time you regenerate large sections of code without review, you will see scores of observable behavior differences that will surface to the user as churn, jank, and broken workflows.
Your tests will not encode every user workflow, not even close. Ask yourself if you have ever worked on a non-trivial piece of software where you could randomly regenerate 10% of the implementation while keeping to the spec without seeing a flurry of bug reports.
This may change if LLMs improve such that they are able to reason about code changes to the degree a human can. As of today they cannot do this and require tests and human code review to prevent them from spinning out. But I suspect at that point they’ll be doing our job, as well as the CEOs and we’ll have bigger problems.
I don't see a world where a motivated soul can build a business from a laptop and a token service as a problem. I see it as opportunity.
I feel similarly about Hollywood and the creation of media. We're not there in either case yet, but we will be. That's pretty clear. And when I look at the feudal society that is the entertainment industry here, I don't understand why so many of the serfs are trying to perpetuate it in its current state. And I really don't get why engineers think this technology is going to turn them into serfs unless they let that happen to them themselves. If you can build things, AI coding agents will let you build faster and more for the same amount of effort.
I am assuming, given the rate of advance of AI coding systems in the past year, that there is plenty of improvement to come before this plateaus. I'm sure that will include AI-generated systems to do security reviews at human level or better. I've already seen Claude find 20-plus-year-old bugs in my own code. They weren't particularly mission-critical, but they were there the whole time. I've also seen it do amazingly sophisticated reverse engineering of assembly code, only to fall flat on its face on the simplest tasks.
That depends on how fast that change happens. If 45% of jobs evaporate in a 5-year period, a complete societal collapse is the likely outcome.
Sounds like influencer nonsense to me. Touch grass. If the people are fed and housed, there's no collapse. And if the billionaire class lets them starve, they will finally go through some things just like the aristocracy in France once did. And I think even Peter Thiel is smarter than that. You can feed yourself for <$1000 a year on beans and rice. Not saying you'd enjoy it, but you won't starve. So for ~$40B annually, the billionaires buy themselves revolution insurance. Fantastic value.
OTOH if what you're really talking about is the long-term collapse in our ludicrous carbon footprint when we finally run out of fossil fuels and we didn't invest in renewables or nuclear to replace them, well, I'm with you there.
>Sounds like influencer nonsense to me. Touch grass.
I don't even know what this means.
The worst unemployment during the Weimar Republic was 25-30%. Unemployment in the Great Depression peaked at 25%.
So yeah, if we get to 45% unemployment and those are the highest-paying jobs on average, then yeah, it's gonna be bad. Then you add in second-order effects where none of those people have the money to pay the other 55% who are still employed.
We might get to a UBI relatively quickly and peacefully. But I'm not betting on it.
>finally go through some things just like the aristocracy in France once did.
Yeah, that's probably the most likely scenario, but that quickly devolved into death and imprisonment for far more than the aristocrats, and eventually ended with Napoleon trying to take over Europe and millions of deaths overall.
The world didn't literally end, but it was 40 years of war, famine, disease, and death, and not a lot of time to think about starting businesses with your laptop.
And the dark ages lasted a millennium. Sounds like quite an improvement on that. And if America didn't want a society hellbent on living the worst possible timeline, why did it re-elect President Voldemaga and give him the football? And then, even when he breaks nearly every political promise, his support remains better than his predecessor? Anyway, I think the richest ~1135 Americans won't let you starve, but they'll be happy to watch you die young of things that had stopped killing people for quite some time whilst they skim all the cream. And that seems to be what the plurality wants or they'd vote differently.
The good news is that America is ~5% of the world. And the more we keep punching ourselves in the face, the better the chance someone else pulls ahead. But still, we have nukes, so we're still the town bully for the immediate future.
What are you even arguing about? I have absolutely no idea where you are going with this.
Yeah I figured that. You think society is going to collapse because of AI. I don't. But I do think that stupid narrative is prevalent in the media right now and the C-suite happily proclaiming they're going to lay people off and replace them with AI got the ball rolling in the first place. Now it has momentum of its own with lunatics like Eliezer Yudkowsky once again getting taken seriously.
Fortunately, the other 95% of humanity is far less doomer about their prospects. So if America wants to be the new neanderthals, they'll be happy to be the new cro magnons.
I don't think society is going to collapse because of AI because I don't think the current architectures have any chance of becoming AGI. I think that if AGI is even something we're capable of it's very far off.
I think that if CEOs can replace us soon, it's because AGI got here much sooner than I predicted. And if that happens we have 2 options Mad Max and Star Trek and Mad Max is the more likely of the 2.
What's with all the catastrophic thinking then? Mad Max? Collapse of Society because 45% unemployment? I really hate people on principle but I have more faith in them looking out for their own self interest than you do apparently. Mad Max specifically requires a ridiculous amount of intact infrastructure for all the gasoline (you know gasoline goes bad in 3-6 months? Yeah didn't think so), manufacturing for all the parts for all those crazy custom build road warrior wagons, and ranches of livestock for all the leather for all the cool outfits (and with all that cow, no one needs to starve but oh the infrastructure needed to keep the cows fed).
If doom porn is your thing, try watching Threads or The Day After, especially Threads. That said, I don't think Star Trek is possible, maybe The Expanse but more likely we run out of cheap energy before we get off world.
As for the AGI, it all depends on your definition. We're already at Amazon IC1/IC2 coding performance with these agents (I speak from experience previously managing them). If we get to IC3, one person will be able to build a $1B company and run it or sell it. If you're a purist like me and insist we stick to douchebag racist Nick Bostrom's superintelligence definition of AGI, then we agree. But I expect 24/7 IC3 level engineering as a service for $200/month to be more than enough and I think that's a year or two away. And you can either prepare for that or scream how the sky is falling, your choice.
>Mad Max specifically requires a ridiculous amount of intact infrastructure for all the gasoline (you know gasoline goes bad in 3-6 months? Yeah didn't think so)
Is this a joke or do you have a learning disability?
>But I expect 24/7 IC3 level engineering as a service for $200/month to be more than enough and I think that's a year or two away. And you can either prepare for that or scream how the sky is falling, your choice.
Or I could do neither and write you off as a gasbag who doesn't know what he's talking about like all the other ex-amazon management I've had the pleasure to work with over the years.
I guess you have a really short context buffer with all this frequently forgetting things you've said yourself.
But that aside, how's all that self-righteousness working out for you?
I bet you have ex-Amazon prominently in your LinkedIn profile.
Don't have a LinkedIn profile, don't need one. But I'm guessing you're listed under LinkedIn Lunatics.
I read back through a few of your posts and you’re either schizophrenic, or a very elaborate troll.
I know a few older people who started posting like this when they hit their 50s. I’ve only got a few years left. Hopefully I can avoid it, but maybe it’s inevitable.
Ageism: now that's a warrior's flex, amIRight?
People like myself in their 50s to 60s, who had the experience of banging the metal on imperfect buggy hardware late into the night to mine gems before Python made the entire software engineering community pivot to a core competency of syntax pedantry plus stringing library calls together, are having a real party with AI agents effectively doing the same thing they did 30 years ago. I personally never stopped coding, even through my one awful experience as an engineering manager.
But you do you, and hear me now, dismiss me later. There won't be 45% unemployment because the minute AI starts replacing current engineering skills for real is the minute the people it targets wake up and start learning how to work with AI coding agents that will be dramatically better than today. People resist change until there are no other options, just look at fossil fuels. The free market will work that one out too eventually.
And no amount of some nontechnical guy vibe-sciencing his way to a working mRNA vaccine for his cancer-ridden dog, or an engineer unlocking mods to Disney Infinity from just the binary and Claude Code, or an entire web browser ported to Rust, will ever convince you these things are not the enemy. And that's going to put you through some things down the road. So of course, since this will never happen, I'm an elaborate troll or a nutcase, just like the people who pulled all those things off, never mind all the evidence mounting that these things can be amazing in the right hands. That's CRAZYTALK! Stochastic Parrot! Glorified Autocomplete! Mad Max! Mad Max! DLSS 5!
* https://en.wikipedia.org/wiki/Bullshit_Jobs
>You can feed yourself for <$1000 a year on beans and rice. Not saying you'd enjoy it, but you won't starve. So for ~$40B annually, the billionaires buy themselves revolution insurance. Fantastic value.
You are the epitome of the tech bro.
Sure, sure. Understanding how these sociopaths think clearly makes me a tech bro rather than someone who incorporates worst-case scenarios into my planning. Suggesting they would maintain minimum viable society to save their own asses means I'm in favor of it, right? This is why I work remotely.
Peter Thiel might be smarter than that but I’m not sure about the other ones.
Look how Musk treated the Twitter devs or Bezos any of his workers or Trump anybody.
They're all quite intelligent. And they're world class experts in saving their own bacon. Doesn't mean they have any ethics though nor any emotional intelligence after decades of being surrounded by toadies and bootlickers.
Smart is not equal to intelligent.
You can be very intelligent but have a blind spot on some trivial things.
I’m certain that some of them think they are untouchable (or even just are well prepared). We will only see if that’s really true if shit hits the fan.
We all know they have bunkers and we roughly know where they are. I got suspended on reddit for threatening harm to others for saying that a couple weeks back. But I don't think we need to raid the bunkers in your TEOTWAWKI scenario; their bodyguards will do all the heavy lifting once they realize the power balance has shifted. But I also don't expect a SHTF scenario, just a slow creeping enshittification of living standards instead of actually implementing a UBI.
And then the survivors who band together to rebuild community instead of chasing some idiotic Mad Max scenario will ultimately prevail. And yes, they are blind to that other option because they wouldn't end up on top.
>If you can build things, AI coding agents will let you build faster and more for the same amount of effort.
But you aren't building; your LLM is. Also, you are only thinking about the ways that you, a supposed builder, will benefit from this technology. Have you considered how all previous waves of new technologies have introduced downstream effects that have muddied our societies? LLMs are not unique in this regard, and we should be critical of those who are trying to force them into every device we own.
Would you say the general contractor for your home isn’t a builder because he didn’t install the toilets?
I think this argument would make more sense if you were talking about an architect, or the customer.
A contractor is still very much putting the house together.
The general contractor is not doing the actual building so much as coordinating all of the specialists, making sure things run smoothly, scheduling things based on dependencies, and coordinating with the customer. I’ve had two houses built from the ground up.
Three myself, and I have yet to meet a "vibe" contractor.
And he is also not inspecting every screw, wire, etc. He delegates.
Oh, you're preaching to the choir. I think we are entering a punctuated equilibrium here with respect to the future of SW engineering. And the people who have the free time to go on podcasts and insist AI coding agents can't do anything useful, rather than learning their abilities and their limitations and especially how to wield them, are going to go through some things. If you really want to trigger these sorts, ask them why they delegate code generation to compilers and interpreters without understanding each and every ISA at the instruction level. To that end, I am devoid of compassion after having gone through similar nonsense with respect to GPUs 20 years ago. Times change, people don't.
I haven’t stayed relevant and able to find jobs quickly for 30 years by being the old man shouting at the clouds.
I started my career in 1996 programming in C and Fortran on mainframes, and got my first, only, and hopefully last job at BigTech at 46, seven jobs later.
I’m no longer there. Every project I’ve had in the last two years has had classic ML and then LLMs integrated into the implementation. I have very much jumped on the coding agent bandwagon.
Started mine around the same time, and yes, keeping up keeps one employed. What's disheartening, however, is how little keeping up the key decision makers and stakeholders at FAANG do, and it explains idiocy like already trying to fire engineers and replace them with AI. Hilarity ensued, of course, because hilarity always ensues for people like that, but hilarity and shenanigans appear to be inexhaustible resources.
I very much would rather get a daily anal probe with a cactus than ever work at BigTech again, even knowing the trade-off that I now, at 51, make the same as the 25-year-old L5 I mentored when they were an intern and during their first year back as an L4 before I left.
If you have FIRE money, getting off the hamster wheel of despair that is tech industry culture is the winning move. Well-played.
Not quite FIRE money. I still need to work for a while - I just don’t need to chase money. I make “enough” to live comfortably, travel like I want (not first class), and save enough for retirement (max out 401K + catch-up contributions + max out HSA + max out Roth).
We did choose to downsize and move to state tax free Florida.
If I have to retire before I’m 65, exit plan is to move to Costa Rica (where we are right now for 6 weeks)
I think that's precisely his thinking, and don't let him know about all those fancy, expensive unitasker tools they have that you probably don't, which let them do it far more cost-effectively and better than the typical homeowner. Won't you think of the jerbs(tm)? And to Captain Dystopia: life expectancies were increasing monotonically until COVID. Wonder what changed?
I've struggled a bit with this myself. I'm having a paradigm shift. I used to say "but I like writing code". But like the article says, that's not really true. I like building things, the code was just a way to do that. If you want to get pedantic, I wasn't building things before AI either, the compiler/linker was doing that for me. I see this is just another level of abstraction. I still get to decide how things work, what "layers" I want to introduce. I still get to say, no, I don't like that. So instead of being the "grunt", I'm the designer/architect. I'm still building what I want. Boilerplate code was never something I enjoyed before anyway. I'm loving (like actually giggling) having the AI tie all the bits for me and getting up and running with things working. It reminds me of my Delphi days: File->New Project, and you're ready to go. I think I was burnt out. AI is helping me find joy again. I also disable AI in all my apps as well, so I'm still on the fence about several things too.
This resonates. I spent years thinking I enjoyed coding, but what I actually enjoy is designing elegant solutions built on solid architecture. Inventing, innovating, building progressively on strong foundations. The real pleasure is the finished product (is it ever really finished though?) — seeing it's useful and makes people's lives easier, while knowing it's well-built technically. The user doesn't see that part, but we know.
With AI, by always planning first, pushing it to explore alternative technical approaches, making it explain its choices — the creative construction process gets easier. You stay the conductor. Refactoring, new features, testing — all facilitated. Add regular AI-driven audits to catch defects, and of course the expert eye that nothing replaces.
One thing that worries me though: how will junior devs build that expert eye if AI handles the grunt work? Learning through struggle is how most of us developed intuition. That's a real problem for the next generation.
> A compiler uses rigorous modeling and testing to ensure that generated code is semantically equivalent.
Here are the reported miscompilation bugs in GCC so far in 2026: the ones labeled "wrong-code".
https://gcc.gnu.org/bugzilla/buglist.cgi?chfield=%5BBug%20cr...
I count 121 of them.
If you can’t understand the difference between a bug that will rarely cause a compiler encountering an edge case to generate a wrong instruction and an LLM that will generate 2 completely different programs with zero overlap because you added a single word to your prompt, then I don’t know what to tell you.
The point is that expert humans (the GCC developers) writing code (C++) that generates code (ASM) does not appear to be as deterministic as you seem to think it is.
I’m very aware of that, but I’m also aware that it’s rare enough that the compiler doesn’t emit semantically equivalent code that most people can ignore it. That’s not the case with LLMs.
I’m also not particularly concerned with non-determinism but with chaos. Determinism in LLMs is likely solvable, prompt instability is not.
Classic HN-ism. To focus on the semantics of a statement while ignoring the greater point in order to argue why someone is wrong.
I think it's a perfectly fine point. The OP said (my interpretation) that LLMs are messy, non-deterministic, and can produce bad code. The same is true of many humans, even those whose "job" is to produce clean, predictable, good code. The OP would like the argument to be narrowly about LLMs, but the bigger point even is "who generates the final code, and why and how much do we trust them?"
As of right now agents have almost no ability to reason about the impact of code changes on existing functionality.
A human can produce a 100k LOC program with absolutely no external guardrails at all. An agent can't do that. To produce a 100k LOC program, agents require external feedback to keep them from spiraling off into building something completely different.
This may change. Agents may get better.
As if when you delegate tasks to humans they are deterministic. I would hope that your test cases cover the requirements. If not, your implementation is just as brittle when other developers come online or even when you come back to a project after six months.
1. Agents aren’t humans. A human can write a working 100k LOC application with zero tests (not saying they should, but they could, and have). An agent cannot do this.
Agents require tests to keep them from spinning out, and your tests do not cover all of the behaviors you care about.
2. If you doubt that your tests fail to cover all your requirements, remember: 99.9% of every production bug you've ever had completely passed your test suite.
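The "every production bug passed your suite" point can be made concrete with a tiny sketch (the function, values, and bug here are all invented for illustration): a suite that covers the stated requirement goes green while the edge case that actually hits production is never exercised.

```python
def discount(price: float, percent: float) -> float:
    """Apply a percentage discount. Hypothetical bug: percent isn't clamped."""
    return price * (1 - percent / 100)

def test_discount_happy_path():
    # The test that "covers the requirement" and passes in CI.
    assert discount(100.0, 10.0) == 90.0

test_discount_happy_path()  # green suite, ship it

# The untested production input: an out-of-range discount yields a
# negative price, and the green suite never saw it.
print(discount(100.0, 150.0))
```

The agent that wrote and "validated" this code would report success, which is exactly the gap being described.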
Valid points. But a crucial part of not "letting go" of the code is that we are responsible for that code at the moment.
If, in the future, LLM providers take ownership of our on-calls for the code they have produced, I would write an "AUTO-REVIEW-ACCEPTER" bot to accept everything and deploy it to production.
If a company requires me to own something, then I should know what that thing is, understand its ins and outs in detail, and be able to adjust quickly when things go wrong.
You are comparing compilers to a completely non-deterministic code generation tool that often does not take observable behavior into account at all and will happily screw up a part of your system without you noticing, because you misworded a single prompt.
No amount of unit/integration tests cover every single use case in sufficiently complex software, so you cannot rely on that alone.
I've actually found that well-written well-documented non-spaghetti code is even more important now that we have LLMs.
Why? Because LLMs can get easily confused, so they need well written code they can understand if the LLM is going to maintain the codebase it writes.
The cleaner I keep my codebase, and the better (not necessarily more) abstracted it is, the easier it is for the LLM to understand the code within its limited context window. Good abstractions help the right level of understanding fit within the context window, etc.
I would argue that the use of LLMs changes what good code is, since "good" now means you have to meaningfully fit good ideas into chunks of 125k tokens.
I somewhat agree. But that’s more about modularity. It helps when I can just have Claude code focus on one folder with its own Claude file where it describes the invariants - the inputs and outputs.
If you don’t read the code how the heck do you know anything about modularity? How do you know that Module A doesn’t import module B, run the function but then ignore it and implement the code itself? How do you even know it doesn’t import module C?
Claude code regularly does all of these things. Claude code really really likes to reimplement the behavior in tests instead of actually exercising the code you told it to btw. Which means you 100% have to verify the test code at the very least.
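That "reimplement the behavior in the test" failure mode can be sketched in a few lines (function and values are hypothetical): the bad test recomputes the answer with the same logic as the code under test, so it can never disagree with it, while a test that pins an externally expected value catches the bug immediately.

```python
def word_count(text: str) -> int:
    # Hypothetical bug: splitting on a single space miscounts repeated spaces.
    return len(text.split(" "))

# The kind of test an agent tends to write: it mirrors the implementation,
# so it passes no matter how broken word_count is.
assert word_count("a  b") == len("a  b".split(" "))

# The test you actually wanted: pin the externally expected behavior.
try:
    assert word_count("a  b") == 2
    exposed = False
except AssertionError:
    exposed = True  # the real test exposes the bug the mirrored test hid
```

This is why the test code in particular has to be reviewed: a mirrored test is green by construction.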
Well, I know because my code is in separately deployed Lambdas, either zip files uploaded to Lambda or Docker containers run on Lambda, that only interact via API Gateway, a Lambda invoke, SNS -> SQS to Lambda, etc., and my IAM roles are narrowly defined to only allow Lambda A to interact with just the Lambdas I tell it to.
And if Claude tried to use an AWS service in its code that I didn't want it to use, it would have to also modify the IAM IaC.
In some cases the components are in completely separate repositories.
It’s the same type of hard separation I did when there were multiple teams at the company where I was the architect. It was mostly Docker/Fargate back then.
Having separately defined services with well defined interfaces does an amazing job at helping developers ramp up faster and it reduces the blast radius of changes. It’s the same with coding agents. Heck back then, even when micro services shared the same database I enforced a rule that each service had to use a database role that only had access to the tables it was responsible for.
I have been saying repeatedly I focus on the tests and architecture and I mentioned in another reply that I focus on public interface stability with well defined interaction points between what I build and the larger org - again just like I did at product companies.
There is also a reason that at the seven companies I went to before consulting (including GE when it was still a F10 company), I was almost always coming into new initiatives where I could build/lead the entire system from scratch, or could separate the implementation from the larger system with well-defined inputs and outputs. It wasn't always microservices. It might have been separate packages/namespaces with well-defined interfaces.
Yeah my first job out of college was building data entry systems in C from scratch for a major client that was the basis of a new department for the company.
And it’s what Amazon internally does (not Lambda micro services) and has since Jeff Bezos’s “API Mandate” in 2002.
This sounds like an absolute hellscape of an app architecture, but you do you. It also doesn't stop Module A from importing Module C without you knowing about it. It doesn't stop Module A from just copy-pasting the code from C and saying it's using B.
>almost always coming into new initiatives
That says a lot about why you are so confident in this stuff.
Yes microservice based architecture is something no modern company does…
Including the one that you were so confident doesn’t do it even though you never worked there…
Yet I don’t suffer from spooky action at a distance and a fear of changes because my testing infrastructure is weak…
Either I know what I'm doing, or I've bullshitted my way into multiple companies hiring me to lead architecture and/or teams, from 60-person startups to the US's second largest employer.
Did I mention that the company that acquired the startup I worked for before going to BigTech reached out to me to be the architect overseeing all of their acquisitions and try to integrate them, based on the work I did? I didn't accept the offer. I've done the "work for a PE-owned company that was getting bigger by acquiring other companies and lead the integration" thing before.
So they must have been impressed with the long term maintenance of the system to ask me back almost four years after I left
If the only evidence you have that your software is maintainable is that a company once asked you to come back, and you have no actual experience maintaining large applications with millions of users, you essentially have no data to base any of your claims on.
You may have 30 years experience architecting new applications, but when it comes to maintaining large applications, you’re a neophyte.
If you don’t have first hand experience with what long term maintenance looks like for your creations, you don’t have any reason to be telling anyone how to write maintainable software.
If I were you I’d be suffering from imposter syndrome big time. What if you’re just a really good salesman and bullshitter? If I were you I’d want to stick around at a few places to see first hand how my designs hold up.
That may be the future, but we're not there yet. If you're having the LLM write in a high-level language, e.g. Java, JavaScript, Python, etc., at some point there will be a bug or other incident that requires a human to read the code to fix it or make a change. Sure, that human will probably use an LLM as part of that, but they'll still need to be able to tell what the code is doing, and LLMs simply are not reliable enough yet that you can just blindly have them read the code, change it, and trust that it's correct, secure, and performant. Sure, you can focus on writing tests and specs to verify, but you're going to spend a lot more time going in agentic loops trying to figure out why things aren't quite right vs. a human actually being able to understand the code and give the LLM clear direction.
So long as this is all true, then the code needs to be human readable, even if it's not human-written.
Maybe we'll get to the point that LLMs really are equivalent to compilers in terms of reliability, but at that point, why would we have them write in Java or other human-readable languages? LLMs would _be_ a compiler at that point, with a natural-language UI, outputting some kind of machine code. Until then, we do need readable code.
Me: My code isn’t giving the expected result $y when I do $x.
Codex: runs the code, reproduces the incorrect behavior I described, finds the bug, reruns the code, and gets the result I told it I expected. It iterates until it gets it right, then runs my other unit and integration tests.
This isn’t rocket science.
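That loop works because the expected behavior is stated as something executable. A minimal sketch of the idea, where the function, URL format, and default are all invented for illustration: write the failing assertion ("$x should give $y") first, then let the agent iterate against it.

```python
def parse_port(url: str, default: int = 80) -> int:
    """Post-fix version: a hypothetical first draft assumed a port was
    always present and crashed on URLs without one. The agent reproduces
    that failure, patches the code, and reruns until the assertions pass."""
    tail = url.rsplit(":", 1)[-1]
    return int(tail) if tail.isdigit() else default

# The "$x should give $y" statements the agent validates against:
assert parse_port("http://example.com:8080") == 8080
assert parse_port("http://example.com") == 80  # the originally reported case
```

Without those assertions, "fix this" gives the agent nothing to check itself against, which is the failure mode described upthread.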
When requirements change, a compiler has the benefit of not having to go back and edit the binary it produced.
Maybe we should treat LLM-generated code similarly: just regenerate everything fresh from the spec any time there's a change. Personally, though, I haven't had much success with that yet.
This is fantasy completely disconnected from reality.
Have you ever tried writing tests for spaghetti code? It's hell compared to testing good code. LLMs require a very strong test harness or they're going to break things.
Have you tried reading and understanding spaghetti code? How do you verify it does what you want, and none of what you don't want?
Many code design techniques were created to make things easy for humans to understand. That understanding needs to be there whether you're modifying it yourself or reviewing the code.
Developers are struggling because they know what happens when you have 100k lines of slop.
If things keep speeding in this direction we're going to wake up to a world of pain in 3 years and AI isn't going to get us out of it.
I’ve found much more utility even pre AI in a good suite of integration tests than unit tests. For instance if you are doing a test harness for an API, it doesn’t matter if you even have access to the code if you are writing tests against the API surface itself.
I do too, but it comes from a bang-for-your-buck and not a test coverage standpoint. Test coverage goes up in importance as you lean more on AI to do the implementation IMO.
You did see the part about my unit, integration and scalability testing? The testing harness is what prevents the fragility.
It doesn’t matter to AI whether the code is spaghetti code or not. What you said was only important when humans were maintaining the code.
No human should ever be forced to look at the code behind my vibe coded internal admin portal that was created with straight Python, no frameworks, server side rendered and produced HTML and JS for the front end all hosted in a single Lambda including much of the backend API.
I haven’t done web development since 2002 with Classic ASP besides some copy and paste feature work once in a blue moon.
In my repos, post-AI, my Claude/agent files have summaries of the initial statement of work, the transcripts from the requirements sessions, my well-labeled design diagrams, my design review session transcripts where I explained the design to the client and answered questions, and a link to the Google NotebookLM project with all of the artifacts. I have separate md files for different implementation components.
The NotebookLM project can be used for any future maintainers to ask questions about the project based on all of the artifacts.
> It doesn’t matter to AI whether the code is spaghetti code or not. What you said was only important when humans were maintaining the code.
In my experience using AI to work on existing systems, the AI definitely performs much better on code that humans would consider readable.
You can’t really sit here talking about architecting greenfield systems with AI using methodology that didn’t exist 6 months ago while confidently proclaiming that “trust me they’ll be maintainable”.
Well you can, and most consultants do tend to do that, but it’s not worth much.
I wasn’t born into consulting in 1996. AI for coding is by definition the worst today that it will ever be. What makes you think that the complexity of the code will increase faster than the capability of the agents?
You might have maintained large systems long ago, but if you haven't done it in a while your skill atrophies.
And the most important part is you haven't maintained any large systems written by AI, so stating that they will work is nonsense.
I won't state that AI can't get better. AI agents might replace all of us in the future. But what I will tell you is based on my experience and reasoning I have very strong doubts about the maintainability of AI generated code that no one has approved or understands. The burden of proof isn't on the person saying "maybe we should slow down and understand the consequences before we introduce a massive change." It's on the person saying "trust me it will work even though I have absolutely no evidence to support my claim".
Well seeing that Claude code was just introduced last year - it couldn’t have been that long since I didn’t code with AI.
And did I mention I got my start working in cloud consulting as a full-time, blue-badge, RSU-earning employee at a little company you might have heard of based in Seattle? So since I have worked at the second largest employer in the US, unless you have worked for Walmart, I don’t think you have worked for a larger company than I have.
Oh did I also mention that I worked at GE when it was #6 in market cap?
These were some of the business requirements we had to implement for the railroad car repair interchange management software
https://www.rmimimra.com/media/attachments/2020/12/23/indust...
You better believe we had a rigorous set of automated tests in something as highly regulated with real world consequences as the railroad transportation industry. AI would have been perfect for that because the requirements were well documented and the test coverage was extreme.
And unless your experience coding is before 1986 when I was coding in assembly language in 65C02 as a hobby, I think I might have a wee bit more than you.
I think you should probably save your “I have more experience” for someone who hasn’t been doing this professionally for 30 years for everything from startups, to large enterprises, to BigTech.
>Well seeing that Claude code was just introduced last year - it couldn’t have been that long since I didn’t code with AI.
That's my entire point!
>And unless your experience coding is before 1986 when I was coding in assembly language in 65C02 as a hobby, I think I might have a wee bit more than you.
Yeah a real wee bit. I started in the late 80s in Tandy Basic.
>I think you should probably save your “I have more experience” for someone who hasn’t been doing this professionally for 30 years for everything from startups, to large enterprises, to BigTech.
I never said anything about having more experience than you, but I've been doing this almost as long as you have. Also at everywhere from startups to large enterprises to BigTech.
But relevant to the discussion at hand, I haven't been consulting for the last part of my career where I could just lob something over the fence and walk away before I have to deal with the consequences of my decisions. This is what seems to be coloring your experience.
> Well you can, and most consultants do tend to do that
Yeah they do.
I'm familiar enough with the claims to feel confident there is plenty of nefarious astroturfing occurring all over the web including on HN.
Indeed. Astroturfing posts have a particular smell to them.
In my experience, consulting companies typically have a bunch of low-to-medium skilled developers producing crap, so the situation with AI isn't much different. Some are better than others, of course.
Also developer UX, common antipatterns, etc
This “the only thing that matters about code is whether it meets requirements” is such a tired take, and I can’t imagine anyone seriously spouting it has had to maintain real software.
The developer UX are the markdown files if no developer ever looks at the code.
Whether you are tired of it or not, absolutely no one in your value chain (your customers who give your company money, or your management chain) cares about your code beyond whether it meets the functional and non-functional requirements. They never did.
And of course whether it was done on time and on budget
As a consumer of goods, I care quite a bit about many of the “hows” of those goods just as much as the “whats”.
My home, which I own, for example, is very much a “what” that keeps me warm and dry. But the “how” of it was constructed is the difference between (1) me cursing the amateur and careless decision making of builders and (2) quietly sipping a cocktail on the beach, free of a care in the world.
“How” doesn’t matter until it matters, like when you put too much weight onto that piece of particle board IKEA furniture.
Do you know how every nail was put into your house? Does the general contractor?
I know where they fucked up and cost me thousands of dollars due to cutting corners during build-out and poor architectural decisions during planning. These kinds of things become very obvious during destructive inspection, which is probably why there are so many limitations on warranties; I digress.
He’s mildly controversial, but watch some @cyfyhomeinspections on YouTube to get a good idea of what you can infer of the “how” of building homes and how it affects homeowners. Especially relevant here because he seems to specialize in inspecting homes that are part of large developments where a single company builds out many homes very quickly and cuts tons of corners and makes the same mistakes repeatedly, kind of like LLM-generated code.
So you’re saying that whether it’s humans or AI - when you delegate something to others you have no idea whether it’s producing quality without you checking yourself…
> you have no idea whether it’s producing quality without you checking yourself
No, I can have some idea. For example, “brand perception”, which can be negatively impacted pretty heavily if things go south too often. See: GitHub, most recently.
I mean, there are already companies that have a negative reputation regarding software quality due to significant outsourcing (consultancies), or bloated management (IBM), or whatever tf Oracle does. We don’t have to pretend there’s a universe where software quality matters, we already live in one. AI will just be one more way to tank your company’s reputation with regards to quality, even if you can maintain profitability otherwise through business development schemes.
So as long as it is meeting the requirements of “it stays up consistently and doesn’t lose my code” you really don’t care how it was coded…
The same as I’ve been arguing about using an agent to do the grunt work of coding.
If GitHub’s login is slow, it isn’t because someone or something didn’t write SOLID code.
> So as long as it is meeting the requirements of “it stays up consistently and doesn’t lose my code” you really don’t care how it was coded…
I don’t think we’ll come to common ground on this topic due to mismatching definitions of fundamental concepts of software engineering. Maybe let’s meet again in a year or two and reflect upon our disagreement.
I personally haven't made up my mind either way yet, but I imagine that a vibecoding advocate could say to you that maintaining code makes sense only when the code is expensive to produce.
If the code is cheap to produce, you don't maintain it, you just throw it away and regenerate.
If you have users, this only works if you have managed to encode nearly every user observable behavior into your test suite.
I’ve never seen this done even with LLMs. Not even close. And even if you did it, the test suite is almost definitely more complex than the code and will suffer from all the same maintainability problems.
I dunno, I have extensive experience reviewing code, and I still review all the AI generated code I own, and I find nothing to complain about in the vast majority of cases. I think it is based on "holding it right."
For instance, I've commented before that I tend to decompose tasks intended for AI to a level where I already know the "shape" of the code in my head, as well as what the test cases should look like. So reviewing the generated code and tests for me is pretty quick because it's almost like reading a book I've already read before, and if something is wrong it jumps out quickly. And I find things jumping out more and more infrequently.
Note that decomposing tasks means I'm doing the design and architecture, which I still don't trust the AI to do... but over the years the scope of tasks has gone up from individual functions to entire modules.
In fact, I'm getting convinced vibe coding could work now, but it still requires a great deal of skill. You have to give it the right context and sophisticated validation mechanisms that help it self-correct as well as let you validate functionality very quickly with minimal looks at the code itself.
"Holding it right" has been one of my biggest problems. Many times I find the output affected by prompt poisoning, and I have to throw away the entire context.
It's not skill at talking to an LLM; it's the user's skill and experience with the problem they're asking the LLM to solve. LLMs work better for problems the prompter knows well and poorly for problems the prompter doesn't really understand.
Try it yourself. Ask claude for something you don't really understand. Then learn that thing, get a fresh instance of claude and try again, this time it will work much better because your knowledge and experience will be naturally embedded in the prompt you write up.
It's not only about understanding the how; not understanding the goal matters too.
I often use AI successfully, but in a few cases I had, it was bad. That was when I didn't even know the end goal and regularly switched the fundamental assumptions that the LLM tried to build up.
One case was a simulation where I wanted to see some specific property in the convergence behavior, but I had no idea how it would get there in the dynamics of the simulation or how it should behave when perturbed.
So the LLM tried many fundamentally different approaches and when I had something that specifically did not work it immediately switched approaches.
Next time I get to work on this (toy) problem I will let it implement some of them, fully parametrize them, and let me have a go with it. There is a concrete goal, and I can play around myself to see if my specific convergence criterion is even possible.
LLMs massively reduce the cost of "let's just try this". I think trying to migrate your entire repo is usually a fool's errand. Figure out a way to break the load-bearing part of the problem out into a sub-project, solve it there, iterate as much as you like. Claude can give you a test gui in one or two minutes, as often as you like. When you have it reliably working there, make Claude write up a detailed spec and bring that back to the main project.
Claude is surprisingly good at GUI work I've been learning, not just getting stuff working but also creating reasonably tasteful and practical designs. Asking claude in the browser to mock up a GUI and then having claude code implement it is a surprisingly powerful workflow.
I’m far away from a web developer or a web designer. But I think I intuitively understand how to put myself in the shoes of the end user when it comes to UX.
I noticed that Claude is awful at understanding what makes good UX, even for something as simple as: if you have a one-line input box and a button that submits the text, you should wire it up so a user can press return instead of clicking the button, or think about letting them tab through inputs in a sensible order.
Yeah, since it's not using its own flow, you have to give it a bit of feedback. So it goes with any dev work... I think you underestimate how bad programmer UIs are.
Yup, same sort of experience. If I'm fishing for something based on vibes that I can't really visualize or explain, it's going to be a slog. That said, telling the LLM the nature of my dilemma up front, warning it that I'll be waffling, seems to help a little.
I review most of the code I get LLMs to write and actually I think the main challenge is finding the right chunk size for each task you ask it to do.
As I use it more, I gain more intuition about the kinds of problems it can handle on its own, vs. those that I need to break down into smaller pieces before setting it loose.
Without research and planning agents are mostly very expensive and slow to get things done, if they even can. However with the right initial breakdown and specification of the work they are incredibly fast.
You are overestimating the skill of code review. Some people have very specific ways of writing code and solving problems which are not aligned with what the LLM wrote, but that doesn't mean it's wrong.
I know senior developers that are very radical on some nonsense patterns they think are much better than others. If they see code that don't follow them, they say it's trash.
Even so, you can guide the LLM to write the code as you like.
And you are wrong, it's a lot on how people write the prompt.
> you are overestimating the skill of code review.
“You are overestimating the skill of [reading, comprehending, and critically assessing code of a non-guaranteed quality]” is an absurd statement if you properly expand out what “code review” means.
I don’t care if you code review the CSS file for the Bojangles online menu web page, but you better be code reviewing the firmware for my dad’s pacemaker.
This whole back and forth with LLM-generated code makes me think that the marginal utility of a lot of code the strong proponents write is <1¢. If I fuck up my code, it costs our partners $200/hr per false alert, which obliterates the profit margin of using our software in the first place.
By far most of the code LLMs write is for crappy crud apps and webapps not pacemakers and rockets
We can capture enough reliability in what LLMs produce there via guided integration tests and UX tests, along with code review, using other LLMs to review, and other strategies to prevent semantic and code drift.
Do you know how many crap WordPress, Drupal, and Joomla sites I have seen?
Just that work can be automated away
But I've also worked in high-end and mission-critical delivery, with more formal verification, etc. That's just moving the goalposts on what AI can do; it will get there eventually.
Last year you all here were arguing AI couldn't code. Now everyone has moved the goalposts to formal, high-end, and mission-critical ops. Yes, when money matters, we humans are still needed; no one is denying that. It's about the utility of the sole human developer against the onslaught of machine-aided coding.
This profession is changing rapidly- people are stuck in denial
> that’s just moving the goalposts on what AI can do- it will get there eventually
This is the nutshell of your argument. I’m not convinced. Technologies often hit a ceiling of utility.
Imagine a “progress curve” for every technology, x-axis time and y-axis utility. Not every progress curve is limitlessly exponential, or even linear - in fact, very few are. I would venture to guess that most technological progress actually mimics population growth curves, where a ceiling is hit based on fundamental restrictions like resource availability, and then either stabilizes or crashes.
I don’t think LLMs are the AI endgame. They definitely have utility, but I think your argument boils down to a bold prediction of limitless progress of a specific technology (LLMs), even though that’s quite rare historically.
I agree that LLM architecture might hit a ceiling (although the trajectory is still upward at present) but I meant Deep Learning in general
I do think there is a great deal of VC baiting hype in statements by Dario and Altman about ai coding but at the same time the progress has indeed been positive
We've finally proven or unlocked the secret to learning in machines - the only question is how fast that progress curve is - yes it might get stuck for a few years but I think this is really an inflection point that we’ve reached with these technologies
> Developers who have experience reviewing code are more likely to find problems immediately and complain they aren't getting great results without a lot of hand holding
This makes me feel better about the amount of disdain I've been feeling toward the output from these LLMs. Sometimes it pops out exactly what I need, but I can never count on it not to go off the rails and require a lot of manual editing.
Exactly my experience. Sometimes it's brilliant, sometimes it produces crap, often it produces something that's a step in the right direction but requires extra work, and often it switches between these different results, producing great results at first until it gets stuck and desperately starts spewing out increasingly weird garbage.
As a developer, you always have to check the code, and recognise when it's just being stupid.
Question: are you manually making those changes to the "stupid" code? I've been having success with Claude using skills. When I see something I wouldn't do, I say what I would have done, ask it why it did it the way it did, then have it update the skills with a better plan. It's like a rubber duck, and I understand it better. I have it make the code improvements. Laughing as it goes off the rails is entertaining though.
I think that entirely disregarding the fundamental operation of LLMs with dismissiveness is ungrounded. You are literally saying it isn’t a skill issue while pointing out a different skill issue.
It is absolutely, unequivocally, patently false to say that the input doesn’t affect the output, and if the input has impact, then it IS a skill.
I think that code review experience is a big driver of success with the llms, but my take away is somewhat different. If you’ve spent a lot of time reviewing other people’s code you realize the failures you see with llms are common failures full stop. Humans make them too.
I also think reviewable code, that is code specifically delivered in a manner that makes code review more straightforward was always valuable but now that the generation costs have lowered its relative value is much higher. So structuring your approach (including plans and prompts) to drive to easily reviewed code is a more valuable skill than before.
I'm relatively forgiving on bugs that I kind of expect to have happen... just from experience working with developers... a lot of the bugs I catch in LLMs are exactly the same as those I have seen from real people. The real difference is the turn around time. I can stay relatively busy just watching what the LLM is doing, while it's working... taking a moment to review more solidly when it's done on the task I gave it.
Sometimes, I'll give it recursive instructions... such as "these tests are correct, please re-run the test and correct the behavior until the tests work as expected." Usually more specific on the bugs, nature and how I think they should be fixed.
I do find that sometimes when dealing with UI effects, the agent will go down a bit of a rabbit hole... I wanted an image zoom control, and the agent kept trying to do it all with css scaling and the positioning was just broken.. eventually telling it to just use nested div's and scale an img element itself, using CSS positioning on the virtual dom for the positioning/overflow would be simpler, it actually did it.
I've seen similar issues where the agent will start changing a broken test, instead of understanding that the test is correct and the feature is broken... or tell my to change my API/instructions, when I WANT it to function a certain way, and it's the implementation that is wrong. It's kind of weird, like reasoning with a toddler sometimes.
Also, Claude (and possibly others) sometimes decides to build everything on an obviously bad idea or shitty architecture, then keeps doubling down into a mess of code. My realization is that I need to be the manager/architect: let it produce the plan, then review and adjust the architecture. Once you get good control of the architecture, you get way fewer bugs, and they're easier to fix. One final thing: hook up observability really early on, and then force the LLM to throw all exceptions instead of "safe fallbacks," which in practice mean "I will swallow everything, and you will need to look at all of the code every time there is a bug."
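The "throw instead of safe fallbacks" rule can be sketched with a hypothetical config loader (the names and file format are invented; the pattern is the point): the swallowing version turns a missing or corrupt file into an empty config, so the bug surfaces somewhere far away; the strict version fails loudly at the call site.

```python
import json

def load_config_swallowed(path: str) -> dict:
    # The pattern agents tend to produce: any failure becomes an empty config.
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}  # silent fallback; the bug shows up far from here

def load_config_strict(path: str) -> dict:
    # Forcing exceptions instead: let FileNotFoundError or
    # json.JSONDecodeError propagate so the failure is visible immediately.
    with open(path) as f:
        return json.load(f)
```

With the strict version plus early observability, the stack trace points at the real problem instead of at whatever downstream code first trips over the empty dict.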
I thought I try to debunk your argument with a food example. I am not sure I succeeded though. Judge for yourself:
It's always easier to blame the ingredients and convince yourself that you have some sort of talent in how you cook that others don't.
In my experience the differences are mostly in how the dishes produced in the kitchen are tasted. Chefs who have experience tasting dishes critically are more likely to find problems immediately and complain they aren't getting great results without a lot of careful adjustments. And those who rarely or never tasted food from other cooks are invariably going to miss stuff and rate the dishes they get higher.
In your example the one making the food is you. You would have to introduce a cooking robot for the analogy to match agentic coding.
Actually I would say it should be a cooking machine like. I am not too familiar with these machines however.
[dead]
I will still take a glance every once in a while to satisfy my curiosity, but I have moved past trying to review code. I was happy with the results frequently enough that I do not find it to be necessary anymore. In my experience, the best predictor is the target programming language. I fail to get much usable code in certain languages, but in certain others it is as if I wrote it myself every time. For those struggling to get good results, try a different programming language. You might be surprised.
> It's always easier to blame the prompt and convince yourself that you have some sort of talent in how you talk to LLMs that other's don't.
Well, it's easily the simplest explanation, right?
Unfortunately it is impossible to ascertain what is what from what we read online. Everyone is different and use the tools in a different way. People also use different tools and do different things with them. Also each persons judgement can be wildly different like you are saying here.
We can't trust the measurements that companies post either because truth isn't their first goal.
Just use it or don't use it depending on how it works out imo. I personally find it marginally on the positive side for coding
> complain they aren't getting great results without a lot of hand holding
This is what I don’t understand - why would I “complain” about “hand holding”? Why would I just create a Claude skill or analogue that tells the agent to conform to my preferences?
I’ve done this many times, and haven’t run into any major issues.
That seems to make sense. Any suggestions to improve this skill of reviewing code?
I think especially a number of us more junior programmers lack in this regard, and don't see a clear way of improving this skill beyond just using LLMs more and learning with time?
It's "easy". You just spend a couple of years reviewing PRs and working in a professional environment getting feedback from your peers and experience the consequences of code.
There is no shortcut unfortunately.
You improve this skill by not using LLMs more and getting more experienced as a programmer yourself. Spotting problems during review comes from experience, from having learned the lessons, knowing the codebase and libraries used etc.
Find another developer and pair/work together on a project. It doesn't need to be serious, but you should organize it like it is. So, a breakdown of tasks needed to accomplish the goal first. And then many pull requests into the source that can be peer reviewed.
It's always easier to blame the model and convince yourself that you have some sort of talent in reviewing LLM's work that others don't.
In my experience the differences are mostly in how the code produced by LLM is prompted and what context is given to the agent. Developers who have experience delegating their work are more likely to prevent downstream problems from happening immediately and complain their colleagues cannot prompt as efficiently without a lot of hand holding. And those who rarely or never delegated their work are invariably going to miss crucial context details and rate the output they get lower.
Never takes long for the “you’re holding it wrong” crowd to pop in.
That's a terrible reason for a mass consumer tool to fail, and a perfectly reasonable one for a professional power tool to fail
Partly true, but I think there's a real skill in catching subtle logic errors in generated code too not just prompting well. Both matter.
That's what I meant, though. I didn't mean "I say the right words", I meant "I don't give them a sentence and walk away".
I guess it's no coincidence that most of then people saying "LLMs are great for doing code" are non-developers...
It's also always easier to blame the LLM when the developer doesn't work with it right.
In my experience the differences are mostly between the chair and the keyboard.
I asked Codex to scrape a bunch of restaurant guides I like, and make me an iPhone app which shows those restaurants on a map color coded based on if they're open, closed or closing/opening soon.
I'd never built an iOS app before, but it took me less than 10 minutes of screen time to get this pushed onto my phone.
The app works, does exactly what I want it to do and meaningfully improves my life on a daily basis.
The "AI can't build anything useful" crowd consists entirely of fools and liars.
Garbage in, garbage out.