Perhaps I've missed a few weeks' worth of progress, but I don't think AIs have become more trustworthy; the errors are just more subtle.
If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.
If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's much harder to spot, and it does nothing to reduce the review burden.
If anything, "truthy" code is more mentally taxing to review than just obviously bad code.
I know there are good uses of LLMs out there. I do. But.
The current fever-pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging, and often career-limiting, as to wear the fabric of one's psyche threadbare. For every obvious problem pointed out, there's a workaround; and each workaround, as is often revealed shortly thereafter, has its own problems, which beget new solutions, ad infinitum.
At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: the real goal has become so obscured at many firms today that all that remains is the LLM. Are the people betting the farm, and those helping implement their visions, guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?
Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?
The dirty secret if you work inside BigCorp and look around at the projects they're showcasing:
1. They're low stakes to get wrong.
2. The most common are MCP servers or similar AI tooling.
3. Making them look good still takes time and effort. It's a multiplier, not a replacement.
4. Quality and maintainability require investment. I had to restart an agentic project several times because it painted itself into a corner.
In my opinion you are just wrong.
It’s an absolute game changer, and it can now multiply your productivity fivefold if it’s a solo greenfield project.
Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.
Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.
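If you haven't tried the worktree part, it's less exotic than it sounds: one checkout per agent, so their edits never collide. A minimal sketch in Python (the branch names are made up, and starting the agent itself is left as the manual step, whatever CLI you use):

    import subprocess

    # Hypothetical feature branches, one per parallel agent session.
    FEATURES = ["auth-flow", "thumbnail-cache", "export-csv"]

    for name in FEATURES:
        path = f"../wt-{name}"
        # One isolated checkout per agent, so parallel edits never collide.
        subprocess.run(["git", "worktree", "add", "-b", name, path], check=True)
        print(f"now start an agent session in {path}")

    # When a feature lands, fold it back and clean up:
    #   git merge <branch> && git worktree remove ../wt-<branch>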
The results are now undeniable. In the past I could not have developed a product of that scope in my free time.
That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last months. Like parallel agents, resolving merge conflicts, separating out functionality from a large branch into proper PRs.
"many engineers have not yet tried things that became feasible over the last months"
I have heard this statement every single day for 2 years and yet we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".
Well, the GP mentioned
> if it’s a solo greenfield project
which is a pretty large caveat. Anecdotally, I've found my side projects (which are solo greenfield projects, and don't need to be supported to the same standards as enterprise software) have gained the boost the GP was talking about.
At work, it's different, since design, review, and maintenance are much more onerous.
If you want an example of a project that condensed 5 years into 6 months and exploded past the competition I suggest looking at OpenClaw.
The first line of code was written on November 25th. It achieved adoption in the "personal agents" space that far exceeded the other companies that had tried the same thing.
(Whether or not you trust the quality of the software you can't deny the impact it had in such a short time. It defined a new category of software.)
OpenClaw is definitely not a "5 years" project pre-AI though. That was more like a month of greenfield work compressed into a weekend -- which is still really impressive, don't get me wrong! -- but I think the point is we're not seeing mature, legacy codebases get outcompeted by new, agile, AI-driven codebases; we're seeing greenfield projects get spun up faster. Which, again, is still impressive and valuable.
If agents could really compress 10 years of development into 1 year, you'd see people making e.g. HFT platforms and becoming obscenely rich, not making a fun open-source project and getting hired by OpenAI as an employee.
41,964 commits is a lot more than "a month of greenfield work".
https://tools.simonwillison.net/github-repo-stats?repo=OpenC...
Didn't we learn anything from the past? Using LOC, commit counts, or GitHub stars to measure success or productivity is so backwards. It seems everyone on the AI wagon is either young (and so doesn't know our history) or has simply forgotten all the good practices in software engineering.
My bash script can do that in a few hours. The git repo contains no working software after that, but if that's what you want to measure...
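To be concrete, the whole trick is something like this (a Python sketch; the repo name and loop count are tongue-in-cheek):

    import subprocess

    # Manufacture a mountain of trivial commits in a scratch repo.
    subprocess.run(["git", "init", "spam-repo"], check=True)
    for i in range(42_000):
        with open("spam-repo/counter.txt", "w") as f:
            f.write(f"{i}\n")
        subprocess.run(["git", "-C", "spam-repo", "add", "counter.txt"], check=True)
        subprocess.run(["git", "-C", "spam-repo", "commit", "-m", f"update counter to {i}"],
                       check=True, capture_output=True)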
> 41,964 commits is a lot more than "a month of greenfield work".
I meant a month for the initial release, not current state.
Regardless, much like lines of code, number of commits is not a good metric, not even as a proxy, for how much "work" was actually done. Quickly browsing there are plenty[0] of[1] really[2] small[3] commits[4]. Agentic coding naturally optimizes for small commits because that's what the process is meant to do, but it doesn't mean that more work is being done, or that the work is effective. If anything, looking at the changelog[5] OpenClaw feels like a directionless dumpster fire right now. I would expect a lot more from a project if it had multiple people working on it for 5 years, pre-AI.
[0] https://github.com/openclaw/openclaw/commit/e43ae8e8cd1ffc07...
[1] https://github.com/openclaw/openclaw/commit/377c69773f0a1b8e...
[2] https://github.com/openclaw/openclaw/commit/ffafa9008da249a0...
[3] https://github.com/openclaw/openclaw/commit/506b0bbaad312454...
[4] https://github.com/openclaw/openclaw/commit/512f777099eb19df...
[5] https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md
That's why my original comment said:
> (Whether or not you trust the quality of the software you can't deny the impact it had in such a short time. It defined a new category of software.)
I brought up OpenClaw here because the challenge was:
> we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".
Seriously? Commit count is right up there with lines of code as a classically dumb measurement of productivity.
Sure, but it's still a good counter to "a month of work".
No it isn't. There's basically no upper bound on the number of commits an LLM can generate. If the LLM takes 10,000 commits to do what a human would do in 10, then the comparison is meaningless.
I don't know anything about the code quality of OpenClaw, but telling me the number of commits tells me precisely nothing of use.
OK, now do that for 369,293 stars, 76,193 forks, 138 releases and 2,133 contributors.
I expect there is no number I could bring up here that won't be instantly shot down as telling "precisely nothing". My mistake for bringing up any numbers at all.
OpenClaw is a good example of a completely new project written using coding agents that made a significant impression on the world and would not have been built without them.
I'm surprised this is a hill I have to die on, but there we are.
(I'm not even a user of OpenClaw! I don't think it's secure or safe enough to use in my own life.)
It isn't, man. Anyone can easily split a single good commit into 10 just to inflate the numbers. C'mon, this is git 101.
You're framing it like the only barrier to writing wildly successful money printing software is software development skills.
If that were true, all of these anti-AI greybeards who have been in the game for 30 years would all own their own jets.
Ideally, the given example would be something not adjacent to the presently white-hot category of "AI agents".
Like, look at e.g. YC minus the AI and AI-adjacent companies. Are those startups meaningfully more impressive or feature-rich compared to a couple of years ago?
Not yet, no. I think that's because coding agents got good in November, most people didn't notice until January and it still takes 3-4 months to go from idea to releasing something.
I expect we will start seeing the impact of the new coding agent enhanced development processes over the next few months.
> It defined a new category of software
Which is exactly why you can't use it as an example, there is no control. This is basic stuff.
The condensation argument is totally true... Strikes me, though, that the other metric I'd look at is how long code survives before being rewritten. Feels like it's a bit early to tell on that one...
I don’t see OpenClaw making much of an impact. Maybe in your bubble?
There are credible reports of regular people in China attending dedicated events for help getting started with OpenClaw. They're not in my bubble!
https://www.reuters.com/technology/openclaw-enthusiasm-grips...
It's trash: vibe-coded markdown files around pi. This exemplifies well what the OP is saying. We are at the ICO stage of LLMs. Hopefully there won't be an NFT one.
As much as I love to hate on AI: even the bad apples still produce something that one can reasonably work with.
Cryptocurrencies? Barely any other use than money laundering, buying drugs and betting on the outcome of battles in war. And NFTs? No use at all other than money laundering and setting money ablaze.
Privacy and security from government overreach is not enough?
What privacy? Enough drug dealers have already been busted with solid evidence from trailing the paths on public blockchains.
The thing is, I don't care any longer. I sincerely believe velocity without direction is not a good strategy for delivering quality in the long term. And that's the thing about it: How sustainable is this velocity, in terms of socioeconomic concerns, product strategy, and mental health?
Velocity without direction?
I'm personally directing and QA testing every feature.
I don’t know how socioeconomic concerns, product strategy, and mental health are a concern for me here.
I'm having a great time with my project and it's been the most fun I've had in many years of building.
All of the "solo green field projects" I let LLMs mostly write, despite supplying the scaffolding, structure and specific implementation details as code, prompts or context, I can't tell you much about 6+ months later, except for the parts I did write.
It's like I never wrote them, because I didn't. I've got the gist of them, but it's the same way I get the gist of something like Numpy: I know how it works theoretically, but certainly not specifically enough to jump in and write some working Fortran that fixes bugs or adds features.
I now have a bunch of stalled projects I'm not very familiar with. I no longer do solo green field projects that way.
> and it can now multiply your productivity fivefold if it’s a solo greenfield project.
Why do I not see 5x as many interesting greenfield projects as before?
> if it’s a solo greenfield project
That's a big if. I don't have numbers, but most professional engineers are not working on such projects.
There are two sides to the AI mandates.
The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.
There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI, and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be driven to use it, in the same way that some developers had to be driven to adopt Docker or k8s or whatever.
Ah yes, we must force these obstinate engineers to the right path! Only after getting everyone to see the light will they understand and thank us for boundless productivity!! /s
Perhaps these “obstinate” engineers have good reason in their decision. And it should be their decision!
To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.
Engineers that didn't move past src.v35.final.zip version control don't really have jobs today, either.
You would be absolutely shocked how many software projects are still run, to this day, without source control at all. Or automated (or manual) testing. And how many hand crafted artisanal servers are running on AWS, never to be recovered if their EC2 instance is killed for some reason.
Sure, but that’s a small and shrinking market. Not a source of economic security or growth for its employees, nor for most of its companies (though some have defended niches).
I've seen growing companies running multiple million ARR through systems like that. It's way more common than you'd think if you're a professional software developer.
I seriously don't see how version control and LLMs are comparable. A deterministic way to track code changes over time, versus an essentially non-deterministic statistical code generator that might get you what you want, and might do it in a reasonable time frame, and that might not land you in a minefield of short-term-good/long-term-bad design points.
> an essentially non-deterministic statistical code generator that might get you what you want, and might do it in a reasonable time frame, and that might not land you in a minefield of short-term-good/long-term-bad design points.
Sounds like a human? The ‘statistical’ part is arguable, I suppose.
There is an absolute embarrassment of modern tooling in other categories I have no problem whatsoever embracing. I'm not a holdout for being stuck in my ways. Maybe I value things other than expediency at massive cost. Maybe I speak just as well to computers as I do to humans.
I'm sure I will have no problem whatsoever remaining in the employ of a firm that trusts me to make products and tooling that still push the envelope of what's possible without having to resort to the sheer brute force of trillion parameter-scale models.
There is no massive cost. For 80% of the brute work that needs to be done day in and day out, LLMs provide code as good as a senior engineer's, but at breakneck pace, provided you have sufficient competency in steering the model.
Around the turn of the century there were the same exact arguments being made about automated testing (not just TDD, but any automated tests at all!)
I ran the statistics myself: since AI agents began to be used en masse, my company is spending 40% less time on feature development and pushing 50% more tickets, without any noticeable increase in regressions.
After 18 months the hard evidence is in place. And much like replacing bare-metal servers with k8s for the many use cases where the evidence showed the burden was worth it, or substituting Terraform for shell scripts, it's time to move on.
I don't really see a place for no AI usage in line-of-business software apps anymore.
What do you fill the time you aren't spending on feature development with? Or are you all now working 20-hour work weeks?
Faster feature development, more strategic thinking about how to keep the dev pipeline full, doing braindead mechanical improvements that pay off tech debt that would otherwise never get management sign-off, writing GUI-based tools for support teams that previously had to scour reams of shell scripts, spending more time refining specifications and estimates, writing throwaway concepts of different design ideas so that architecture discussions can be based on real code instead of pseudocode, and clearing out the backlog of bugs that used to be terribly annoying to reproduce and that I can now just throw brute compute at.
Sounds awful. Just filling the time with worthless stuff. You are basically a liability. Wouldn’t like to have you in my team. Less is more (nowadays more than ever)
> I don't think that AIs have become more trustworthy, the errors are just more subtle.
Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?
A difference to me is that when we manually write code, we reason about it carefully, with a purpose. Yes, we make mistakes, but the mistakes are grounded in a certain range. In contrast, AI-generated code contains errors that don't follow common sense. That said, I don't feel this differentiation is strong enough, and I don't have data to back it up.
One answer, as another person pointed out, is that LLM mistakes are just different. They are less explicable, less predictable, and therefore harder to spot. I can easily anticipate how an inexperienced engineer is going to mess up their first pull request for my project. I have no idea what an LLM might do. Worse, I know it might ace the first fifty pull requests and then make an absolutely mind-boggling mistake in the 51st one.
But another answer is that human autonomy is coupled to responsibility. For most line employees, if they mess up badly enough, it's first and foremost their problem. They're getting a bad performance review, getting fired, end up in court or even in prison. Because you bear responsibility for your actions, your boss doesn't have to watch what you're up to 24x7. Their career is typically not on the line unless they're deeply complicit in your misbehavior.
LLMs have no meaningful responsibility, so whoever is operating them is ultimately on the hook for what they do. It's a different dynamic. It's probably why most software engineers are not gonna get replaced by robots - your director or VP doesn't want to be liable for an agent that goes haywire - but it's also why the "oh, I have an army of 50 YOLO agents do the work while I'm browsing Reddit" is probably not a wise strategy for line employees.
> I can easily anticipate how an inexperienced engineer is going to mess up their first pull request for my project.
Isn’t this just because you have seen a lot of PRs from inexperienced engineers? People learn LLM behavior over time, too.
I'm pretty sure that I've seen more LLM mistakes than coworker mistakes at this point and I'm nowhere closer to enlightenment.
Humans can't make mistakes at the sheer scale that AI can.
Yes, as an engineer I make mistakes, but I could never make as many mistakes per day as an LLM can
Obviously, the measure isn’t mistakes per day, it’s mistakes per LOC. And that’s not the whole story either - AI self-corrects in addition to being corrected by the operator. If the operator’s committed bugs/LOC rate is as low as the unaugmented programmer’s bugs/LOC, you always choose the AI operator. If it’s higher, it might still be viable to choose them if you care about velocity more than correctness. I’m a slow, methodical programmer myself, but it’s not clear to me that I have a moat.
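To make the tradeoff concrete with deliberately invented numbers (illustration only, not data):

    # Deliberately invented numbers, purely to illustrate the tradeoff.
    human_loc_per_day, human_bugs_per_loc = 200, 0.010
    ai_loc_per_day,    ai_bugs_per_loc    = 1000, 0.012

    print(human_loc_per_day * human_bugs_per_loc)  # 2.0 bugs shipped per day
    print(ai_loc_per_day * ai_bugs_per_loc)        # 12.0 bugs shipped per day
    # Per LOC the operator is only 20% worse, but at 5x velocity they ship
    # 6x the bugs per day. Which side wins depends on whether you're paid
    # for velocity or for correctness.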
> Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?
We're investing in the human getting better rather than paying $100 to Anthropic and hoping that's enough that they don't make the product worse.
This is like having a coworker who's as skilled as you if not more skilled, but also an alien.
Their mental model doesn't map cleanly enough to yours, and so where for a human you'd have some way to follow their thought patterns and identify mistakes, here the alien makes mistakes that don't add up.
Like the alien has encyclopedic knowledge of op codes in some esoteric soviet MCU but sometimes forgets how to look for a function definition, says "It looks like the read tool failed, that's ok, I can just make a mock implementation and comment out the test for now."
Some of my favorite peer engineers work exactly like that
People used to look up to them, and they used to be legends (even if not everyone liked them).
Notch, Woz, Linus and Geohot come to mind
The Metasploit creator Dean McNamee worked for me and he was just like that and a total monster at engineering hard tech products
No, they don't, because they have brains.
I have no strong idea why people can't accept that intelligence formed separately from a human brain can truly be alien: not in the hyperbolic sense of "that person is so unique it's like they're a different species", but "that thing does not have a brain, so it can have intelligence that is not human-like".
A human without a brain would die. An LLM doesn't have a brain and can do wondrous things.
It just does them in ways that require first accepting that no homo sapiens thinks like an LLM.
We trained it on human language, so oftentimes it borrows our thought traces, so to speak, but effective agentic systems form when you first erase your preconceived notions of how intelligence works and actually study this non-human intelligence and find new ways to apply it.
It's like the early days of agents when everyone thought if you just made an agent for each job role in a company and stuck them in a virtual office handing off work to each other it'd solve everything, but then Claude Code took off and showed that a simple brain dead loop could outperform that.
Now subagents almost always are task specific, not role specific.
I feel like we could leap ahead a decade if people could divorce "we use language, and it uses language so it is like us", but I think there's just something really challenging about that because it's never been true.
Nothing had this level of mastery over human language before that wasn't a human. And funnily enough, the first times we even came close (like Eliza) the same exact thing happened: so this seems like a persistent gap in how humans deal with non-humans using language.
"I feel like we could leap ahead a decade if people could divorce "we use language, and it uses language so it is like us","
Or maybe just maybe... the thing should be much better designed around the human.
That's how personal computers made their way into homes. People like yourself are comical: you can't understand that widespread adoption is how people actually obtain the value the thing intrinsically possesses.
Firms literally exist to take care of the hassle so that the person can get the value from the thing closer to the present - like hello...?
You quote me then start speaking about things completely unrelated to anything I said.
We can't choose if the LLM is like us unless you want to go back 10-20 years in time and choose a new direction for AI/ML.
We stumbled upon an architecture with mostly superficial similarities to how we think and learn, and instead focused on being able to throw more compute and more data at our models.
You're talking about ergonomics that exist at a completely different layer: even if you want to make LLM-based products for humans, around humans, you have to accept that it's not a human and it won't make mistakes like a human (even if the mistakes look human).
If anything, you're going to make something that burns most people if you just blindly pretend it's human-like: a great example being products that give users a false impression of LLM memory to hide the nitty-gritty details.
In the early days ChatGPT would silently truncate the context window at some point and bullshit its way through recalling earlier parts of the conversation.
With compaction it does better, but still degrades noticeably.
If they'd exposed the concept of a context window to the user through top level primitives (like being able to manage what's important for example), maybe it'd have been a bit less clean of a product interface... but way more laypeople today would have a much better understanding of an LLM's very un-human equivalent to memory.
Instead we still give users lossy incomplete pictures of this all with the backends silently deciding when to compact and what information to discard. Most people using the tools don't know this because they're not being given an active role in the process.
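For what it's worth, a sketch of the kind of top-level primitive I mean (the "pinned" flag is hypothetical, nobody's real API; token counting via the actual tiktoken library):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def fit_context(messages, budget=8000):
        """Drop the oldest unpinned messages until the conversation fits.
        'pinned' is the hypothetical user-facing primitive; no vendor
        actually exposes this, which is the point."""
        kept = list(messages)
        while sum(len(enc.encode(m["text"])) for m in kept) > budget:
            victim = next((m for m in kept if not m.get("pinned")), None)
            if victim is None:
                break  # everything left is pinned; the user decides what goes
            kept.remove(victim)  # discarded visibly, not silently by a backend
        return kept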
I think these are reasonable questions, but they assume that everything actually is a black box instead of merely being treated as one.
Despite what the headlines say, these systems aren’t inscrutable.
We know how these things work; we can build around and within them, change parameters and activation functions, etc., and actually apply experience, science, and guidance.
However, those are not technical problems; those are organizational, social, and quite frankly resource-allocation problems.
I said the opposite of what your comment is replying to.
> but effective agentic systems form when you first erase your preconceived notions of how intelligence works and actually study this non-human intelligence and find new ways to apply it.
There's no reason you can't make good use of them and learn how to do it more reliably and predictably; it's just that chasing those gains through a human-intelligence-like model, because they use human language, leads to more false starts and local maxima than trying to understand them as their own systems.
I don't think it should even be a particularly contentious point: we humans think differently based on the languages we learn and grew up with, what would you expect when you remove the entire common denominator of a human brain?
Dealing with the alien coworkers has always been the job, that is what software is to most people.
Software developers get paid big money because they can speak alien, the only thing that is changing is the dialect.
Nope, I tried my best to be really detailed, and I already knew these replies would come flooding in.
I'm an engineer's engineer: I get that the job isn't LOC but being able to communicate and translate meatspace into composable and robust systems.
So I mean an alien when I say an alien.
Not human.
Not in the cute "oh that guy just hears what everyone else hears and somehow interprets it entirely differently like he's from a different planet" alien way, but in the, "it is a different definition of intelligence derived from lacking wetware" alien way.
Intelligence is such a multidimensional concept that all of humanity, as varied as we are, can fit in a part of the space that has no overlap with an LLM.
-
Now none of that is saying it can't be incredibly useful, but 99% of the misuse and misunderstanding of LLMs stems from humans refusing to internalize that a form of intelligence can exist that uses their language but doesn't occupy the same "space" of thinking that we all operate in, no matter how weird or unique we think we are.
You can direct LLMs to do test-driven development, though. Write several tests, then make sure the code matches them. And also make sure the agent organizes the code correctly.
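Concretely, that means pinning down the edge cases yourself before the agent writes a line. Something like (pytest; the slugify module is a hypothetical placeholder):

    # tests/test_slugify.py -- written *before* the agent implements slugify().
    # The module path and function are hypothetical placeholders.
    from myapp.text import slugify

    def test_basic():
        assert slugify("Hello World") == "hello-world"

    def test_edge_cases_the_agent_would_skip():
        assert slugify("") == ""
        assert slugify("  --  ") == ""
        assert slugify("Ünïcode Tïtle") == "unicode-title"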
The LLM obliges and writes a lot of useless tests. I have asked devs to delete several tests in the last day alone.
"I don't trust this giant statistical model to generate correct code, so to fix it, I'm going to have this giant statistical model generate more code to confirm that the other code it generated is correct."
I swear I'm living through mass hysteria.
A lot of times the act of specifying test criteria prevents developers from accidentally vibe coding themselves into a bad implementation. You can then read the tests and verify that it does what you want it to. You can read the code!
I’m not saying that it’s all hunky dory, but you use AI for straight up test driven development to catch edge cases and correct sloppy implementations before they even get coded by your giant chaos machine.
Well, yeah, you don't just make it bang out a bunch of useless code without monitoring it.
You instruct it to write the code you want to be written. You still have to know how to develop, it just makes you faster.
Yeah I relate to this. I think working in smaller chunks helps a lot. (Just like how it is for work done by humans!)
This has generally been the case, though. As mentioned in the post, "You want solutions that are proven to work before you take a risk on them" remains true, and that is where the edges will be found.
It's about responsibility.
If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.
The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.
The buck stops with me and therefore I have to read the code, line-by-line, carefully.
It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.
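A concrete, invented example of the kind of stub I mean:

    def verify_webhook_signature(payload: bytes, signature: str) -> bool:
        # TODO: implement HMAC verification
        # Compiles, type-checks, and "works" in every demo. It also accepts
        # every forged request. This is the sober determination that got skipped.
        return True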
You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.
I think there are a couple of levels here:
First of all, building a system that constrains the output of the AI sufficiently, whether that's typing, testing, external validation, or manual human review in extremis. That gets you the best result out of whatever harness or orchestration you're using.
Secondly, there's the level at which you're intervening, somewhere along the hierarchy from "validate only usage from the customer perspective" to "review, edit, and validate every jot and tittle of the codebase and environment". I think for relatively low-importance things, reviewing at the feature level (all code, but not interim diffs) is fine, but if you're doing network protocol work you'd better at least validate everything carefully with fuzzing and property testing or something like that (sketch at the end of this comment).
And then you've got how you structure your feedback to the LLM itself - is it an in-the-loop chat process, an edit-and-retry spec loop, go-nogo on a feature branch, or what? How does the process improve itself, basically?
I agree with you entirely that the responsibility rests on the human, but there are a variety of ways to use these things that can increase or decrease the quality of code to time spent reviewing, and obviously different tasks have different levels of review scrutiny, as well.
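For the property-testing bit, I mean something in this vein (Hypothesis; the encode/decode pair is a hypothetical stand-in for whatever the agent produced):

    from hypothesis import given, strategies as st

    # Hypothetical encode/decode pair standing in for the code under review.
    from myproto.codec import encode, decode

    @given(st.binary(max_size=4096))
    def test_round_trip(payload):
        # The invariant must hold for *any* generated input, not just the
        # handful of examples a reviewer (or an agent) thought to write.
        assert decode(encode(payload)) == payload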
On the other hand, I don’t need to review carefully every line of code in my thumbnail generator and associated UI.
My nonexistent backend isn’t going to be pwned if there is a bug in the thumbnail generation.
After the QA testing on my device, a quick scroll through the code is enough.
Maybe prompt "are errors during thumbnail generation caught to prevent app crashes?" if we're feeling extra cautious today.
And just like that it saved a day of work.
I assume you're talking about a local application? You don't care if a malicious image you downloaded pwns your PC then? Like CVE-2016-3714
> My nonexistent backend isn’t going to be pwned if there is a bug in the thumbnail generation.
Hmm. Historically, image editing was one of the easier-to-exploit security holes in many systems. How do you feel about unknown entities having shell access inside your datacenter or VPC?
I feel pretty good about the odds of attackers exploiting security holes in image editing functions my app does not have, in order to enter my also nonexistent datacenter or vpc.
But a thumbnail generator is a 1 hour task at best if you’re on a solo greenfield project and it’ll still be a 6 week project at an enterprise, even with AI.
I would be impressed if you implement it in an hour with the following features:
- webview fallback with canvas capture for codecs not supported in the default player
- detecting blank frames and diff between thumbnails to maximize variety
- UI integration to visualize progress and pending thumbnails, batched updates to the gallery
- versioning scheme and backfill for missing/outdated thumbnail formats
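Even the blank-frame/variety bullet alone is real work. A rough sketch of just that piece (Python/OpenCV purely for illustration; thresholds and frame counts are invented), ignoring the UI, versioning, and fallback player entirely:

    import cv2
    import numpy as np

    def is_blank(frame, std_threshold=10.0):
        """Treat near-uniform frames (black, white, single color) as blank.
        The threshold is a made-up starting point, not a tuned value."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return float(gray.std()) < std_threshold

    def pick_thumbnail_frames(path, candidates=12, keep=4):
        cap = cv2.VideoCapture(path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        picked, last = [], None
        for i in np.linspace(0, max(total - 1, 0), candidates, dtype=int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
            ok, frame = cap.read()
            if not ok or is_blank(frame):
                continue
            # Maximize variety: skip frames too similar to the last kept one.
            if last is not None and cv2.absdiff(frame, last).mean() < 5.0:
                continue
            picked.append(frame)
            last = frame
            if len(picked) == keep:
                break
        cap.release()
        return picked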
Honestly, a day seems rather optimistic to me. Maybe if I were an expert on this platform and had implemented a similar feature before, then I could hope to do it in a day.
If I had to handwrite it and estimate it for Scrum at work, I'd budget a week.
Ok, fair. I incorrectly assumed you meant resizing static images to create a lower resolution preview image.
Video thumbnails are a different beast altogether. And you might want to double check your assumptions about security considerations. If any of your ffmpeg, opencv, pyscenedetect code is running on your server, it might well be exploitable.
It’s in-app on iOS.
Ironically, already another user in this comment section was concerned about the security of my nonexistent backend.
But it's good to know; I was not previously aware that video processing on the backend is a common source of vulnerabilities.