I feel like the current situation is temporary. LLMs are finding all the bugs. LLMs are also helping fix most of the bugs. Once most of the bugs are fixed, LLMs should be good at finding bugs before they ship, the stream of bug reports will die down, and we'll be back to vulnerability reports being special.

Further, the fact that bugs are so easy for LLMs to find means there are strong incentives to find ways to minimize creating bugs in the first place. That could be new or better languages, fewer third-party dependencies, more vetted code, better linters, better fuzzers, whatever. The point is that the new reality of bugs being easy to find will, actually must, lead to fewer bugs eventually, because the world can't function with easy-to-find bugs.

"Temporary" can be an awfully long time. There is ample evidence that the discovery rate of bugs (many of which can be bucketed into vulnerabilities) in any non-trivial piece of software is more or less stable.[0] In a recent podcast episode, the ex-CISO of Adobe commented that every now and then they'd make a sustained push to find all occurrences of a given type of bug (i.e. source of vulnerability) in a codebase. They'd find a good number of them and fix them.

Then a year or two later they'd repeat the operation and find about the same number of the same types of bugs, on many occasions in code that had been in place during the previous round and had remained essentially untouched.

Paraphrasing what the Grugq has quipped: a large piece of software has infinity bugs. Infinity minus N is still infinity.

0: Discovery rate with regard to the time spent looking for bugs. LLM-powered bug hunting has amped up the speed with which codebases can be investigated.

Ahhh, you are talking about Adobe. I always wondered, given the never-ending stream of vulnerabilities in their products, what it was about their development process that produced such appalling code in the first place.

The hope is that LLMs can scan my code every day or something like that. If I make a mistake and get it past code review, the LLM will still find it and it gets fixed right away. (Better yet, make the LLM an automatic reviewer on everything.)

Many of the bugs we are finding in projects like curl are 20 years old. Once they are fixed, they are fixed, so hopefully we get all those 1-20 year old problems fixed and future scans only find new problems, which is itself a big improvement in the rate. I agree that we will never reach a point where no bugs are introduced, but we should strive to fix them faster.

I feel this sentiment is wishful thinking, but I want to start by saying I hope it turns out to be correct.

I find that bugs will often be created when using an LLM, like others have said. Using an LLM to identify all the bugs created by an LLM doesn't guarantee that another bug isn't introduced while the LLM is addressing the initial problem.

Also, what if the LLM has a blind spot? They could certainly also be incapable of finding or fixing a bug. They don't pass any benchmark at 100% right now. Also also, guaranteeing there are no bugs in your code is like saying you have 100% test coverage, all of the tests pass, and they are written perfectly. Saying that you can simply identify and fix the bugs also assumes there is enough time and energy to find all of the bugs that exist within a project and then to address them. Even LLMs use time and energy. In a sufficiently complex system that is certainly wishful thinking.

Considering the size and complexity of a lot of modern software (web browsers, 3D modelling software, game engines, etc.), software is just too complex to not have bugs, even when created and managed by LLMs.

There will continue to be bugs in code, and we will simply have to live with the fact that LLMs make it easier to exploit computer systems. I mean, consider a hardware bug like Spectre [0]. If bugs like this become easier to find, does that mean our existing hardware will just become obsolete more quickly? That type of problem can be addressed, but at quite a high cost.

Not sure what all of this means for the future.

0. https://en.wikipedia.org/wiki/Spectre_%28security_vulnerabil...

If LLMs can trivially find bugs, then they can trivially find bugs. If they can't find any bugs, that doesn't mean there are no bugs, but it suggests that others can't easily find them either. So the "LLMs find all the bugs" problem is fixed by asking the LLMs to find them before you ship.

Read what I wrote. I didn't say your program will be bug-free. I said that if the LLM can trivially find the bug, it will. If it can't, then we're at worst back to the state before LLMs could find bugs, but likely much better off since we fixed so many of them.

So, the fact that LLMs can trivially find bugs is enough to get the bugs fixed.

You, and several others, seemed to think I was saying LLMs would fix all the bugs. I never said that. I said they'd help. Finding them is help. Writing a possible fix is often help. Writing a possible fix and seeing if they can detect a bug after the fix is applied is also help. Automating the entire thing and letting LLMs fix them without review is likely not help.

The bugs I can trivially find, the bugs Claude can trivially find, and the bugs Codex can trivially find are not necessarily the same. The most obvious bugs would be obvious to any of us three, but beyond that we wouldn't agree on a definition of 'trivial bug'.

This gets worse if you factor in different harnesses built for the task, and future model updates.

Sure, things will get calmer than they are now, but shipping without bugs someone else considers trivial will still require more effort than most are willing to invest.

But by using an LLM everyone is automatically devoting that much effort, and so while we don't fix all bugs, we fix a lot that wouldn't have been fixed otherwise.

Ok, I get a better idea of what you're saying from this reply than from your original comment. It wasn't helpful to me that you suggested I reread your original comment.

I agree that LLMs make finding bugs take far less time and energy. I also agree that this should mean that in the long run there are fewer trivial-to-find bugs IF everyone adopts LLMs while writing and reviewing code.

It does also seem possible that LLMs are better at finding bugs than fixing them.

This is where you're wrong. I ran an experiment and told it to find bugs in a ~200 LoC project. The models are tuned in such a way that they're expected to generate issue reports, so in a codebase that had zero bugs, zero vulnerabilities, and zero changes needed, it found 3 low-severity (cosmetic) issues, 1 medium-severity issue, and 1 critical-severity issue. The critical-severity issue was accepting unvalidated user input, for... an echo command.

Did you make any attempt at tuning the prompt to reduce false positives? Or did you just say "find bugs"? Because if you tell it to do that, it will.

The point I was trying to make is that there will always be people reporting "critical" bugs.

That supposes that LLMs can write secure software. Also, if we assume that finding bugs is easier than not creating them (reasonable, I would say), the supply of bugs will never be exhausted.

How can it be easier to find them than to not create them? Whatever you do to find them, you could do before you release.

Because the behaviour of software changes over the course of development, and that's how many bugs happen in the first place.

Especially if you use AI. Let's say you have it implement a feature and then change your mind. In my experience AI makes as many bugs as a human, if not more.

You can accidentally create a bug that you yourself cannot find.

I think I’m on this side. I find it exceedingly unlikely that we just start producing “perfect” software all the time for everything, and at the same time start generating an order of magnitude MORE software.

What's the difference between finding bugs and not making them? Just run the bug finding during CI/CD.

It’s not necessarily symmetrical, and in fact it would be very surprising if it were. It’s a probabilistic algorithm on both sides, and finding any working program versus finding all bugs in a working program are fundamentally different search spaces, so the energy use differs too. Not to mention the false positive rate and the human verification effort. Then even the idea of incremental security checks is potentially flawed, since many security issues are non-local (i.e. not localized to a single module).

It does not suppose that LLMs can write secure software.
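To put rough numbers on the false positive and human verification point, here is a minimal back-of-the-envelope sketch. The function name `triage_cost` and every input value are hypothetical, chosen only for illustration and not measured from any real scanner or project. It assumes every report costs the same review time whether or not it turns out to be real.

```python
def triage_cost(reports: int, precision: float, hours_per_report: float) -> dict:
    """Estimate human review cost for a batch of automated bug reports.

    precision is the assumed fraction of reports that are real bugs.
    All inputs are illustrative assumptions, not measurements.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    real_bugs = reports * precision
    false_positives = reports - real_bugs
    # Every report has to be read by a human, real or not.
    total_hours = reports * hours_per_report
    # Review time spent per bug that was actually real.
    hours_per_real_bug = total_hours / real_bugs if real_bugs else float("inf")
    return {
        "real_bugs": real_bugs,
        "false_positives": false_positives,
        "total_hours": total_hours,
        "hours_per_real_bug": hours_per_real_bug,
    }


if __name__ == "__main__":
    # Hypothetical batch: 100 reports, 1 in 4 real, 30 minutes to check each.
    print(triage_cost(reports=100, precision=0.25, hours_per_report=0.5))
```

Under these assumptions the cost per real bug is `hours_per_report / precision`, so halving the precision doubles the human time spent per genuine finding, even though the scan itself is cheap.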

> That supposes that LLMs can write secure software.

I think we're at the point where the best LLMs can indeed write software that's far more secure than what your average programmer writes. Partly because the average is so terrible.

There's an assumption in here that every developer is spending a load of money on the latest and most capable LLMs to scan for bugs in their code before every release.

But the last couple of decades have shown us that huge numbers of developers aren't even following basic and free secure development practices, let alone pouring money into expensive scanning tools.

It may be only a matter of time before all devs remember to append ", and make no mistakes" to the end of their LLM prompts, but I don't think we as an industry will ever reach a point at which every release of every package/library/application is scanned with the most capable model available.

I mean, we've had tooling like fuzzers available for a very long time, and most devs haven't run one against their software ever, let alone before each release.

It's the human factor that I think will keep this a problem essentially forever.

LLMs are finding bugs where there aren't any and wasting human time trying to disprove the slop.

If all LLM reports were accurate, they'd be of real value. However, that's not what is happening. If you have even mentioned something about a bug bounty anywhere, waves of slop peddlers will flood you with fake reports marking every minor bug as a critical problem, hoping to catch a handful of dollars in the process.

These models do find some problems and may even provide decent suggestions to fix them (though they really want to add code above anything else, quickly leading to spaghetti if you accept it all). That's not the issue at the moment, and as long as people try to incentivize people to report bugs, the issue will remain.

I do expect this to be temporary, though. Not because LLMs will fix all the bugs, but because the flood of slop will shut down most public bug bounties.

That is definitely NOT what the article says.

Are you making a counterpoint that the reports are so good that they must all be addressed, and the problem is the "LLMs finding all the bugs" so fast that us poor slow humans cannot keep up?

Because if so, I suggest you write a new article.

> Once most of the bugs are fixed

my brother in christ, I hope you're actually trying to be funny here.

99 little bugs in the code

99 little bugs

take one down, patch it around

105 little bugs in the code...

Lol you think LLMs are generating bug-free code?

I never said that. I said they are good at helping fix them. Go read the bug reports on Firefox, or Safari, or Chrome. Most of them have a fix. It might be wrong, but it usually points in the right direction, which is 1000x more than nearly all human bug reports do. So the LLM helps, which is all I stated.