Yeah, with a budget assigned. This is actually just software development and security right?

Developers create software, which has bugs. Users (including bad guys, pen testers, QA folks, automated scans etc, etc, etc) find bugs, including security bugs, Developers fix bugs and maybe make more. It's an OODA loop, and continues until the developers decide to stop supporting the software.

Whether that fits into the business model, or the value proposition of spending tokens instead of engineer hours or user hours is fundamentally a risk management decision and whether or not the developer (whether OSS contributor, employee, business owner, etc) wants to invest their resources into maintaining the project.

While not evenly distributed, and not perfect, the currently available and behind embargoed tools are absolutely impactful, and yes, they are expensive to operate right now - it may not always be the case, but the "Attacks always get better" adage applies here. The models will get cheaper to run, and if you don't want to pay for engineers or reward volunteers to do the work, then you've got to pay for tokens, or spend some other resource to get the work done.

Somehow this reminded me of the historical efforts of some government bounty collections for mouse tails which were discontinued due to fraud (such as hunters breeding mice to collect the reward). There is a reason why/how devs and QA keep each other in check. Guess in case of LLM writing code, one has to use different models for dev and security checks.

On other hand, in real world, the developers learn from mistakes and avoid them in the future. However there is no feedback loop with enterprises using LLM with the agreement that the LLM would not use the enterprise code for training purposes

> the developers learn from mistakes and avoid them in the future

No. Humans learn from mistakes and try to avoid them in the future, but there is a whole pile of other stuff in the bag of neurons between our ears that prevent us from avoiding repetition of errors.

I have seen extremely talented engineers write trivial to avoid memory corruption bugs because they were thinking about the problem they were trying to solve, and not the pitfalls they could fall into. I would argue that the vast majority of software defects in released code are written by people that know better, but the bug introduced was orthogonal to the problem they were trying to solve, or was for an edge case that was not considered in the requirements.

Unless you are writing a software component specifically to be resilient against memory corruption, preventing memory corruption issues aren't top of mind when writing code, and that is ok since humans, like the machines we build, have a limit to the amount of context/content/problem space that we can hold and evaluate at once.

Separately, you don't necessarily need to use different models to generate code vs conduct security checks, but you should be using different prompts, steering, specs, skills and agents for the two tasks because of how the model and agents interpret the instructions given.

> write trivial to avoid memory corruption bugs because they were thinking about [something else] [...] defects [...] written by people that know better, but the bug introduced was orthogonal to [their focus]

For whatever reason, hadn't associated the inattentional blindness of bug writing with the invisible gorilla experiment and car crashes - selective attention fails. People looking right at the gorilla strolling into production while chest thumping, but not seeing it, for a focus on passing basketballs. That's quite an image. Tnx.

I've noticed even people who do offensive security for a living frequently leave gaping holes in their own code. If you're not actively primed to scan the landscape for the gorilla, you will often miss it even if you're a gorilla inquisitor.

Thank you in turn for making the issue much more salient to me by explicitly connecting it to the gorilla/basketball experiment. This is definitely going into my "clippings".

I think a similar thing comes into play when you ask a developer to write tests for the feature they just implemented. They’re going to have selective blindness for the edge cases (or requirements) that they failed to consider during implementation, unless they’re good at context switching into a testing mindset. And that’s something that benefits from training.

The problem is you as a person are not incentivized to introduce bugs in your code. If I am a company that provide provides an LLM/agent, and I know that the more bugs you have the more money I’m going to make, then I am not exactly incentivized to make my LLM/Agent better at preventing bugs. I don’t even have to explicitly make it introduce them. The incentive structure is simply out of whack.

Depends on how the billing works.

For users on fixed monthly pay accounts they'll be incentivised to do the exact opposite, as their income is fixed and the cost goes up for more tokens.

If the available evidence (third-party cloud pricing of open models) is correct and they make a profit on tokens but lose it on training, they will be incentivised for as many tokens as possible on pay-as-you-go API calls. If it isn't correct and they actually lose money even per token, they're also going to be incentivised to reduce output here.

Isn't it more likely the opposite - individial devs are likely to try to fudge metrics about how many vulnerabilities they find in their own code.

Whereas with LLMs, they’re really good about providing objective metrics about the bugs they found, especially as a subsequent LLM security scan does not know whether the same LLM wrote code earlier, the opposite of human devs.

And is the idea that organizations and/or benchmarks won't keep track of vulnerability rates for code from different LLMs?

(And individual devs get paid more the more bugs that they introduced they “find”, and they have more job security with an “maintainable” code base than a “finished” one.)

That’s like saying screw manufacturers are incentivized to give you crappy screws because it means you will buy more.

No. You will switch to a competitor that does a better job or charges less or both.

This is why monopolies are such a big problem. Because under a monopoly you are right.

What you’re describing is a one-to-one quality/failure problem by choosing to ruin the basic, core functionality of an item (while also endangering people at that). Or if you start with a bad screw, that just means you’re talking about people’s tolerance for bad products. What I’m talking about is similar but a little more nuanced and has plausible deniability. The relationship I’m describing is more indirect and it doesn’t require explicit effort to cheapen a product, but rather simply not improving a specific element of the product.

Apple made a ton of money off of lightning port accessories, you see it referenced here all the time. Apple had no incentive to swap to USB-C though it would create a better product and be more uniform with the rest of the world, so they kept with it despite incredibly vocal calls to swap because there was a ton of money they were making in the accessories. And it didn’t stop until they were forced to stop by the EU.

When we are talking about products at scale, these kinds of incentive structures play out in very tangible ways. If I have an LLM product and I’m getting two pulls at the hose because you’re burning tokens making stuff and correcting it, I don’t need to do anything. People are willing to tolerate that system to a pretty high degree so long as they ultimately get what they wanted in the end - unfortunately that is a great space to make money in.

This is the reason that people felt like Apple should be treated as a monopoly, though. The switching cost is high, and the benefits you lose are large. So people put up with it, in spite of being upset.

The switching cost is not high for LLMs as far as I can tell.

For an individual, no it is not. For a massive corporation that has a huge contract and has their entire workforce working on it? That’s not such an easy switch, especially depending on the tooling involved beyond just the core LLM.

Are you thinking of the cobra effect (aka https://en.wikipedia.org/wiki/Perverse_incentive) where people in India started breeding cobras to get the reward?

Plenty of examples abound:

https://en.wikipedia.org/wiki/Great_Hanoi_Rat_Massacre

> Today, the events are often used as an example of a perverse incentive, commonly referred to as the cobra effect. The modern discoverer of this event, American historian Michael G. Vann argues that the cobra example from the British Raj cannot be proven, but that the rats in the Vietnam case can be proven, so the term should be changed to the Rat Effect.

You don't need different models, just different contexts (optimally with different personas).

Great analogy. The problem is the incentive structure. Anthropic would nothing nothing more than for all of us to write big sprawling slop codebases so we can spend endless tokens reading, rereading, fixing, refixing forever.

You apparently have not much experience developing software.

It's pretty absurd to do it on AI-generated code though. If there is now an automated way to find vulnerabilities, coding models can be pretty easily trained to not introduce them

Tell me you don’t know how AI works without telling me you don’t know how AI works.

What are you talking about?

I’ll try to steelman this comment. Anyone who uses coding tools knows that the output is heavily affected by details of the task you give it. The same model can give you garbage code or genius code for the same problem with slightly different framing. So it’s not necessarily a limitation in the model’s training that causes it to output security bugs. The model might be great at writing secure code, but you need a different harness to elicit that behavior.

Counterargument: just because the problem can be fixed without training, doesn’t mean training isn’t a possible solution.

Counter-counter-argument: for LLMs, tokens are units of thinking. And token use is, on the margin, directly proportional to costs of inference. So while the details of the harness, and how you prompt the model, and nature of the code and docs you put in context, etc. all matter to the quality of output you get from LLM coding tool, ultimately, there's always a ceiling to how much you're willing to spend on solving a problem - say, no more than 30 minutes, or $10, on refactoring a target module or implementing a small feature - and that puts a limit on how much thinking the model can put into it.

Thing is, writing secure and efficient and readable and simple code is in many cases fundamentally over that limit. It's possible, but you can't afford (or rationally just don't want) to spend as much on it as it's required for superhuman quality on all these aspects. Also most of the time, you don't want to operate at a limit - you probably expected that feature to take 30 seconds and less than $1 to implement. So you choose, both what the model optimizes for, and how much.

Because of that, no matter how good the model and the harness and the prompting are, $10 spent on coding is still bound to leave behind some security vulnerabilities that subsequent $10 spent on security review will find (especially with a model post-trained for that, at expense of general performance).

I guess I thought this should be obvious to everyone but, looking at code and finding exploits is completely different from .. writing exploits.

For one thing exploits often require completely different parts of the code to chain together. Sometimes parts of code the LLM itself isn’t writing.

And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.

It's just an incremental thing. You're both right. They will slowly become less and less likely to introduce vulns due to higher intelligence and better RL. Offensive capabilities will still probably scale faster than automatic defensive-while-coding ones.

>I guess I thought this should be obvious

People in this thread are talking past and misunderstanding each other and making unrelated points.

The point of the response to the top level comment was questioning the conflict of interest in model providers creating separate revenue streams for themselves by selling a product that fixes problems their other product created, akin to OS providers selling anti-virus software back in the day.

Similarly, it should be obvious to you that a software engineer can trivially get into the mindset of writing more expoitable code by pretending the production code they're tasked with writing is hobby code or prototype code.

If profitable revenue streams with adverserial products are in place, no one should be surprised when model providers are disincentivised to improve the "garbage code quality, but hey it works!" nature of their most used code generators.

>And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.

...it should also be obvious people in this forum have wildly different experiences with respect to the code quality the LLMs they use generate. I personally find it difficult to find anyone that argues that the LLMs they are using are consistently generating high-quality code across a vast codebase.

In every prompt: "write me code without exploitable bugs".

I know it doesn't work so easily as someone who uses AI for coding, but I do find repetition of basics in almost every prompt keeps the AI focused.

Usually the same guy doesn't get paid for developing code, bug bounty and fixing the code.

It leads to corruption. To paraphrase Dilbert "I'm going to code myself a car."