The problem is that you, as a person, are not incentivized to introduce bugs in your code. If I am a company that provides an LLM/agent, and I know that the more bugs you have the more money I’m going to make, then I am not exactly incentivized to make my LLM/agent better at preventing bugs. I don’t even have to explicitly make it introduce them. The incentive structure is simply out of whack.

Depends on how the billing works.

For users on fixed monthly plans, the provider will be incentivised to do the exact opposite, as its income from those users is fixed while its costs go up with every extra token.

If the available evidence (third-party cloud pricing of open models) is correct, and they make a profit on tokens but lose it on training, they will be incentivised to sell as many tokens as possible on pay-as-you-go API calls. If it isn't correct and they actually lose money even per token, they're also going to be incentivised to reduce output there.

Isn't it more likely the opposite? Individual devs are the ones likely to fudge metrics about how many vulnerabilities they find in their own code.

Whereas with LLMs, they’re really good about providing objective metrics about the bugs they found, especially as a subsequent LLM security scan does not know whether the same LLM wrote code earlier, the opposite of human devs.

And is the idea that organizations and/or benchmarks won't keep track of vulnerability rates for code from different LLMs?

(And individual devs get paid more the more bugs they “find” that they introduced themselves, and they have more job security with a “maintainable” code base than with a “finished” one.)

That’s like saying screw manufacturers are incentivized to give you crappy screws because it means you will buy more.

No. You will switch to a competitor that does a better job or charges less or both.

This is why monopolies are such a big problem. Because under a monopoly you are right.

What you’re describing is a one-to-one quality/failure problem created by deliberately ruining the basic, core functionality of an item (while also endangering people at that). Or, if you start with a bad screw, that just means you’re talking about people’s tolerance for bad products. What I’m talking about is similar but a little more nuanced, and it has plausible deniability. The relationship I’m describing is more indirect: it doesn’t require any explicit effort to cheapen a product, just not improving one specific element of it.

Apple made a ton of money off of Lightning port accessories; you see it referenced here all the time. Apple had no incentive to switch to USB-C, even though it would have made for a better product and been more uniform with the rest of the world. So they stuck with Lightning despite incredibly vocal calls to switch, because they were making a ton of money on accessories. And it didn’t stop until they were forced to by the EU.

When we are talking about products at scale, these kinds of incentive structures play out in very tangible ways. If I have an LLM product and I’m getting two pulls at the hose because you’re burning tokens both making stuff and correcting it, I don’t need to do anything. People are willing to tolerate that system to a pretty high degree so long as they ultimately get what they wanted in the end, and unfortunately that is a great space to make money in.

This is the reason that people felt like Apple should be treated as a monopoly, though. The switching cost is high, and the benefits you lose are large. So people put up with it, in spite of being upset.

The switching cost is not high for LLMs as far as I can tell.

For an individual, no it is not. For a massive corporation that has a huge contract and has their entire workforce working on it? That’s not such an easy switch, especially depending on the tooling involved beyond just the core LLM.