This is very similar in root cause and exploitation to Copy Fail.

Which illustrates pretty well something that's lost when relying heavily on LLMs to do work for you: exploration.

I find that doing vulnerability research using AI really hinders my creativity. When your workflow consists of asking questions and getting answers immediately, you don't get to see what's nearby. It's like a genie - you get exactly what you asked for and nothing more.

The researcher who discovered Copy Fail relied heavily on AI after noticing something fishy. If he had had to manually wade through lots of code by himself, he would have had many more chances to spot these twin bugs.

At the same time, I'm pretty sure that with slightly less directed prompting, a frontier LLM would have found these bugs for him too.

It's a very unusual case of negative synergy, where working together hurt performance.

> When your workflow consists of asking questions and getting answers immediately, you don't get to see what's nearby.

Very much aligns with my experience. For me this is the most unsatisfying thing about AI-based workflows in general: they miss stuff humans would never miss.

All the time I wonder: what am I missing that's right nearby? It's remarkable how many times I have to ask Claude Code to fully ingest something before it actually puts it into context. It always tries to laser through to the target it's looking for, which is often not what you want it to look for, or at least not all you want it to look for. Getting these models to open up their field of vision is tough.

Actually lately I’ve been feeling the other way around with it. The LLM catches things I would have overlooked. I ask for a new feature in a certain file, and the LLM suggests fixing a tangentially related file to accommodate the new feature without breaking something else. Maybe this is just the crap legacy codebase I’m working with and how tangled up everything is, but I definitely have found several times now that it caught things I would have missed.

> The LLM catches things I would have overlooked. I ask for a new feature in a certain file, and the LLM suggests fixing a tangentially related file to accommodate the new feature without breaking something else.

What are you using? Do you think this behavior is in response to prompting? My goal at times is to "rabbit hole" the LLM: get it to go down rabbit holes and find bigger and bigger picture issues until it homes in on something fundamentally broken that could have big impact if fixed. But it's not trivial for me to push the agent in that direction.

Do you think this is inherent or an artifact of prompting? Curiosity and side quests lead to higher token usage and longer time to finish, so I could understand why current harnesses and system prompts would not encourage that sort of thing.

But what if a coding agent was prompted to be more curious during development? Like a human developer, it could make mental notes of alternatives to try out and chase suspicious-looking code that may seem unrelated to the task at hand. It could even spawn rabbit-hole agents in parallel.
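
Something along these lines in the harness's system prompt might be a start (the wording is hypothetical, not from any real product):

  While working, keep a scratchpad of side observations: suspicious
  code, surprising patterns, alternatives worth trying. Once the main
  task is done, pick the most promising note and spend a small,
  bounded budget investigating it before reporting back.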

Taking a step back, this probably highlights a major hazard with the increased usage of LLMs for coding, which is that everyone's style of work is going to converge because most code will be written by the 2-3 most popular models using the same system prompts.

> Do you think this is inherent or an artifact of prompting?

Not sure! I mean, look at this sibling comment for example: https://news.ycombinator.com/item?id=48062797. Not my experience, but apparently others have this experience.

> But what if a coding agent was prompted to be more curious during development?

I've tried using the language of curiosity. My qualitative take was that it did have a positive impact, but not much. And I can only tinker with system prompting so much, before I get drawn into LLM driving :)

> which is that everyone's style of work is going to converge

yeah I imagine even people's styles of thinking will converge as a result of this, more so than from reading other people's prose or programs. I think I saw something on HN to this effect within the last month, too.

I've seen something similar: solutions generated feel very Pythonic or Java-esque in languages that are neither Python nor Java (C, Rust, Ruby).

I've had to explicitly direct the machine to read existing sibling code and follow the specific idioms and patterns in use.

It’s interesting to compare how the agentic search performs, with these targeted reads and lots of tool calls in the stream, versus the older but still valid paradigm of using a high-reasoning model like GPT-X-pro and feeding in all the relevant files at once with no tools.

I have found that the “pro” approach is much more holistic and able to tackle rather “creative” problems that require very careful design, and the overall artifact is tight and self-consistent. Claude Code, by comparison, is incredible at exploration and targeted implementation but indeed is not great at seeing the forest.

> All the time I wonder: what am I missing that's right nearby?

Add to the prompt "use coding conventions of the file which you are currently editing". That gets the machine (Opus and Sonnet at least) to go over the nearby code and occasionally mention something obvious.

No, unless I'm misreading it, it's the *same* root cause: high 32 bits of Extended ESN in IPsec == authencesn module/cipher mode.

The wrong thing got fixed for copy.fail, because people jumped to blame AF_ALG.

[ed.: yes it's the same authencesn issue. https://github.com/V4bel/dirtyfrag/blob/892d9a31d391b7f0fccb... it doesn't say authencesn in the code, only in a comment, but nonetheless, same issue.]

[ed.2: the RxRPC issue is separate, this is about the ESP one]

There are two vulnerabilities here.

The RxRPC one is definitely a different root cause (although caused by a very similar mistake).

For the ESP one it's a bit harder to tell. I don't think the wrong thing was fixed, just that there was a very similar bug in almost the same spot. Could be wrong about that though.

(you probably wrote this while I was editing my post.)

It's absolutely the same issue in authencesn/ESP. There's another one in RxRPC that is AIUI completely unrelated.

Or a follow-up prompt: "find similar classes of bugs". Once the actual case has been laid out, finding similar bugs isn't too hard. I hear you on the creativity bit. Like any tool, AI can put blinders on. Using it to augment without it fully taking over your workflow is tough.

Not just like any tool, though. Interacting with agents can be incredibly boring and frustrating in a way that I personally do not experience with other technology.

I don't follow. LLMs spotted these bugs in the first place. You seem to be saying that these discoveries are indications that they're bad for vulnerability discovery.

From what I understand, the Copy Fail bug was found by a researcher who noticed something weird and then used AI to scan the codebase for instances where that becomes a problem.

I bet that with a slightly looser prompt/harness, the LLM could have found these twin bugs too.

Yet at the same time, I also think that if the human researcher had manually scanned the code, he'd have noticed these bugs too.

FWIW I do think LLMs are great tools for finding vulnerabilities in general. Just that they were visibly not optimally applied in this case.

They could also have found all these things at the same time - and are slow-rolling the disclosures.

I don't think the copy.fail people understood the issue they found, as evidenced by the heavy focus on AF_ALG/aead_algif, which is essentially "innocent" as we're seeing here.

I think LLMs are great for vulnerability discovery, but you need to not skimp on the legwork and on understanding what it is you actually found.

Right but without the LLM the bug doesn't get found at all.

That's not necessarily true. Who's to say the security researchers wouldn't have found it if they'd searched the code manually?

It's an AI security firm! You might just as productively ask "why did all the other engineers who ever looked at this code not find it, and why was Theori the one to actually surface it?".

I’m hardly going to simp for LLM tools, but the fact that the bug existed and no one had reported it seems proof positive that no one was about to find it without them.

It would have taken a LOT longer but often this kind of manual search is so tedious people just don't do it. LLMs don't get bored.

> LLMs don't get bored

They do not get bored like a human, but they are trained on human language and replicate the same traits, such as laziness and expressing boredom or annoyance (even if obviously they do not experience anything at all). It’s actually a lot of effort to get them to engage with things at a deeper level without cutting corners.

Safer to assume at least one of the NSA, Mossad, and a few others were sitting on it for years.

Yes, I agree. I'm not the GP poster.

No, they did not. Careful of falling for the psychosis.

> This finding was AI-assisted, but began with an insight from Theori researcher Taeyang Lee, who was studying how the Linux crypto subsystem interacts with page-cache-backed data.

https://xint.io/blog/copy-fail-linux-distributions

Theori is an AI security research firm.

You appear to want to die on the hill of "This vulnerability would never have been found if we lived in a world without LLM AI" which is a very strange hill to die on.

There's no question that we live in the world where LLM AI was involved in finding the copy fail vulnerability at this specific time. It's completely normal for people to see a vulnerability and then look closer and find related vulnerabilities or a deeper root cause, but there's no need to adopt an extreme "without AI LLM we don't find these vulnerabilities" position.

It's weird to say I want to "die on this hill" because that's not even something I believe. There was nothing especially difficult about this particular vulnerability. My only observation is that nobody found it before; then an LLM security firm went out looking for Linux LPEs, and thus it was discovered.

That is a very difficult fact pattern to which to attach the conclusion "LLMs have sabotaged security research" (my paraphrase).

Well... every new vulnerability is one that nobody found before.

Otherwise, it wouldn't be classified as "new".

--

Edit:

I think LLMs are very useful here.

When a researcher spots something funny, instead of spending two days on reading and testing, he can fire up an LLM and have it read all the code leading up to it in ~30 minutes.

The finding started with human intuition and was assisted by an LLM. You can yell "AI sec firm" 1000 times. A human got it started. You shouldn't die on that hill.

It seems as though this issue occurred to him, then he used their tool ("Xint Code") to analyze the codebase for instances of it.

I don’t think that’s what the OP is saying at all, just that using LLMs needs to be a cooperative research process.

Also I see you jumping around a lot to the defense of LLMs when I don’t think anyone is really attacking them. Maybe cool it a bit.

From the thread that ensued I feel comfortable that my interpretation of the comment (or rather, my confusion about it) was in fact germane.

Germane or not, the knee-jerk reactions related to LLMs are getting ridiculous, and it seems like it’s the same people throwing down at a moment’s notice and then chalking it up to a misunderstanding.

So like I said, just chill out.

It’s incredible humans spot stuff like this. I guess even more incredible that LLMs can do it!

Right. Finding the bug is in itself a win. It seems we’re jumping from that spend-electricity-to-find-bugs win to arguing about how some things around it are not quite good or comfy.

Just a side note: negative synergy does not seem so uncommon with machine learning. We did some research maybe 10 years ago on human/ML-based duplicate detection (for a municipal support ticket system). The research showed that pure AI and pure human outperformed co-working; human oversight often overcorrected machine work, for example. I think it is actually a nice HCI problem to amplify creativity and unique skills in such processes, particularly if they can be to some degree repetitive and tiresome.

I don't know... after they found a high profile bug like copyfail, I wouldn't attribute not looking for similar bugs to them being overly dependent on AI. It's easy to stop exploring, for a while at least, after you've struck on a major find. Maybe they would've returned to it in a few months. It certainly inspired others to explore similar areas and find these new bugs. Isn't that enough?

It’s very hard to see a root vuln similar to, but not the same as, another discovered by AI, as a lesson about AI not exploring.

Is there a counterfactual where you would say it explored well enough, besides both vulnerabilities being published as one?

Evidence or are you just riffing?

These are all page cache poisoning attacks (dirtyfrag, copyfail, dirtypipe). Maybe the page cache should have defense-in-depth measures for SUID binaries?

SUID mitigations have nothing to do with the vulnerability itself - just the exploit.

If there's a root cronjob that runs a world readable binary, you could modify it in the page cache and exploit it that way.

Modifying the page cache is a really strong primitive with countless ways to exploit it.

splice() should maybe generally refuse to operate on things you can't write to.

splice is documented to return EBADF if "One or both file descriptors are not valid, or do not have proper read-write mode."

So it seems surprising to me that you can call it when the out fd is not writable? But I didn't retain the information about the vulnerability, so I'm missing something. There was something about copy on write, IIRC?

"proper read-write mode" for the input fd is reading only. The exploit is writing to the splice() input fd.

Also, NB, I said permission check, not mode check. The input fd to splice can and will be open for only reading quite often. Doesn't mean the kernel can't still do a write permission check.

(Except I didn't say that here. Oops. Getting confused with my posts.)
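
For concreteness, a minimal userspace sketch of the current behavior (illustrative only; /etc/passwd just stands in for any world-readable, root-owned file):

  /* splice(2) accepts an input fd opened O_RDONLY: only the fd's open
     mode is validated, never write permission on the backing file. */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int pipefd[2];
      int in = open("/etc/passwd", O_RDONLY); /* no write permission */

      if (in < 0 || pipe(pipefd) < 0) {
          perror("setup");
          return 1;
      }
      /* Succeeds even though we could never open the file O_WRONLY */
      ssize_t n = splice(in, NULL, pipefd[1], NULL, 4096, 0);
      printf("splice moved %zd bytes into the pipe\n", n);
      return 0;
  }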

OK, I likely have too much sleep debt to understand, but given that the bug is that splice can write to the input fd, you're suggesting maybe splice should only let you use an input fd if the process has write access to it?

But splice is more or less a generalization of sendfile, and sendfile is often used for webserving, where the serving process does not have ownership of the documents it is serving. It doesn't make sense to limit splice such that it can't do the task it was built for. Maybe splice should just not write to the input fd? :P

> But splice is more or less a generalization of sendfile

Not really; splice(2) is actually more limited: it's an optimisation for reading and writing data between files and pipes without needing to make copies.

sendfile(2) works with any fds because it just exists to remove a fair bit of the copy overhead when doing a userspace read/write loop, but it does actually do a copy.
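
A rough sketch of the contrast (function names and fds are mine, purely illustrative):

  #define _GNU_SOURCE
  #include <sys/sendfile.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* sendfile(2): plain copy of file data into an arbitrary fd, e.g. a
     socket; the classic webserver case, where the process need not
     own the file it serves. */
  void serve(int client_sock, int file_fd, size_t len)
  {
      off_t off = 0;
      sendfile(client_sock, file_fd, &off, len); /* data is copied */
  }

  /* splice(2): one end must be a pipe; page references are moved into
     the pipe buffer instead of the data being copied. */
  void forward(int file_fd, int pipe_wr, size_t len)
  {
      splice(file_fd, NULL, pipe_wr, NULL, len, 0);
  }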

Yes, it'd curtail splice() usage quite heavily. Maybe too much.

But apparently we can't be trusted with the page cache…

Maybe the kernel using supervisor-read-only flags could be made to work; the only issue then is what happens if something does in fact need to write…

Aren’t you just saying “don’t write bugs?”

True! Building protections just for executables (e.g. making physical pages in the page cache non-writeable 100% of the time) of course has countless circumventions as well (e.g. config files). Yeah, there is probably not that much to be done there, actually.

Looking at some of the diffs, it seems to me the kernel makes it really not particularly obvious when/how this goes wrong. E.g. the patch for this looks at an additional flag on the socket buffer to fix an arbitrary page cache write, which feels rather like action at a distance. Logically it of course makes sense: the whole point of splice et al. is to feed data from one file-like into another file-like, whatever those ends might be. That erases the underlying provenance of the data.


> When your workflow consists of asking questions and getting answers immediately, you don't get to see what's nearby.

That's why it's very, very important to just step out and use the saved time to go for a walk, go to a park, sit on a bench, listen to birds, close your eyes, and zoom out.

The state we are in is actually brilliant.