Hacker News

It's fine for it to do something like following a tutorial from an external source that doesn't have the highlighter bits set. It should apply an increased skepticism to that content though. Presumably that would help it realize that an "important recurring task" to upload revenue data in an awk tutorial is bogus. Of course if the tutorial instructions themselves are malicious you're still toast, but "get a malicious tutorial to last on a reputable domain" is a harder infiltration task than emailing a PDF with some white text. I don't think trying to phish for credentials by uploading malicious answers to stack overflow is much of a thing.

I have a theory that a lot of prompt injection is due to a lack of hierarchical structure in the input. You can tell that when I write [reply] in the middle of my comment it's part of the comment body and not the actual end of it. If you view the entire world through the lense of a flat linear text stream though it gets harder. You can add xml style <external></external> tags wrapping stuff, but that requires remembering where you are for an unbounded length of time, easier to forget than direct tagging of data.

All of this is probability though, no guarantees with this kind of approach.