Hacker News

Modern LLMs do a great job of following instructions, especially when it comes to conflict between instructions from the prompter and attempts to hijack it in retrieval. Claude's models will even call out prompt injection attempts.

Right up until it bumps into the context window and compacts. Then it's up to how well the interface manages carrying important context through compaction.