>We're exploring taking the action plan that a reasoning model (which sees both trusted and untrusted text) comes up with and passing it to a second model, which doesn't see the untrusted text and which then evaluates it.
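To make sure I'm reading that right, here's a rough sketch of the flow being proposed (the function, model names, and prompts are mine, purely for illustration, not anyone's actual implementation):

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for whatever LLM API is actually in use."""
    raise NotImplementedError


def plan_and_evaluate(trusted_task: str, untrusted_text: str) -> str | None:
    # Step 1: the reasoning model sees BOTH the trusted task and the
    # untrusted text, and produces an action plan.
    plan = call_model(
        "planner",
        f"Task: {trusted_task}\n\nRetrieved content:\n{untrusted_text}\n\n"
        "Produce a step-by-step action plan.",
    )

    # Step 2: a second model evaluates the plan WITHOUT ever seeing the
    # untrusted text; it only gets the trusted task and the plan itself.
    verdict = call_model(
        "evaluator",
        f"Task: {trusted_task}\n\nProposed plan:\n{plan}\n\n"
        "Does this plan do anything beyond what the task requires? "
        "Answer APPROVE or REJECT.",
    )

    # Only approved plans go on to be executed.
    return plan if verdict.strip().startswith("APPROVE") else None
```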

How is this different from the Dual-LLM pattern described in the article that was linked? That article immediately goes on to explain how the setup is still susceptible to prompt injection:

>With the Dual LLM pattern the P-LLM delegates the task of finding Bob’s email address to the Q-LLM—but the Q-LLM is still exposed to potentially malicious instructions.