Hacker News

mastermage 2 months ago [ - ]

The more interesting question I have is if such Prompt Injection Attacks can ever be actualy avoided, with how GenAI works.

PurpleRamen 2 months ago [ - ]

Removing the risk for most jobs should be possible. Just build the same cages other apps already have. Also add a bit more transparency, so people know better what the machine is doing, maybe even with a mandatory user-acknowledge for potential problematic stuff, similar to how we have root-access-dialogues now. I mean, you don't really need access to all data, when you are just setting a clock, or playing music.

larodi 2 months ago [ - ]

Perhaps not, and it is indeed not unwise from Apple to stay away for a while given their ultra-focus on security.

Ono-Sendai 2 months ago [ - ]

They could be if models were trained properly, with more carefully delineated prompts.

arw0n 2 months ago [ - ]

I'd be super interested in more information on this! Do you mean abandoning unsupervised learning completely?

Prompt Injection seems to me to be a fundamental problem in the sense that data and instructions are in the same stream and there's no clear/simple way to differentiate between the two at runtime.

Ono-Sendai 2 months ago [ - ]

I haven't thought about it deeply. But I guess it's about allowing the model to easily distinguish the prompt from the conversation. Models seem to get confused with escaping, which is fair enough, escaping is very confusing. It's true that for the transformer architecture the prompt and conversation are in the same stream. However you could do something like activate a special input neuron only for prompt input. Or have the prompt a fixed size (e.g. a fixed prefix size). And then do a bunch of adversarial training to punish the model when it confuses the prompt and conversation :)