The Lethal Trifecta strikes again! Mermaid seems like a bit of a side issue, presumably there are lots of ways data might leak out. It could have just been a normal link. They should probably look further into the underlying issue: unrelated instruction following.
Thanks for the archive link and the very useful term BTW! I also got 503 when trying to visit.
I think they're doing this the right way. You can't fix unrelated instruction following with current generation LLMs, so given that the only leg you can remove from the trifecta is mechanisms for exfiltrating the data.
The first AI lab to solve unrelated instruction following is going to have SUCH a huge impact.
Not even humans can do it perfectly (hence social engineering)
Humans would not do what was reported here!
No, they are doing this the only possible way that doesn't massively restrict it being useful at all. That doesn't make it the right way.
A fundamental vulnerability to prompt injection means pretty much any output can be dangerous, and they have to expose it to largely untrusted input to be useful at all.
Even limiting output to ASCII text only is probably not entirely safe.
The right way at this point would be to not use AI.
Summarisation models that do not follow instructions already exist! Fixing exfiltration is good, low hanging fruit. But for a summarisation task, whole classes of typical instruction following behaviour are totally off target!
Which models do you recommend for that these days?
Has the field moved on much since the pre-LLM-era models like T5?