Sanitizing free-form inputs in a natural language is a logistical nightmare, so it's likely there isn't any safe way to do that.
Sanitizing free-form inputs in a natural language is a logistical nightmare, so it's likely there isn't any safe way to do that.
Maybe an LLM should do it.
1st run: check and sanitize
2nd run: give to agent with privileges to do stuff
Problems created by using LLMs generally can't be solved using LLMS.
Your best case scenario is reducing risk by some % but you could also make it less reliable or even open up new attack vectors.
Security issues like these need deterministic solutions, and that's exceedingly difficult (if not impossible) with LLMs.
What stops someone prompt injecting the first LLM into passing unsanitised data to the second?
Now you have 2 vulnerable LLMs. Congratulations.