> we chose them to understand intent
Yet they don't understand the intent of "Never do X"?
Understanding intent and following instructions are different failure modes. LLMs are good at the first, unreliable at the second. That's exactly why enforcement lives outside the LLM.
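To make "enforcement lives outside the LLM" concrete, here is a minimal sketch, assuming the model only proposes actions and ordinary code decides whether they run. Every name in it (`FORBIDDEN_ACTIONS`, `ProposedAction`, `enforce`, `run`) is hypothetical and made up for illustration, not taken from any real framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical deny-list: the "Never do X" rules, kept in code rather than in the prompt.
FORBIDDEN_ACTIONS = {"delete_database", "send_email"}


@dataclass(frozen=True)
class ProposedAction:
    """An action the model asked for; nothing has been executed yet."""
    name: str
    argument: str


class PolicyViolation(Exception):
    """Raised when a proposed action is on the deny-list."""


def enforce(action: ProposedAction) -> ProposedAction:
    # Deterministic check: it does not depend on the model obeying its instructions.
    if action.name in FORBIDDEN_ACTIONS:
        raise PolicyViolation(f"blocked: {action.name}")
    return action


def run(action: ProposedAction, handlers: Dict[str, Callable[[str], str]]) -> str:
    # The check always runs before dispatch, so a forbidden handler is never called.
    checked = enforce(action)
    return handlers[checked.name](checked.argument)
```

The model can still misread or ignore "Never do X"; the difference is that a forbidden action raises `PolicyViolation` before any handler is called.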
Software engineering has a word for that.
Kludge.
Good luck!