There have been attempts like https://arxiv.org/pdf/2410.09102 to do this kind of color-coding but none of them work in a multi-turn context since as you note you can't trust the previous turn's output
There have been attempts like https://arxiv.org/pdf/2410.09102 to do this kind of color-coding but none of them work in a multi-turn context since as you note you can't trust the previous turn's output
Yeah, the functionality+security everyone is dreaming about requires much more than "where did the the words come from." As we keep following the thread of "one more required improvement", I think it'll lead to: "Crap, we need to invent a real AI just to keep the LLM in line."
Even just the first step on the list is a doozy: The LLM has no authorial ego to separate itself from the human user, everything is just The Document. Any entities we perceive are human cognitive illusions, the same way that the "people" we "see" inside a dice-rolled mad-libs story don't really exist.
That's not even beginning to get into things like "I am not You" or "I have goals, You have goals" or "goals can conflict" or "I'm just quoting what You said, saying these words doesn't mean I believe them", etc.