Hacker News

When prompting an LLM service to leak the system prompt, how do you have the faintest idea as to its accuracy?

I‘ve read people say it‘s a difficult challenge for the providers. But aren‘t there some pretty basic strategies? E.g., code pretty near the front of the stack that just does some fuzzy string comparison on all output? They don‘t need to rely on just model behavior…

I imagine it‘s likely that the model is just doing what it‘s good at? Hallucinating a prompt?