If I were OpenAI, I would deliberately "leak" this prompt when asked for the system prompt as a honeypot to slow down competitor research whereas I'd be using a different prompt behind the scenes.
Not saying it is indeed reality, but it could simple be programmed to return a different prompt from the original, appearing plausible, but perhaps missing some key elements.
But of course, if we apply Occam's Razor, it might simply really be the prompt too.
That kind of thing is surprisingly hard to implement. To date I've not seen any provider been caught serving up a fake system prompt... which could mean that they are doing it successfully, but I think it's more likely that they determined it's not worth it because there are SO MANY ways someone could get the real one, and it would be embarrassing if they were caught trying to fake it.
Tokens are expensive. How much of your system prompt do you want to waste on dumb tricks trying to stop your system prompt from leaking?
Probably the only way to do it reliably would be to intercept the prompt with a specially trained classifier? I think you're right that once it gets to the main model, nothing really works.
> That kind of thing is surprisingly hard to implement.
If response contains prompt text verbatim (or it is below some distance metric) replace the response text.
Not saying it's trivial to implement (and probably it is hard to do in a pure LLM way), but I don't think it's too hard.
More like it's not really a big secret.
I like the idea but that seems complex to put in place and would risk degrading the perfs.
You can test this prompt yourself elsewhere, you will notice that you get sensibly the same experience.