Now consider what happens if your queries and context history form a pattern that makes it obvious the API key is being shared by multiple, entirely unrelated people working on totally different things, or any other usage pattern that makes it obvious the key is being used for distillation.

Two parts here.

First, well-calibrated systems for detecting API compromise are a good thing (or at least well-intentioned). Credential-stealing malware is exploding.

Second, the challenge is that a significant amount of genuine work (such as evals) seems practically impossible to distinguish from generating RLAIF outputs.