If by "got caught" you mean "published it in their system card paper".
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't, I imagine it would have taken quite some time for the research community to figure out what was going on.)
It was in the announcement, too. I’m 99% sure they edited it after they changed their mind, because I knew about it from reading that, and never opened the model card.
On the earliest web archive snapshot I can find [0], I do not see any mention of the safeguard/sabotage under discussion [1].
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
[0]: https://web.archive.org/web/20260609173222/https://www.anthr...
[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...
It wasn't buried; it was on the third page after the ToC.
Yes, I actually do mean that. I skimmed the system card. Stating it openly, doing it, and then being called out on it isn't meaningfully different from getting caught.
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. Restricting this particular usage of their products requires neither deception nor hiding information to prevent abuse.
Instead, for some reason, they chose to publicly display a morally poor way of executing a reasonable business decision (preventing abuse, defending their business interests, etc.).
They didn’t get caught, they explicitly said they would do that in the announcement. I think it was both bad and a weird idea, but it certainly wasn’t sneaky.
is it a moat or just a way to implement the permanent underclass?