What you're describing only applies to security or biotech downgrades. A downgrade triggered by the model believing you're doing something related to model development is silent, invisible, and internal.

Anthropic has reversed that decision. (But that only just happened, so it might still have been true during the article's testing.)

When I reported this, Anthropic sent me an email on Tuesday saying, "You have been approved into the Cyber Verification Program", but it's still downgrading. Is this a bug? What's the point of the Cyber Verification Program if Fable 5 downgrades when you tell it to write secure code?

I don’t think that’s relevant? The change is that it will no longer silently downgrade, and will instead always be honest when it’s doing so.

I think that gets you access to Mythos, which doesn't have the safeguards. It's configured as a separate model.

They've publicly apologised for the invisible PEFT that deliberately makes the model dumb on some tasks. Whether they still do it, or will once again do it in future in more subtle ways, is something we can't verify.

Personally I think they have proven themselves to be the stewards of AI in the same way Exxon Mobil are the stewards of petroleum.

I was just coming here to post this reply to myself! You're absolutely right! :)

Honestly so glad to see the reversal.

Not sure if it's wise to trust them again even if they say they reversed it.