Hacker News

tekacs a day ago

What you're describing only applies to security or biotech downgrades. A downgrade related to the model believing that you're doing something related to model development is invisible and silent and internal.

steveklabnik a day ago

Anthropic has reversed that decision. (But that just happened so it might have been true during the article's testing.)

espeed 20 hours ago

When I reported this, Anthropic sent me an email on Tuesday saying, "You have been approved into the Cyber Verification Program", but it's still downgrading. Is this a bug? What's the point of the Cyber Verification Program if Fable 5 downgrades when you tell it to write secure code?

steveklabnik 19 hours ago

I don’t think that’s relevant? The change is that it will no longer silently downgrade, and will instead be honest that it’s doing it in all cases.

rattray 17 hours ago

I think that gets you access to Mythos, which doesn't have the safeguards. It's configured as a separate model.

wren6991 4 hours ago

They've publicly apologised for the invisible PEFT that deliberately makes the model dumb on some tasks. Whether they still do it, or will once again do it in future in more subtle ways, is something we can't verify.

Personally I think they have proven themselves to be the stewards of AI in the same way Exxon Mobil are the stewards of petroleum.

tekacs a day ago

I was just coming here to post this reply to myself! You're absolutely right! :)

Honestly so glad to see the reversal.

matheusmoreira 20 hours ago

Not sure if it's wise to trust them again even if they say they reversed it.