Given the high rate of false positives people are reporting for the non-silent cybersecurity, biological, etc., safeguards, there is a strong likelihood that you will encounter silently nerfed behavior even if you are _not_ violating their TOS.

Ultimately this will be evident in the way customers / external benchmarkers experience Fable. Hopefully competition will drive future models toward a lower false positive rate. Until that happens, Mythos and Fable users seem likely to have pretty divergent experiences.

It's such an obviously bad policy, it's mind-boggling that they thought this was a good idea. It just breeds paranoia and mistrust, especially when people are already a bit paranoid about silent model quantization for cost-cutting reasons.

It's not paranoia when the entity you are dealing with can't be trusted and will do everything to abuse your trust.

What's the alternative? Not release the model at all?

"Make the guardrails better" is very hard and probably not worth the effort.

The alternative is to be explicit when you nerf, so users know what they are working with.

I guess people would just game the system and find ways around these guardrails.

They have enough info on you and your sessions to eventually catch you, label you as a bad-faith actor and ban you automatically. I don't think many would risk it.

Another "knob" is reducing the thinking time...

I'm a medical physicist. I use the word nuclear a lot. Opus is fine (well, 99% of the time - I've certainly hit the CBRN filters a few times and even been invited to email Anthropic about the false positives).

Fable has literally refused to work on any of my problems (even those about fluid dynamics!) and just tells me that I'm violating Anthropic's AUP.

This problem is compounded by the fact that you can be banned (really by any provider) based on an algorithm, and the process for restoring your account doesn't seem to work very well. So be careful with your queries, basically, or you might get locked out.

If a benchmark is affected, the model owner will almost certainly tune for it, so there will be a game of cat and mouse...

Honestly, wouldn't surprise me if the AI companies try to detect benchmarking. Most hardware companies do...