My experiences so far have not been positive. The cyber security nerf is ridiculous. I am working on an AI based decompiler, every single interaction with Fable on my project has been flagged for cyber security.
Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.
This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them.
Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.
If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.
I'm not sure how the new guardrails work exactly, but I've read enough of reddit / Chinese communities focused on jailbreaking the models, to know that you either have to nerf it to the point where it fires even on "kill the task", or someone (maybe even other LLM) is going to come up with a set of tokens that is going to go around the defenses.
Nerfed models are really bad for PR, especially when you're staking your company's future on it being the smartest, most dangerous thing in the world.
So I believe they will ease up on nerfing/guardrails just enough that bad actors will find a way, while good ones will stay limited on anything dual-use. Just like such restrictions usually work in other places.
P.S. yes, "kill the task" did, in fact result in a refusal AND a warning on my claude account in Opus 4.8's early days.
> If you take avoiding cyberattack uplift seriously as a goal
This "uplift" risk obviously excludes the US. The goal of this is that the US bandits (like NSA) will find exploits and attack other countries (classic US behaviour), but these other countries can't be allowed to defend against these attacks. NSA/CIA thugs are "trusted", foreign defenders in sanctioned countries will of course be "untrusted".
Ah, you're probably one to ask. They say "queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8." Are they transparent about when that happens, and is it priced at the rate of the underlying model?
They are transparent about when it happens but no reason why. To be fair, it doesn't interrupt the flow, just drops to Opus and proceeds. The most frustrating thing is that it happened on a plan and Fable just refused to have anything to do with the plan.