Hacker News

sigmarule 21 hours ago

> A third-party demonstrated that it was possible to jailbreak the safety measures of Fable to access the raw Mythos abilities. Abilities which Anthropic say are too dangerous for the public.

Pressure test this assumption before getting behind this position.

irthomasthomas 21 hours ago

I will certainly revisit it as more information comes out, but is it your contention that Anthropic solved jailbreaking with Mythos?

apstls 19 hours ago

What you claim contradicts Anthropic’s statements. I assume that is the contention.

sigmarule 21 hours ago

That is a strawman. My contention is what you just implicitly acknowledged - no information has been put out yet to validate the quoted claim. There are claims to the contrary, as well, from Anthropic themselves.

drawnwren 20 hours ago

In the absence of information, maybe it’s better to ask which claim is more extraordinary.

That:

A. Anthropic solved the LLM jailbreak problem with Mythos (despite no claim to have done so on their part), or

B. a full jailbreak of Mythos is possible.

vlovich123 20 hours ago

That’s not what the claim is though.

Anthropic’s claims are as follows if you read their post:

* this is not a universal jailbreak method

* the jailbreak affords you the same capabilities you get already with other models, not Mythos.

In this situation it comes down to which party you trust more, and history would suggest this administration is very playful with the truth, especially when it comes to economically damaging the company that’s become their political enemy.

sigmarule 19 hours ago

There is not an absence of information.

There is information, from Anthropic, concerning the jailbreaks that motivated this action, that directly contradicts the statement.

There is just an absence of information backing the statement I responded to.

I find it odd that this is apparently so contentious a take.

drawnwren 15 hours ago

The existence of a jailbreak-free LLM in 2026 is extremely contentious to me. You can argue about the specifics of this exact jailbreak, but generally Pliny and Amazon both reported Mythos jailbreaks in <7 days. It seems very reasonable to expect that a well-funded state actor could achieve better results given significantly more funding, determination and, most importantly, unfettered access.

s1artibartfast 15 hours ago

Nobody here is claiming Fable is jailbreak-free. Not Anthropic and not anyone in this thread. This was known before launch. The question remains one of degree and capabilities.

drawnwren 14 hours ago

Yeah, if you're arguing that "this, according to Anthropic, existentially dangerous model has only had its safeguards partially circumvented so we shouldn't step in" ... it's hard for me to take you seriously?

Put another way, the thing we are all concerned with is the complete circumvention of safeguards that is normally possible with LLMs. If you _aren't_ arguing that this isn't possible, you're not engaging with the thing that is concerning to regulators or those discussing the regulation.

s1artibartfast 5 hours ago

I'm pointing out what the argument is. You were saying it is something different.

Now you add the word "complete". Anthropic IS arguing _complete_ circumvention is NOT possible.

linkregister 14 hours ago

A disappointing trend is to frame the opposing argument in extreme terms rather than engaging with the substance of the assertion.

The latter portion is grandstanding about how incredulous the commenter is that someone might trust an LLM company about the strength of their harnesses' if-then-else statements for request routing.

Why bother with an insubstantial comment?

what 21 hours ago

What assumption?

sigmarule 21 hours ago

The one I quoted, which contradicts Anthropic’s post and has no supporting evidence publicly available: that a jailbreak was found that accesses the model’s _raw_ capabilities. Anthropic has explained that this was not the case.

apstls 21 hours ago

It is pretty clear, no? Anthropic claims that the jailbreaks they were made aware of did not access the model’s raw capability, and explained that there are protections to mitigate the impact of successful jailbreaks, etc. Coming here and stating something to the contrary with zero explanation or actual evidence is the assumption.