Hacker News

I think the reasonable middle ground Anthropic is trying to achieve is to let the organizations that make the most important and critical software get a head start on cybersecurity before inevitably allowing everyone else the same access.

Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity work, because I can't use the model to test and harden my own software.

sciencejerk 3 hours ago

Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.

Anthropic's guardrails seem to be more about protecting their business (distillation) than about public safety.

dnautics 3 hours ago

Public safety is downstream of distillation. If you can distill Claude, then no amount of guardrails on Claude will protect you from what someone can do with it.

zozbot234 an hour ago

Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.

cherryteastain 11 minutes ago

This logic works only if distilling Claude is the only way to create another SOTA LLM, which is not the case.

ryandrake 3 hours ago

I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.

margalabargala 3 hours ago

No need to wonder.

The answer is, the organization making the powerful tool. The people in charge of Anthropic.

Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/

You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.

criddell 3 hours ago

That would be Anthropic.

CamperBob2 3 hours ago

Well, Anthropic thinks it should be the Trump administration [1].

This whole business just keeps getting dumber.

1: https://darioamodei.com/post/policy-on-the-ai-exponential

solenoid0937 3 hours ago

Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.

CamperBob2 2 hours ago

No. You read the actual essay, then explain how we're supposed to interpret this more charitably:

    Frontier AI models, like airplanes, should 
    be required to go through technical testing 
    and auditing, and their release should be 
    blocked or reversed as a threat to public 
    safety if they do not meet high standards 
    of safety. I am grateful to see the Trump 
    administration’s Executive Order move 
    incrementally towards a greater role for 
    government in AI, though Anthropic’s proposal 
    recommends even further action.

They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.

solenoid0937 2 hours ago

This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."

CamperBob2 2 hours ago

It's a pretty reasonable statement if you work for Anthropic and are eyeing your stock options nervously and your competitors even more so.

solenoid0937 an hour ago

Everyone that isn't a bitter cynic must be a shill.

wouldbecouldbe 3 hours ago

I asked it to analyse my architecture and find any security issues, and it did so perfectly: it first identified the issues and then fixed them. Not sure why my prompt managed to get through the guardrails.

pwython 2 hours ago

I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.

Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."

Ok fine, I said go for it, and it says:

"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."

Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.

notrealyme123 3 hours ago

Exactly. For cybersecurity the failure was visible. It was not visible for "Frontier" ML research. The argument about a head start in IT security is not feasible here.