> Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date. How can these claims all be true at once? Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless: The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
There is some unintentional good marketing here -- the model is so good it's dangerous.
Reminds me of the book 48 Laws of Power -- so good it's banned from prisons.
Unintentional? This sort of marketing has been both Anthropic's and OpenAI's MO for years...
Agree. I think they're intentionally sitting on the fence between "These models are the most useful" and "These models are the most dangerous".
They want the public and, in turn, regulators to fear the potential of AI so that those regulators will write laws limiting AI development. The laws would be crafted with input from the incumbents to enshrine/protect their moat. I believe they're angling for regulatory capture.
On the other hand, the models have to seem amazingly useful so that they're made out to be worth those risks and the fantastic investment they require.
They should pick a lane because it’s not very believable if you put these things into defense systems and in the next minute claim that humanity is existentially threatened. Either you’re lying, or ruthless, or stupid.
The new Power Mac® G4 with Velocity Engine®. So powerful, the government classifies it as a supercomputer and a potential weapon.
TIL about AltiVec: https://apple.fandom.com/wiki/AltiVec
Business Negging
https://www.lesswrong.com/posts/WACraar4p3o6oF2wD/sam-altman...
Alignment “appearing” better as model capabilities increase scares the shit out of me, tbh.
Conversely: in humans, intelligence is inversely correlated with crime.
It doesn't go to zero, however!
> Conversely: in humans, intelligence is inversely correlated with crime.
If you're measuring the intelligence of criminals who have been caught, why would you expect it to be otherwise?
IOW, you're recording the intelligence of a specific subset of criminals - those dumb enough to be caught!
If you expand your samples to all criminals you'd probably get a different number.
If you're smart enough you just use the laws as written to get what you want, or change them.
Yep
It very much depends on the crime. The truly awful stuff is committed by intelligent people.
Is that actually well defined given the very low sample size at the top?
To the best of my knowledge, none of the individuals believed to have an IQ >200 have committed an actual crime.
The closest I found is William James Sidis's arrest for participating in a socialist march.
IQs above about 140-150 don't really mean much. They typically come from mathematical extrapolation that tries to account for age ("this young child performs very well on the test; just think what they'll be able to do as an adult"). Adult scores usually show this not to be the case.
> Conversely: in humans, intelligence is inversely correlated with crime.
Inversely correlated with crime that's caught and successfully prosecuted, you mean, because that's what makes up the stats on crime. I think people too often forget that we consider most criminals "dumb" because those who are caught are mostly dumb. Smart "criminals" either don't get caught or have made their unethical actions legal.
I don't know if they can be any more 'cautious' for Mythos 2...
It was trying to hide what it did in an example fix, so how is that tested for alignment?
Translation: yay, more paternalism.
Anthropic always goes on and on about how their models are world-changing and super dangerous. Every single time they make something new, they say it's going to rewrite everything and it's scary, lmao
funny because they do it every time like clockwork, acting like their AI is a thunderstorm coming to wipe out the world
You say this like it's a bad thing, but wouldn't you rather they overindex on the danger of their models?
That’s not what they are doing. They are just hyping up the product - and, no doubt, trying to foster a climate of awe so that when they ask their friends in Washington to legislate on their behalf, the environment is more receptive.
They do tend to make a lot of noise about it for the PR, but at the same time the actual safety research they present seems to be relatively grounded in practical reality, e.g. the quote someone posted here about how the Mythos model apparently has a tendency to try to bypass safety systems if they get in the way of what it has been asked to do.
Sure, a big part of this is PR about how smart their model apparently is, but the failure mode they're describing is also pretty relevant for deploying LLM-based systems.
Every single time, really? When was the last time they said that?
I also don't recall them ever limiting their models to selective groups.
If there are advancements, they have to be described somehow.
What if the capability advancements are real and they warrant a higher level of concern or attention?
Are we just going to automatically dismiss them because "bro, you're blowing it up too much"?
Either way these improvements to capabilities are ratcheting along at about the pace that many people were expecting (and were right to expect). There is no apparent reason they will stop ratcheting along any time soon.
The rational approach is probably to start behaving as if models that are as capable as Anthropic says this one is do actually exist (even if you don't believe them on this one). The capabilities will eventually arrive, most likely sooner than we all think, and you don't want to be caught with your pants down.
I believe there are advancements, sure. But it's very much a boy-who-cried-wolf situation for some of these. Other companies behave less this way; Anthropic seems unique in how much they love making every single release a world-ender.
Altman called GPT-2 "too dangerous to release". Google tends to be much more measured even though they're the ones who tend to release the actual research breakthroughs
> they love making every single release a world ender
You've said this a couple of times, but it doesn't match my recollection, and I get the impression you're basically making it up based on vibes. (Please prove me wrong, though.)
Their last major frontier release was Opus 4.6, and the release announcement was... very chill about safety: https://www.anthropic.com/news/claude-opus-4-6#a-step-forwar...
"We want to see risks in the models, so no matter how good the performance and alignment, we’ll see risks, results and reality be damned."
i mean, to be fair, these are professional researchers.
i'm very inclined to trust them on the various ways that models can subtly go wrong, in long-term scenarios
for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?
another hot use case: biohacking. if a model is used to do really hardcore synthetic chemistry, one might not realize it's potentially harmful until too late (i.e., the human splits up the problem so that no guardrails are triggered)
"for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?"
But who gets to be the judge of that kind of "misalignment"? giant tech companies?
Might makes right; brains hold reins.