Just chiming in to inject some healthy skepticism into this comment thread. It's helpful for me (and for my mental health) to consider incentives when announcements like this happen.
I don't doubt that this model is more powerful than Opus 4.6, but to what degree is still unknown. Benchmarks can be gamed and claims can be exaggerated, especially if there isn't any method to reproduce results.
This is a company that's battling it out with a number of other well-funded and extremely capable competitors. What they've done so far is remarkable, but at the end of the day they want to win this race. They also have an upcoming IPO.
Scare-mongering like this is Anthropic's bread and butter; they're extremely good at it. They sometimes do it in a subtle, almost tasteful way. Their position as the respectable AI outfit that caters to enterprise gives them good footing for it, too.
I've been thinking that these SWE benchmarks will keep improving simply because these companies hire very intelligent software engineers: they can task a multitude of them with solving problems, then train the model on those answers.
Data has always been the core of it all, onward to the next abstraction, I suppose.
I think computational thinking, basically "how do I solve this problem efficiently" training data, is more valuable than feeding in answers. I don't know what these AI models' training data consists of, but it would be interesting to see a model trained purely on reasoning, methods, and foundational skills (basic programming? or maybe not), and then run it against some benchmarks.
What would be the incentive to engage in that tactic when the proof is ultimately in the pudding once the model hits the streets? Who would actually benefit from fudging these numbers?
Anthropic would def benefit, since benchmarks are almost always useless compared with real-life use.
How, specifically, would they benefit? People flock to them based on the hype, and then the model sucks and they leave?
If anything, I'm seeing too much skepticism and not enough alarm. People are burying their heads in the sand, fingers in their ears, denying where this is all going. Unbelievable, except it's exactly what I expect from humans.
Forgive me, but this is probably the 29th world-destroying model I've seen in the last 4 years, the one that will change everything, take all the jobs, cure all the cancers, and eat all the puppies.
I’m beyond trying to convince people to take this technology seriously. You’ll learn for yourself.
OpenAI didn't want to make GPT-2 available because it was "too dangerous" [1].
[1] https://www.theguardian.com/technology/2019/feb/14/elon-musk...
Alarm driven by hype is exactly what they want; you're playing straight into their PR department's hands.
alarm about what, exactly?
Is it healthy? Maybe every company is a profit-maximizer wearing a skin suit, and people support their siblings exactly twice as much as their cousins.
When you slice down to the game-theory-optimal bone, you are, in some sense, cutting off their wiggle room to do anything else.
I take your point, but the AI race is a strange environment. We see wild claims being thrown out all the time from other companies and executives with little to no evidence. It's cut-throat, there's a ton of money at stake.
All I'm saying is that Anthropic isn't unique here. Their claims may be more measured by comparison and come with anecdotal evidence, but the hype is still there behind the scenes.
Finally, a comment that's actually critical instead of just glazing Mythos. I question how even the supposedly smarter crowd on HN has degraded so much in the critical-thinking department. It's sad to see comments taking the claims at face value without even using it once.
It's really not some conspiracy. I imagine we will see vuln reports soon.