Hacker News

Seems like it literally popped up yesterday with the express purpose of building hype for this release.

And the notable absence of the DeepSWE benchmark, where they do badly, while somehow a benchmark that was published yesterday is in this announcement.

zzleeper 14 hours ago

Exactly... a bit of a red flag for me.

swyx 16 hours ago

team member here: we had been working on frontiercode for ~6-7 months. the timing just lined up

emp17344 15 hours ago

Yeah, right. If this benchmark was truly developed independently, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark was revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.

oblio 13 hours ago

Come on, why are you a jerk about this?

Nobody would have 800+ billion reasons to lie by commission or omission here.

vanuatu 17 hours ago

i doubt it. cog wants coding agents to be better because it directly improves their product

they aren't married to a particular lab; most of their usage is their in-house model, i believe

anthonypasq 17 hours ago

what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.

bel8 17 hours ago

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink 17 hours ago

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq 16 hours ago

you didn't answer my question. Why would Cognition be biased towards making Anthropic look good?

gloosx 3 hours ago

Because Cognition is a major customer of Anthropic?