Seems like it literally popped up yesterday with the express purpose of building hype for this release.

And there's a notable absence of the DeepSWE benchmark, where they do badly, yet somehow a benchmark that was published yesterday is in this announcement.

Exactly. A bit of a red flag for me.

Team member here - we had been working on frontiercode for ~6-7 months. The timing just lined up.

Yeah, right. If this benchmark was truly developed independently, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark was revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.

Come on, why are you a jerk about this?

Nobody would have 800+ billion reasons to lie by commission or omission here.

I doubt it. Cognition wants coding agents to be better because it directly improves their product.

They aren't married to a particular lab; most of their usage is their in-house model, I believe.

What incentive does Cognition have for doing this? Seems like complete nonsense speculation on your part.

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

You didn't answer my question. Why would Cognition be biased towards making Anthropic look good?

Because Cognition is a major customer of Anthropic?