Hacker News

I'm skeptical about how much of a coding advance it will be.

Seems odd that their announcement has zero coding benchmarks, with the closest related thing being terminal bench.

Tracking model performance on Artificial Analysis makes me think these models are constantly optimized/tuned in some way or another. GPT 5.5 was scoring in the mid-60s when it was first released; now it's almost 10 points higher.

jdw64 a day ago

Maybe I'll know once I try it? Honestly, for small functions or methods, I don't think there's a huge difference between models. But the larger the code gets, the more noticeable the difference seems to be.

Personally, I think this kind of coding experience varies from person to person.

kolinko 13 hours ago

Not the size of the function, but its complexity.

vanuatu a day ago

Sadly, with all the labs benchmaxxing, I feel like you just have to try the model for a while to really evaluate how good it is, especially for each individual use case.

MangoCoffee a day ago

>zero coding benchmarks

"What gets measured gets managed"

artursapek a day ago

They claim extreme performance on ExploitBench, which Mythos was touted as being incredible at. https://x.com/OpenAI/status/2070555278576439306

HarHarVeryFunny a day ago

My guess is that it's the same base model as 5.5, but with additional post-training to improve and benchmaxx on a few things like that.

If they really thought it was competitive with Mythos/Fable across the board, then why wouldn't they release a broader set of benchmarks, and why price it day 1 at 1/2 the cost of Fable?

famouswaffles 2 hours ago

>and why price it day 1 at 1/2 the cost of Fable?

Why would they price it the same as Fable if it doesn't cost the same as Fable?

HarHarVeryFunny 28 minutes ago

That's half my point. Anthropic's remarks suggest that Fable is significantly bigger (hence more costly to run) than Opus, so it is priced accordingly, but GPT 5.6 being priced the same as 5.5 is one datapoint that suggests they are the same size.

andriy_koval a day ago

On the graph, they are still slightly below Mythos. Maybe by enough to not be prohibited by the US government?