I'm skeptical about how much of a coding advance it will be.

Seems odd that their announcement has zero coding benchmarks, with the closest related thing being Terminal-Bench.

Tracking model performance on Artificial Analysis makes me think these models are constantly optimized/tuned in some way or another. GPT 5.5 was scoring in the mid-60s when it was first released; now it's almost 10 points higher.

Maybe I'll know once I try it? Honestly, for small functions or methods, I don't think there's a huge difference between models. But the larger the code gets, the more noticeable the difference seems to be.

Personally, I think this kind of coding experience varies from person to person.

Not the size of the function, but its complexity.

Sadly, with all the labs benchmaxxing, I feel like you just have to try the model for a while to really evaluate how good it is, especially for each individual use case.

>zero coding benchmarks

"What gets measured gets managed"

They claim extreme performance on ExploitBench, which Mythos was touted as being incredible at. https://x.com/OpenAI/status/2070555278576439306

My guess is that it's the same base model as 5.5, but with additional post-training to improve and benchmaxx on a few things like that.

If they really thought it was competitive with Mythos/Fable across the board, then why wouldn't they release a broader set of benchmarks, and why price it day 1 at 1/2 the cost of Fable?

>and why price it day 1 at 1/2 the cost of Fable?

Why would they price it the same as Fable if it doesn't cost the same as Fable?

That's half my point - Anthropic's remarks suggest that Fable is significantly bigger (hence more costly to run) than Opus, so it is priced accordingly, but GPT 5.6 being priced the same as 5.5 is one datapoint that suggests they are the same size.

On the graph, they are still slightly below Mythos. Maybe enough to not be prohibited by the US government?