>> People who actually interact with their products know that Fable and Mythos are incremental improvements, not doomsday devices.
If you look outside HN, you'll see that people who interacted with Fable 5 overwhelmingly thought that it was a significant improvement, not simply an incremental one. Most reputable benchmarks show this as well.
I think there's space in the middle between "incremental improvement" and "doomsday device." It's a major step up, sure, but so was GPT-5 over GPT-4.
Step 1: don't trust benchmarks you don't understand - they might measure irrelevant things.
Step 2: test it on things you know Opus failed.
My day-to-day take, for the coding I do (not security related): incremental, modest improvement, if any. Not worth the 2x cost. I've calmly continued to use Opus, happy that it seems like it got an allowance upgrade.
It's a bit odd that you automatically assumed I don't understand the benchmarks.
For most single issues/bugs/tickets, the quality difference wasn't noticeable. But that's like using a sledgehammer to kill a fly. I was using Fable for much more ambitious and complex tasks that require orchestration, and it was crushing it. I described it here: https://news.ycombinator.com/item?id=48505782
So yes, the benchmarks are indeed accurate: where Opus 4.8 would start strong and eventually struggle or run into obstacles, Fable would relentlessly keep working, keep accurate track of all work threads (e.g. multiple inter-dependent issues being worked on in parallel by subagents), and go above and beyond.
I wasn't assuming anything. I was speaking generally.
The flow you describe in that comment is rather simple in my opinion, and with the right harness even Sonnet would drive most of that.
I judge by the ability to bugfix complex codebases and the direction it takes in architecture. In my opinion, that's a tad more complex (and easier to objectively measure) than orchestrating tickets, no matter how complex.