Hacker News

Fable feels like a version of Opus running on a harness that won't let it halt until it's sure the issue is fixed, which makes sense if what you want is a model that's better at benchmarks.

It's a very good model, but it comes at a huge premium: not only do the tokens cost more, but the model itself really wants to spend them all. For example, working with React Native, Fable never just says "okay, I did the thing, that's it." It tries to rebuild the entire app from scratch, run the whole test suite, and watch every log and warning.

This is the first time with LLMs I've felt that upgrading to a model isn't worth it, even if my company lets me use it, because all the building / testing was just destroying my machine and its battery, which keeps me from working on other things.

For now, it feels like Opus with ultracode is a better choice (less pollution of the main context, more parallelism in investigations).

conradkay 17 hours ago

Does low/medium effort fix it for you? It seems like Fable 5 low can often outperform Opus 4.8 high/xhigh, and it uses a lot fewer tokens.

skerit 9 hours ago

Fable 5 on medium is amazing. It's handling everything I throw at it.

I had _one_ instance where for some obscure reason it decided to fall back to Opus 4.8 and Opus IMMEDIATELY fucked it up and implemented a super obvious feature in a slightly-wrong way.

_345 16 hours ago

In my case, no. I actually saw worse performance with Fable medium and switched back to Opus high and xhigh.

epolanski 13 hours ago

I find high+ unusable: it's way too slow and "thorough" on 99% of mundane tasks.

Sure it's better at vibecoding whole tasks, it's clearly good at it, but give it a simple one, and it will still do way more than needed.

It's way too fixated on validating even the simplest things. I find it an unproductive model unless you're implementing whole tasks and doing other things in the meantime.

jon-wood 6 hours ago

Why are you deploying a bleeding-edge, incredibly expensive model to do the simplest things? Use Sonnet, hell, use Haiku. They'll get the job done and won't set fire to several rainforests in order to achieve the task.

sanex 17 hours ago

I've found the opposite. Granted, I use sub agents heavily, but I've had it run for hours with far fewer tokens used than when I was previously using Opus 4.6-4.8.

firemelt 3 hours ago

How did you use the sub agents? Any example of a setup and use case?

threatripper 18 hours ago

On what setting in which environment do you run it? I use the VSCode extension on Extra High and feel like it does exactly what needs to be done and stops when the thing I asked for is done. Extra comments come only when they fall into the area of code that was changed.

jampa 17 hours ago

I tested it to fix React Native bugs in a project, comparing it with Opus. It fared better on harder bugs, taking less time to find the root cause, but after implementing a fix, it spent a lot of time and effort on validation. This was mostly unnecessary, since most of the bugs were in the JS code, so for most things, hot reloading is enough for E2E validation and to run just the right tests. No need to run a full build and test suite (which takes 10+ minutes); the CI can do this.

I switched back to Opus because of this validation quirk. Overall, Fable spent 20% of the time on coding and 80% on validation.

I think using Fable for planning and Opus for execution could be a "best of both worlds" approach (I need to test this more), but for most cases, it's not necessary, and Opus is enough.

gbalduzzi 15 hours ago

> most of the bugs were in the JS code, so for most things, hot reloading is enough for E2E validation and to run just the right tests. No need to run a full build and test suite (which takes 10+ minutes); the CI can do this.

Have you tried adding this instruction to your AGENTS.md? Avoiding situations where the agent starts running in a loop is the main use case of the file for me.

wouldbecouldbe 5 hours ago

Why not just add something like: "No need to run a full build and test suite, I will manually validate"?

dreis_sw 5 hours ago

I think the new high effort settings are so strong that selecting them when the task doesn't require it actually impacts the output negatively.

Gareth321 12 hours ago

I like this proactivity in theory, but as you say: it's expensive. I wonder if this can be solved with the right prompt. E.g. "these are your constraints. Only resolve x. If you are unsure if a task is outside constraint, check with me first."

esjeon 16 hours ago

> the model itself really wants to spend them all

In fact, Opus does the same. It finishes the job, then redoes it from scratch before presenting the result to the user. This happens even for simpler writing tasks, especially when I instruct it to create a text file.

epolanski 13 hours ago

> which makes sense if what you want is a model that's better at benchmarks

This so much.

Opus 4.6 was the last Anthropic model that was good at assisting you; 4.7 and later ones have completely inverted this relationship, and now it's you assisting it.

Yes, I admit they are smarter, I admit we've reached a point where LLMs are more creative and could be writing better code (albeit with some design hiccups) than I do, but they are also increasingly bad at helping me.

Sure, they do my job when prompted 8 times out of 10 (but then, what's the point of having me anyway?), but my issue is that when I try to invert the relationship they will keep jumping onto solving the issues themselves and disregard my feedback or request.

E.g. I asked Fable 5 for some DNS details of an emailer module and it jumped onto "why I should've used magic links". It just did not do what I asked.

E.g. 2. There was a worker machine that had an environment misconfiguration, and I tasked it to find which GitHub Action was setting that specific flag and where. Instead of answering the question, it jumped into just hardcoding it in the code.

E.g. 3. I had some issues with batching, and while I tasked it to investigate whether batching was needed at all for that particular problem (hint: it wasn't), it went and changed the batching logic to fix the bug.

I am extremely disappointed with Fable's personality.

I can clearly see it's strong, but I'm wondering whether the relationship of LLMs as assistants has broken forever, and it's us now who are being tasked with assisting them instead, because that's how it feels.

The training/reinforcement is clearly biased towards solving problems, not answering questions.

jon-wood 6 hours ago

I feel like a lot of this could be solved by having a mode somewhere between Plan Mode and Execute Mode in Claude Code. Quite frequently I'll fire up Claude Code in the context of some checked out code because I want to ask some questions where having access to the source would probably be useful. I don't want it to go running off and making changes though, and I also don't really want a detailed plan for a chunk of work. I just want to ask something like "run cargo build and explain the errors to me". Nine times out of ten it will indeed explain the errors, but it'll then run off and start trying to fix them regardless of whether I said not to.

Essentially what I want is the experience of using Claude on the web in basic chat mode, but with the ability for it to go read my actual code and perform actions that can assist in finding answers to those questions.

dyauspitr 18 hours ago

It’s not just a more proactive and diligent Opus. The capabilities are significantly higher on Fable. It’s not a paradigm shift, but it’s close.

UncleOxidant 17 hours ago

I unleashed it on a compiler codebase that I've been developing for several months now using Claude Sonnet 4.5/6, Gemini 3.1 Pro, DeepSeek V4 Pro (recent), and a bit of Qwen3.6-27B. Right away Fable found several longstanding bugs in our compiler that we hadn't found before. It also found that a critical part of our design needed to be mostly redesigned/rewritten, and it gave a very well-reasoned rationale for doing so.

rajveerb 17 hours ago

What sort of compiler?

UncleOxidant 17 hours ago

A compiler that takes C code (a subset of C with some extensions) and compiles it to microcode for a type of microcoded, algorithmic state machine that we're developing.

andai 9 hours ago

They should have made it three times bigger instead of two.

viking123 15 hours ago

It's worse than GPT 5.5 xhigh.

baq 15 hours ago

The jagged frontier strikes again.

I’d say it’s overall better, but not universally better.