Fable is a lot like Opus at its best. It's simply more reliable and feels a bit smarter. For my use cases, using it feels very nice, and notably better than Opus. It needs less direct guidance to produce reasonable-looking code, and I don't have to watch it as closely.
For context, my Claude Code working style is quite heavy on discussion "to align" before implementing anything. We also use a good number of Markdown files.
Oh yeah, it also has way fewer "phrasing quirks" and is a clearer communicator. Opus 4.8 was a bit of a loon with some of its writing styles. I had mostly straightened it out, but not entirely. It would use the most ridiculous flair at times.
Yeah, same here, it's a huge step up for me. Curious why people are having such different experiences. Is it just to do with what they're working on? Specific prompt styles (e.g. overfitting on Opus)?
I would go out on a limb and say it's a garbage in, garbage out problem. People just don't define their problem well enough or provide enough context, and are surprised the model can't magically read their mind and summon data that doesn't exist from thin air. There's only so much raw intelligence can do to compensate for having literally nothing to go on.
10 years ago this was a joke, now it's Tuesday: https://old.reddit.com/r/ProgrammerHumor/comments/2vk4ph/mac...
I dunno, in my limited use, Fable is MORE prone to phrasing quirks. I had it use, for real, the phrase "load-bearing for correctness" yesterday. It meant something about not needing a validation check because something else (the "load-bearing" part) was already checking it.
I do agree that it *feels* nicer and smarter to use.
I think the tension here is that phrasing like this actually helps keep the model aligned, which is why the training and RL converged on it. But it's so annoying to read!
The repetition of "belt-and-suspenders" kills me with Opus, especially because it always means the model is suppressing something I would want to be an actual failure.
I've had Fable add Chinese characters to our conversation for no reason.
I've also had Fable successfully build a text editor (quill integration) into a Vaadin project that randomly loses its content after you type a few characters (this is on the 3rd iteration).
I've only had that happen with Chinese models until now. Interesting that Fable is doing it too.
I’ve had Opus randomly insert (correct) Russian words into responses. It’s like their training data includes some bilingual forums where idiomatic Russian speakers congregate.
Could it be that Anthropic is using the Chinese characters trick to consume fewer tokens behind the scenes?
It used a Chinese character instead of the word "true".
Aren’t Unicode characters generally treated as 2 tokens to avoid a huge vocabulary?
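For what it's worth, the raw sizes are easy to check. A rough sketch, assuming a byte-level tokenizer that starts from UTF-8 bytes before any merges are applied: it only compares byte counts and says nothing about how any particular model's tokenizer actually splits the text, which depends on its learned vocabulary. The character 真 is just an illustrative pick, not necessarily the one the model used.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


word = "true"
hanzi = "真"  # illustrative character meaning "true"; an assumption, not from the thread

# Byte counts are only an upper bound on tokens for a byte-level tokenizer;
# real token counts depend on which byte sequences the vocabulary has merged.
print(utf8_len(word))   # 4
print(utf8_len(hanzi))  # 3
```

So by bytes alone a single CJK character is not obviously cheaper than a short English word, and a common English word is very likely a single token anyway.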
Same here
How did you straighten it out?
I am drowning in gating propagating semantic mismatches...
Hah, yeah... I added this to my global CLAUDE.md (~/.claude/CLAUDE.md):
## Writing voice — plain, factual, calibrated to the evidence
Write docs, session notes, commit messages, and findings plainly and factually — and calibrate every claim you assert, in chat as much as in writing. This guards against a known LLM tendency to inflate: toward punchy phrasing and claims that read as more settled than the work supports. Same spirit as the Read-Clean Check above, and composes with it — that rule governs journey-framing, this one governs tone and certainty.
*Plain over punchy.* Skip decorative metaphors and dramatic verbs when a plain word is clearer — call a fix "the change", not "the hammer"; logging "flags" a problem rather than being "radar"; numbers "grow", they don't "explode". Plain phrasing reads as engineering; flourish reads as marketing.
*Calibrated confidence.* Everything stated should be well-reasoned and defensible, with the strength of the wording matched to the strength of the evidence. Prefer "found" / "appears" / "points to" over "proved" / "clearly" / "obviously". Name the confounds and what's still unverified. Don't let a bold lead-in pre-announce a conclusion the work hasn't reached.
*Hypotheses stay labeled as hypotheses.* Speculation and educated guesses are useful — when brainstorming or investigating, surface them, and sharing a strong view is welcome. But conviction is not evidence: until there is clear evidence, a claim is a hypothesis and is stated as one — explicitly, even when it's highly compelling. The failure mode is asserting a hunch as settled fact, where it then propagates unchallenged into later docs and summaries. Back a claim with its evidence in the same breath, or mark it as not-yet-backed.
*Factual and forward-looking.* Separate what was measured from what was inferred, and stay pragmatic about what's true, what's still open, and what's next. On next steps specifically, resist the strong LLM pull to converge prematurely:
- A plausible next step is not a decided one. Don't present one or two plausible tasks as the one path we should now follow; that lock-on is a frequent failure mode.
- Lay out the real options and their trade-offs. Saying which you'd lean toward and why is welcome and useful, but keep the space open and leave the choice to the user.
- Premature certainty about what to do next is as much a miscalibration as premature certainty about what's true.
Have you tried optimizing this prompt so that it's shorter but gets the same results? I see these super verbose prompts all the time from people who learned prompt engineering in the '24 to early '25 timeframe. They seem unnecessary to me (I get good results with 1-3 sentences), but I hate to assume other people's experience mirrors my own.