Hacker News

> which makes sense if what you want is a model that's better at benchmarks

This so much.

Opus 4.6 was the last Anthropic model that was good at assisting you; 4.7 and later ones have completely inverted this relationship, and now it's you assisting it.

Yes, I admit they are smarter. I admit we've reached a point where LLMs are more creative and could write better code than I do (albeit with some design hiccups), but they are also increasingly bad at helping me.

Sure, they do my job when prompted 8 times out of 10 (but then, what's the point of having me anyway?), but my issue is that when I try to invert the relationship, they keep jumping into solving the issues themselves and disregard my feedback or requests.

E.g. I wanted to know some DNS details of an emailer module in Fable 5 and it jumped to "why I should've used magic links"; it just did not do what I asked.

E.g. 2. There was a worker machine that had an environment misconfiguration, and I tasked it to find which GitHub Action was setting that specific flag and where. Instead of answering the question, it jumped straight into hardcoding it in the code.

E.g. 3. I had some issues with batching, and while I tasked it to investigate whether batching was needed at all for that particular problem (hint: it wasn't), it went and changed the batching logic to fix the bug.

I am extremely disappointed with Fable's personality.

I can clearly see it's strong, but I'm wondering whether the relationship of LLMs as assistants is broken forever, and it's now us who are being tasked with assisting them instead, because that's how it feels.

The training/reinforcement is clearly biased towards solving problems, not answering questions.

I feel like a lot of this could be solved by having a mode somewhere between Plan Mode and Execute Mode in Claude Code. Quite frequently I'll fire up Claude Code in the context of some checked-out code because I want to ask some questions where having access to the source would probably be useful. I don't want it to go running off and making changes, though, and I also don't really want a detailed plan for a chunk of work. I just want to ask something like "run cargo build and explain the errors to me". Nine times out of ten it will indeed explain the errors, but it'll then run off and start trying to fix them, even if I told it not to.

Essentially what I want is the experience of using Claude on the web in basic chat mode, but with the ability for it to go read my actual code and perform actions that can assist in finding answers to those questions.