Hacker News

I don't know, I've been using Mythos this week quite sceptically and I found it to be incredibly dumb. For instance, I gave it a dialogue between 3 people and it constantly mixed up who said what to whom, which looked like early Gemini behaviour. But the latest Opus does that too. It would also make nonsensical inferences about papers I gave it and only corrected itself when I pointed out what it got wrong. If that is what the US government fears... maybe the fear is that someone follows the dumb things the model suggests.

zmmmmm 19 hours ago

It feels like it's mostly just tuned to raise its capability on long-horizon tasks: resist context rot and keep persisting at all costs until a goal is done.

The base intelligence does not feel much greater to me.

hodgehog11 17 hours ago

This is a ridiculous thing to test it on. Other models are trained on that kind of thing; use those instead.

Fable was designed for _really_ hard software engineering problems. Possibly large ones, but especially hard ones. For those tasks, you feel the difference immediately.

saberience 13 hours ago

No, it wasn't. Fable is a general-purpose model for use in regular chat and analysis as well as coding.

And yes, the parent poster is accurate, Fable is just as prone to moronic mistakes as Opus was. Stop being so AI-pilled.

Codex is still a better model, and yes, for the hardest engineering problems. I use Claude for UIs/GUIs and Codex for all my backend work, because I have 20 years of experience in actual hard engineering, and I can see that Codex writes cleaner code and is far more steerable.

Bad engineers think Claude is better because it writes more lines of code and is more "proactive", but more lines of code don't make a better system.

hodgehog11 10 hours ago

> Fable is a general purpose model for use in regular chat, analysis, as well as coding

This is a forum filled with experts. Putting marketing aside, in a forum like this it is most useful to assess models on the toughest problems in the domain they were specifically refined for. For DeepSeek, that's math. For Claude, that's programming. Gemini and ChatGPT are generalists. Yes, you can use every model for anything you like. But Fable is a bit special: it's very expensive, and very clearly designed for particular types of tasks.

> Fable is just as prone to moronic mistakes as Opus was.

"Just as" is up for debate, but yes, all models are capable of moronic mistakes. That's not helpful information though.

> Codex is still a better model

You're comparing agentic workflows, which rely on a lot more than just the underlying model. It sounds like you're using it like a precision instrument, which is great! That's very different from my use cases, though, and from the ones Fable seems to excel at. I'm using it for scientific computing, and you really, really want it to one-shot a solution. It's either the right algorithm for the task or the wrong one. So for the hardest problems, it needs to implement a correct solution in effectively one shot. I use Codex too, but it's often too careless for delicate tasks. If it gets something wrong, it is really hard to steer it back; you have to start from scratch.

> Bad engineers think Claude is better because it writes more lines of code and is more "proactive".

I think you missed the mark on this one. I'm not really an engineer, but I have as much experience in my job as you do in yours. A solution to my problems comprises few lines of code. Fable actually gets it right, first time, every time (so far), but that's with a very long prompt and a bunch of attachments. No other model has done this for me. Not shilling for Anthropic, just impressed. This isn't particularly subjective for me; it is quantitatively measurable.

Don't assume everyone using AI is going to have the same experience you have, or the same types of use cases. And please don't assume that having different experiences makes others "bad".

Also, Claude has always been mediocre at creative tasks. For your line of work, I would have already recommended Codex hands down.

William_BB 4 hours ago

> This is a forum filled with experts

Half of HN commenters probably work on basic CRUD. Armchair experts, maybe.

varispeed 7 hours ago

I tested it on that too. A problem I usually give a model as a test is to optimise an already well-optimised function that performs certain calculations. I give it a reference to the CPU instruction set, how instructions can be paired to take advantage of the superscalar execution pipeline, etc. On that test, too, it fell on its face, producing code that was demonstrably slower and had an extra bug.

hodgehog11 4 hours ago

Interesting, thanks for sharing. That is something I would have expected it to do well on, unless it tripped the internal rerouting. My experience on computational geometry problems has been universally positive (virtually flawless), and falling back to Opus has been a huge and frustrating step back. Opus has frequently made errors and regressions; Fable never made a single one.