I’ve found that agents are obsessed with adding more lines of code. Even when asked to simplify, they’ll remove 50 lines of code and then add 100 more. You have to explicitly tell them you want fewer lines of code, so I just do that after iterating on a task for a few steps.

I think the problem is that agents are inherently stochastic. Their idea of simplification changes from message to message because whatever objective they’re operating on internally is opaque and keeps shifting. No matter how much you prompt them, eventually they’re going to stop doing what you want them to do.

I built https://github.com/thempatel/mdlr for precisely this reason: externalize the objective and force the agent to meet it.
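To illustrate what "externalizing the objective" could look like in the simplest case, here is a rough sketch of a net-line-count check that a harness could run on each agent edit. This is an assumption for illustration only, not how mdlr works; `line_churn`, `meets_budget`, and `max_net_lines` are hypothetical names.

```python
from difflib import SequenceMatcher


def line_churn(before: str, after: str) -> tuple[int, int]:
    """Return (lines_added, lines_removed) between two versions of a file."""
    a, b = before.splitlines(), after.splitlines()
    added = removed = 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # lines dropped from the old version
        if tag in ("replace", "insert"):
            added += j2 - j1  # lines introduced in the new version
    return added, removed


def meets_budget(before: str, after: str, max_net_lines: int = 0) -> bool:
    """Hypothetical objective: the edit may grow the file by at most max_net_lines."""
    added, removed = line_churn(before, after)
    return added - removed <= max_net_lines
```

With a check like this, an edit that removes 50 lines and adds 100 fails the default budget regardless of how the agent describes it, and the harness can reject it and ask again.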

Interesting, I'll be testing your tool on my repos. You should publish to crates.io!

Thank you so much! I am open to any and all feedback. Please file an issue or discussion if you have things you'd like to share.

Getting this onto crates.io is a great suggestion; I will look into that!

I have been wondering whether Anthropic are just gaslighting everyone with new model releases, when in reality it's the same base model with some internal knobs turned further up with every release to produce longer and longer thinking threads and outputs.

My speculative assumption is that these long thinking threads and self-checking tend to produce somewhat better output, at the cost of huge price increases due to the token burn.

I imagine it's the same foundation model on the 4 series, with Fable 5/Mythos being a new or upgraded foundation model. Then the point releases are fine-tuning plus post-training alignment with desired outcomes. The "thinking" can involve multiple steps, e.g. asking the model first what it thinks the user wants to do and why, rewriting the prompt to generate better outcomes, working out how it should do it, coming up with a plan, etc. So when they announce each point release like Opus 4.8, they're probably adding new layers of thinking to try and get good results on benchmarks. And that of course has cost and speed implications.

Then Sonnet/Haiku are just attempts to quantise/distil down to an acceptable performance/cost ratio. The cynic in me says we probably won't see any more of those until post-IPO, keep people addicted to the most costly models to pump a quarter or two of revenue figures, unless a competitor starts seriously undercutting them on price/performance. Hence the recent requests to slow down model training worldwide with their competitors.

Of course it could be that Fable "5" is just a marketing bump to the version, not a new foundation model...

> Then Sonnet/Haiku are just attempts to quantise/distil down to an acceptable performance/cost ratio. The cynic in me says we probably won't see any more of those until post-IPO, keep people addicted to the most costly models to pump a quarter or two of revenue figures, unless a competitor starts seriously undercutting them on price/performance. Hence the recent requests to slow down model training worldwide with their competitors.

I'm guessing there'll be a Sonnet/Haiku 5 release just around IPO, to keep the news cycle going, and so that user numbers will get a boost.

I'm pretty sure Anthropic have hired people with an Industrial Organisation background, and so have OAI.

If you read a decent text on the subject and look at the actions both firms have taken, you'll quickly see it's literally textbook.

Can you expand a bit for people unfamiliar with Industrial Organisation planning?