Maybe I'm an outlier, but Sonnet 4.5 quality is about as low as I'm willing to go.

Its generation speed is not the problem or the time sink.

It's wrestling with it to get the right output.

---

And just to clarify, since maybe I misunderstood again: people are comparing Cursor to Claude Code and Codex etc. here, but isn't this whole article all Cursor, just using different models?

> Sonnet 4.5 quality is about as low as I'm willing to go.

literally a 30-day-old model and you've moved the "low" goalpost all the way there, haha. funny how humans work

Yup - just like the sibling comment said - my "low bar" is going to be whatever the best model is that isn't unreasonably expensive.

Model speed just isn't the bottleneck for me.

Before it I used Opus 4.1, before that Opus 4.0, and before that Sonnet 4.0 - each has been slightly better than the last. It's not like Sonnet 4.5 is some crazy step-function improvement (but the speed over Opus is definitely nice).

I think Opus 4.1 is still much better than Sonnet 4.5

If cost is not considered, absolutely. That being said, Sonnet 4.5 with thinking where it makes sense feels like way more bang for your buck and is usually good enough. I really don't use Opus anymore.

Not sure about the parent, but my current bar is set by GPT-5 high in Codex CLI. Sonnet 4.5 doesn't quite get there in many of the use cases that are important to me. I still use Sonnet for most lower-intelligence phases and tasks (until I get crunched by rate limits). But when it comes to writing the final coding prompt and the final verification prompt, and running a coder or a verifier that will execute and verify well, it's GPT-5 high all the way. Even if Sonnet is better at tool calling, GPT-5 high is just smarter and has better coding/engineering judgement, and that difference is important to me. So I very much get the sentiment of not going below Sonnet 4.5-level intelligence for coding. It's where I draw the line too.

Yes? Because why should we settle for less now that it is available?

because engineering is the art of "good enough", and Composer is clearly "good enough but a lot faster", which makes up for intelligence gaps in interesting ways

For me, the bar for barely good enough is and always has been Codex. Before it, I found frontier models more trouble than they're worth. And there's still a massive amount of room to grow before I can genuinely say working with these tools is more enjoyable than frustrating for me, both in how I use them now and in how I think they should work.

It's not good enough for a lot of us, though, clearly.

There are two different kinds of users: on one side, people who are more hands-off and want the model to autonomously handle longer tasks with minimal guidance; on the other, users who want to interactively collaborate with the model to produce the desired results. Speed matters much more for the second case, where you know what you want and just want the model to implement whatever you had in mind as quickly as possible. Intelligence/ability matters more for the first case, when you don’t have a full understanding of all the code. For me it’s context-dependent: more serious work tends to be more interactive. The intelligence of a model doesn’t make up for issues caused by a lack of context, in my experience.

I'm very solidly in the second group - but I review all the code. If it writes faster than I can read, that's fast enough.

Agree that Sonnet 4.5 is an excellent model. Would be curious to hear your experience using Composer though, it's quite good.

I'll try it out! I haven't yet - just generally conveying my opinion that I personally weigh "better model" much more heavily than speed, assuming some baseline of "fast enough".

Also, didn't realize you worked at Cursor - I'm a fan of your work - they're lucky to have you!

Thanks! Yeah, been working here for 9 months now. Fascinated by agentic coding, both as a researcher and as a user.

Totally agree that a "smart model" is table stakes for usefulness these days.

> Composer though, it's quite good

Wow, no kidding. It is quite good!

Same... I've found that using a non-Claude model just ends up being more expensive and not worth it. "Auto" tokens are hardly free, and I've had plenty of experiences putting "Auto" to work on a "simple"-seeming task, only to have it quickly consume about 1 USD of tokens while producing nothing of value, whereas replaying with Claude 4.5 Sonnet (non-thinking) would produce a solid solution for 0.5 USD.

The reason I pulled out the comparison is to highlight how serious they are about all the important parts that make or break the AI coding experience - speed being very important to me. I’d rather catch my model doing the wrong thing quickly than have a higher chance of one-shotting it at the cost of doing a lot of specification upfront.

gpt-5-high is as low as i can go :]