For me, Claude Code was the most impressive innovation this year. Cursor was a good proof of concept but Claude Code is the tool that actually got me to use LLMs for coding.

The kind of code that Claude produces looks almost exactly like the code I would write myself. It's like it's reading my mind. This is a game changer because I can maintain the code that Claude produces.

With Claude Code, there are no surprises. I can pretty much guess what its code will look like 90% to 95% of the time, but it writes it a lot faster than I could. This is an amazing innovation.

Gemini is quite impressive as well. Nano banana in particular is very useful for graphic design.

I haven't tried Gemini with coding yet, but TBH Claude Code does such a great job that if I could code any faster, I would get decision fatigue. I don't like rushing into architecture or UX decisions. I like to sit on certain decisions for a day or two before starting implementation. Once you start in a particular direction, it's hard to undo, and you may end up doubling down on the mistake due to the sunk cost fallacy. I try hard to avoid that.

I don't even see much reason to use Cursor. I am used to IntelliJ IDEA, so I just downloaded the Claude Code plugin, and now I basically use the IDE only for navigating the code, finding references, and reviewing changes. I can't even remember the last time I wrote more than 2 lines of code. Claude Code has catapulted my performance at least 5x, if not more. And now that the cost of writing tests is so minimal, I am also able to achieve much better (and meaningful!) test coverage too. AI agents are where the real productivity is. I just create a plan with Claude, iterate on it, ask questions, then let it implement the plan, review, and ask for some adjustments. No manual writing of code at all. Zero.

Maybe I'm holding it wrong, but it still messes up the finer aspects of a codebase. If I ask it to implement some weird thing off the beaten path, it gets lost. But I completely agree about the testing part. I actually test much more now since it's so easy!

IntelliJ has its own Claude integration too, but it does not use your Claude subscription: https://blog.jetbrains.com/ai/2025/09/introducing-claude-age...

Do you guys all work 100% on open source? Or are you uploading bits of your copyrighted code to Anthropic for future training? I hate patents, so copyright is the only IP protection I have.

We use AWS Bedrock, so everything stays within our AWS account. It's not like we aren't already uploading our code to GitHub for version control, AWS for deployment, JetBrains for development, and all of our logs to Datadog, Sentry, Snowflake, and more.

Yeah, my source code is on my computers, in self-hosted version control and self-hosted CI runners

Anthropic doesn't use your code for training.

I first got into agentic coding properly with the GLM coding plan (it's like $2/month), but I found myself very consistently asking Claude to make the code more elegant and readable. At that point I realized I was being silly and just switched to Claude Code.

(GLM etc. get surprisingly close with good prompting, but... $0.60/day to not worry about that is a no-brainer.)

Nano Banana Pro is legitimately an insane tool if you know how to use it. I still can’t believe they released it in the wild

It's decent for things that would take a long time in Photoshop. Like most AI, sometimes it works great and sometimes it goes off the rails completely. Most recently, I used it to process some drone photos that were taken during late fall for the purpose of marketing a commercial property. All of the trees/grass/plants were brown, so I told it to make it look like the photos were taken during the summer but not to change anything else. It did a very good job, not just changing the color, but actually adding leaves to the plants and trees in a way that looked very realistic. It did in seconds what would have taken one of my team members hours, leaving them to work on other more pressing projects.

What more is there to using it than asking it to generate an image of something?

For one: modifying existing images in interesting ways ... adding characters, removing elements, altering or enhancing certain features, creating layers, and so on. Things that would take a while on Photoshop, done almost instantly. Really unlocks the imagination.

For me: I've only tried using it seriously a few times but my experience is that you have to juggle carefully when to start a fresh session. It can get really anchored on earlier versions of images. It was interesting balancing iteration and from-scratch prompt refinement.

I gave it an image of my crappy art and asked what steps I could take to make it look better. It gave me specific advice, like varying the line widths and how to use this on specific parts of the character. It also pointed out that the shading in my piece was inconsistent and did not reflect the 3D form I was representing, and again gave me specific fixes I could implement. I asked it to give me an updated version of the piece with all of its advice implemented, and it did so. I was pretty shocked by all of this.

A friend just used it to generate an image of an office (complete with company name and address on the wall) that was to be used as an “address proof” picture for a credit card application (maybe fraud).

I couldn’t tell it apart from the real thing, and I have a great eye for AI-generated images.

I don’t have much time to evaluate tools every month, and I have settled on Cursor. I’m curious what I’m missing when using the same models?

I have only compared Claude Code with Crush and a tool of my own design. In my experience, Claude Code is optimized for giant codebases and long tasks. It loves launching dozens of agents in parallel. So it's a bit heavy for smaller, surgical stuff, though it works decently for that too.

If you mostly have small codebases that fit in context, or make many small changes interactively, it's not really great for that (though it can handle it too). It'll just spend most of its time poking around the codebase when the whole thing should have just been loaded... (Too bad there's no small-repo mode. I made a startup hook that just cats the whole directory into context, roughly the sketch below, but yeah, it should be a toggle.)
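
To be concrete, the hook itself is nothing fancy. The sketch below is roughly what mine does (Python here just for illustration; the script name, skip list, and size cap are my own arbitrary choices, not anything Claude Code prescribes). It walks the repo, skips the usual junk directories, and prints every small text file to stdout, and the session-start hook just feeds that output into context. The exact hook wiring depends on your Claude Code config, so treat this as a sketch, not the one true setup.

    #!/usr/bin/env python3
    # dump_repo.py: print every small text file in a repo to stdout,
    # so a session-start hook can load the whole codebase into context.
    # Minimal sketch; the skip list and size cap are arbitrary choices.
    import sys
    from pathlib import Path

    SKIP_DIRS = {".git", "node_modules", ".venv", "dist", "build", "__pycache__"}
    MAX_BYTES = 64 * 1024  # ignore anything bigger than ~64 KB

    def dump(root: Path) -> None:
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            if any(part in SKIP_DIRS for part in path.parts):
                continue
            if path.stat().st_size > MAX_BYTES:
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file, leave it out
            print(f"===== {path.relative_to(root)} =====")
            print(text)

    if __name__ == "__main__":
        dump(Path(sys.argv[1] if len(sys.argv) > 1 else "."))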

You're not missing much. You can generally use Cursor like Claude Code for normal day to day use. I prefer Cursor because I like reviewing changes in an IDE, and I like being able to switch to the current SOTA model.

Though for more automated work, one thing you miss with Cursor is subagents, and to a lesser extent skills (those are pretty easy to emulate in other tools). I'm sure it's only a matter of time though.

Claude Code's VS Code integration is very easy to set up and pretty helpful if you want to see/review changes in an IDE.

The big limitation is that you have to approve/disapprove at every step. With Cursor you can iterate on changes and it updates the diffs until you approve the whole batch.

There is an auto accept diffs mode

You are missing an entire agentic experience. And I wouldn't call it vibe coding for an engineer; you're more or less empowered to truly orchestrate the development of your system.

Cursor has an agent mode, but that's like everyone else trying to copy the Model T while Ford was developing it.

This hasn’t been my experience at all. I’m finding Cursor with Opus 4.5 and plan mode to be just as capable as CC. And I prefer the UI/UX.

If you switch to Codex you will get a lot of tokens for $200, enough to more consistently use high reasoning as well. Cursor is simply far more expensive, so you end up using less or using dumber models.

Claude Code is overrated: many of its features and modalities exist to compensate for model shortcomings that aren't as necessary when steering state-of-the-art models like GPT 5.2.

I think this is a total misunderstanding of Anthropic’s place in the AI race. Opus 4.5 is absolutely a state of the art model. I won’t knock anyone for preferring Codex, but I think you’re ignoring official and unofficial benchmarks.

See: https://artificialanalysis.ai

> Opus 4.5 is absolutely a state of the art model.

> See: https://artificialanalysis.ai

The field moves fast. Per artificialanalysis, Opus 4.5 is currently behind GPT-5.2 (x-high) and Gemini 3 Pro. Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

Totally, however OP's point was that Claude had to compensate for deficiencies versus a state-of-the-art model like GPT 5.2. I don't think that's correct. Whether or not Opus 4.5 is actually #1 on these benchmarks, it is clearly very competitive with the other top-tier models. I didn't take "state of the art" here to narrowly mean #1 on a given benchmark, but rather to mean near or at the frontier of current capabilities.

One thing to remember when comparing ML models of any kind is that single value metrics obscure a lot of nuance and you really have to go through the model results one by one to see how it performs. This is true for vision, NLP, and other modalities.

https://lmarena.ai/leaderboard/webdev

LM Arena shows Claude Opus 4.5 on top

I wonder how model competence and/or user preference on web development (that leaderboard) carries over to larger, more complex projects, or more generally to anything other than web development?

In addition to whatever they are exposed to as part of pre-training, it'd be interesting to know what kinds of coding tasks these models are being RL-trained on. Are things like web development and maybe Python/ML coding overemphasized, or are they also being trained on things like Linux/Windows/embedded development in different languages?

https://x.com/giansegato/status/2002203155262812529/photo/1

https://x.com/METR_Evals/status/2002203627377574113

> Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

What an insane take for anybody who uses these models daily.

Yes, I personally feel that the "official" benchmarks are increasingly diverging from the everyday reality of using these models. My theory is that we are reaching a point where all the models are intelligent enough for day-to-day queries, so points like style/personality and proper use of web queries and other capabilities are better differentiators than intelligence alone.

The benchmarks haven't reflected the real utility for a very long time. At best they tell you which models are definitely bad.

Is x-high fast enough to use as a coding agent?

Yes, if you parallelize your work, which you must learn to do if you want the best quality

What am I missing? As suspicious as benchmarks are, your link shows GPT 5.2 to be superior.

It is also out of date as it does not include 5.2 Codex.

Per my point about steerability being compensated for by modalities and other harness features: Opus 4.5 scores 58% while GPT 5.2 scores 75% on the instruction-following benchmark in your link! Thanks for the hard evidence: GPT 5.2 is roughly 30% ahead of Opus 4.5 there. No wonder Claude Code needs those harness features for the user to manually rein in its instruction following.

I disagree. The Claude models seem the best at tool calling, Opus 4.5 seems the smartest, and Claude Code (plus a Claude model) seems to make good use of subagents and planning in a way that Codex doesn't.

Opus 4.5 is so bad at instruction following (30% worse per the benchmark shared above) that it requires a manual toggle for plan mode.

GPT 5.2 simply obeys an instruction to assemble a plan, avoiding the need to compensate for poor steerability that would otherwise require the user to manually manage modalities.

Opus has improved, though, so plan mode is less necessary than it was before, but it is still far behind state-of-the-art steerability.

I've used all of these tools, and for me Cursor works just as well but has tabs, easy ways to abort or edit prompts, a great visual diff, and so on.

Someone sell me on Claude Code; I just don't get it.

Having only used the base tier of each, I loved the UX of Cursor and what it enabled me to do, but I hit my monthly cap in 2 days. With Claude Code (on Pro) I do hit my session limit, and I even hit the weekly limit once, but I've never had to be tools-down for 20+ days.

I hear Codex is even more generous.

Admittedly all seem cheap enough, but there does seem to be a large difference in pricing.

I’m with you, I’ve used CC but I strongly prefer Cursor.

Fundamentally, I don’t like having my agent and my IDE be split. Yes, I know there are CC plugins for IDEs, but you don’t get the same level of tight integration.