Today I got a feature request from another team in a call. I typed it into our Slack channel as a note. Someone typed @cursor, and moments later the feature was implemented (correctly) and ready to merge.
The tools are good! The main bottleneck right now is the lack of better scaffolding, which we need so that the tools can be thoroughly adopted and so that the agents can QA their own work.
I see no particular reason to doubt that software engineering as we know it will be massively disrupted in the next few years, with other industries probably close behind.
The anecdote is compelling, but there's an interesting measurement gap. METR ran a randomized controlled trial with experienced open-source developers: they were actually 19% slower with AI assistance, yet they had predicted a 24% speedup beforehand and still estimated afterwards that they had been about 20% faster. That's a roughly 40-point perception gap.
Doesn't mean the tools aren't useful — it means we're probably measuring the wrong thing. "Prompt engineering" was always a dead end that obscured the deeper question: the structure an AI operates within — persistent context, feedback loops, behavioral constraints — matters more than the model or the prompts you feed it. The real intelligence might be in the harness, not the horse.
Respectfully, was this comment AI-generated? It has all the signs.
And scaffolding does matter a lot, but mostly because the models just got a lot better and the corresponding scaffolding for long running tasks hasn't really caught up yet.
Yeah, but was Cursor using Claude? What moat do any of these companies have that would stop me from switching to another LLM?
It really doesn't matter how "good" these tools feel, or whatever vague metric you want: they hemorrhage cash at a rate perhaps not seen in human history. In other words, that usage you like is costing them tons of money. The bet is either that energy and compute will become vastly cheaper within a couple of years (extremely unlikely), or that they will find other ways to monetize that don't absolutely destroy the utility of their product (ads, an area where we have seen Google flop spectacularly).
And even if the latter strategy works, ads are driven by consumption. If you fully believe OpenAI's vision of these tools replacing huge swaths of the workforce reasonably quickly, who will be left to consume? It's all nonsense, and the numbers are nonsense if you spend any real time considering them. The fact that SoftBank is a major investor should be a dead giveaway.
> In other words, that usage you like is costing them tons of money
Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost for equivalent capability is also plummeting.
I didn't think more evidence would be needed beyond the fact that they say they need to spend $600 billion over 4 years on current revenue of $13bn, but here we are.
Here you go: https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...
Right, but if OpenAI wanted to stop doing research and just monetize its current models, all indications are that it would be profitable. If not, various adjustments to pricing, ads, etc. could get it there. However, it has no reason to do this, and like all the other labs, it is going insanely into debt to develop more models. I'm not saying it's necessarily going to work out, but they're far from the first company to prioritize growth over profitability.
Nope. The only "indications" are that they say so. They may be making a profit on API usage, but even that is very suspect; compare it against how much it actually costs to rent a rack of B200s from Microsoft. And for the millions of people using Codex/Claude Code/Copilot, subscription prices of $20, $30, or $200 a month clearly don't come close to the actual cost of inference.
What was the feature and what was the note?
It was a modest UX update, certainly nothing world-changing. (The agent has also had success with some backend performance refactors, but this particular change was all frontend.) The note was basically just a transcription of what I was asked to do and did not provide any technical hints about how to go about the work. The agent figured out which codebase, application, and file to modify and made the correct edit.
That's pretty neat! Thanks for elaborating.