Am I supposed to be impressed by this? I think people are now just using agents for the sake of it. I'm perfectly happy running two simple agents, one for writing and one for reviewing. I don't need to be writing code at faster-than-light speed. Just focusing on the spec, watching the agent as it does its work, and intervening when it goes sideways is perfectly fine with me. I'm doing 5-7x productivity easily, and don't need more than that.
I also spend most of my time reviewing the spec to make sure the design is right. Once I'm done, the coding agent can take 10 minutes or 30 minutes. I'm not really in that much of a rush.
Yes I'm still not really understanding this "run agents overnight" thing. Most of the time if I use claude it's done in 5-20 minutes. I've never wanted to have work done for me overnight...tomorrow is already plenty of time for more work, it's not going anywhere, and my employer isn't paying me to produce overnight.
The only counter I have to this is that some workflows have test environments; not everything can or should just run locally. Sometimes these tests take time, and instead of babysitting the model to write code and run the build+deploy+test manually, you can send it off to work until the kinks are worked out.
Add to that, I have worked on many projects that take more than 20 minutes to fully build and run tests... unfortunately. And I would consider that part of the job of implementing a feature, and a way to reduce the cycles I have to take.
After the "green" signal I will manually review or send off some secondary reviews in other models. Is it wasteful? Probably. But it's pretty damn fun (as long as I ignore the elephant in the room.)
Yeah, our basic integration test suite takes over 20 minutes to run in CI, likely higher locally but I never try to run the full test suite locally. That doesn't even encapsulate PDVs and other continuous testing that runs in the background.
The other day, I wrote a claude skill to pull logs for failing tests on a PR from CI as a CSV for feeding back into claude for troubleshooting. It helped with some debugging but was very fraught and needed human guidance to avoid going in strange directions. I could see this "fix the tests" workflow instrumented as overnight churn loops that are forbidden from modifying test files, with engineers reviewing in the morning whether more tests pass.
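A minimal sketch of the guard such a loop would need, assuming the loop can list the paths an attempt changed and count passing tests before and after. The patterns and the names `TEST_PATTERNS`, `touched_test_files` and `accept_attempt` are hypothetical, not part of any real tool:

```python
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Iterable, List, Sequence

# Assumption: what counts as a "test file" in this hypothetical repo layout.
TEST_PATTERNS: Sequence[str] = ("tests/*", "*/tests/*", "test_*.py", "*_test.py")


def touched_test_files(changed_paths: Iterable[str],
                       patterns: Sequence[str] = TEST_PATTERNS) -> List[str]:
    """Return the changed paths that look like test files."""
    hits = []
    for path in changed_paths:
        # Match the full path (for directory patterns) and the bare
        # file name (for test_*.py style patterns).
        candidates = (path, basename(path))
        if any(fnmatchcase(c, pat) for c in candidates for pat in patterns):
            hits.append(path)
    return hits


def accept_attempt(changed_paths: Iterable[str],
                   passed_before: int,
                   passed_after: int) -> bool:
    """Keep an attempt only if it left tests alone and more tests pass."""
    if touched_test_files(changed_paths):
        return False  # the agent edited tests: reject regardless of results
    return passed_after > passed_before


print(accept_attempt(["src/app.py"], 40, 43))
print(accept_attempt(["src/app.py", "tests/test_app.py"], 40, 60))
print(accept_attempt(["src/app.py"], 40, 40))
```

The second call is rejected even though far more tests pass, because the cheapest way for an agent to "fix" a test is to edit it. Only attempts that survive this check would be left for morning review.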
Maybe agentic TDD is the future. I have a bit of a nightmare vision of SWEs becoming more like QA in the future, but with much more automation. More engineering positions may become adversarial QA for LLM output. Figure out how to break LLM output before it goes to prod. Prove the vibe coded apps don't scale.
In the exercise I described above, I was just prompt churning between meetings (having claude record its work and feeding it to the next prompt, pulling test logs in between attempts), without much time to analyze, while another engineer on my team was analyzing and actually manually troubleshooting the vibe coded junk I was pushing up. Still, we fixed over 100 failing integration tests in a week for a major refactor using claude plus some human(s) in the loop. I do believe it got things done faster than we would have finished without AI. I do think the quality is slightly lower than it would have been if we'd had 4 weeks without meetings to build the thing, but the tests do now pass.
Yes that's fair, but not the case for me. Everything can run locally and specs run quickly for covering things claude changes. For everything else, the GitHub CI run is 10-15m and catches any outlier failures, and I'm usually working on more than one thing at a time anyway so it doesn't really matter to wait for this.
It's for when you want to write a spec like "make me a todo list app", then tell your agent of choice to go have fun, and return in the morning to a fully finished app, and not care about what the code is actually doing.
I’ve been playing around with these kinds of prompts. My experience is that the prompts need a lot of iteration to truly one-shot something that is halfway usable. If it’s under-spec’d it’ll just return after 15-20 minutes with something that’s not even half baked. If I give it an extremely detailed spec it’ll start dropping requirements and then finish around the 60-70 minute mark, but I needed 20 minutes to write the prompt and I need to hunt for the things it didn’t bother to do.
I’ve gotten some success iterating on the one-shot prompt until it’s less work to productionize the newest artifact than to start over, and it does have some learning benefits to iterate like this. I’m not sure if it’s any faster than just focusing on the problem directly though.
The dropping requirements problem is real. What's helped us is breaking the spec into numbered ACs and having the verification run per-criterion. If AC-3 fails you know exactly what got dropped.
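A rough sketch of what per-criterion verification could look like, assuming each numbered AC can be paired with a check that returns pass or fail. The `Criterion` type, the function names, and the toy todo-list spec are all hypothetical, and this is not a description of how any particular tool does it:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Criterion:
    """One numbered acceptance criterion and a check that returns True on pass."""
    ac_id: str
    description: str
    check: Callable[[], bool]


def verify(criteria: List[Criterion]) -> Dict[str, bool]:
    """Run every check independently so one failure never hides another."""
    results: Dict[str, bool] = {}
    for c in criteria:
        try:
            results[c.ac_id] = bool(c.check())
        except Exception:
            # A crashing check counts as a failed criterion, not a crashed run.
            results[c.ac_id] = False
    return results


def dropped(results: Dict[str, bool]) -> List[str]:
    """Return the IDs of criteria that did not pass, in spec order."""
    return [ac_id for ac_id, ok in results.items() if not ok]


# Hypothetical spec; the lambdas stand in for real checks against the app.
todos = ["buy milk"]
spec = [
    Criterion("AC-1", "a todo can be added", lambda: len(todos) == 1),
    Criterion("AC-2", "todos keep their text", lambda: todos[0] == "buy milk"),
    Criterion("AC-3", "todos can be marked done", lambda: hasattr(todos, "mark_done")),
]

results = verify(spec)
print(dropped(results))  # ['AC-3']
```

Because each criterion reports separately, a dropped requirement shows up by its ID instead of being buried in one overall pass/fail.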
I'll try that out, thanks for the tip!
I went the same way. At first I was splitting off work trees and running all the agents that I could afford, then I realized I just can't keep up with it all. Running a few agents around one issue in one directory is fast enough. Way faster than before and I can still follow what's happening.
> off work trees and running all the agents that I could afford,
I still think that we, programmers, having to pay money in order to write code is a travesty. And I'm not talking about paying the license for the odd text editor or even for an operating system, I'm talking about day-to-day operations. I'm surprised that there isn't a bigger push-back against this idea.
What is strange about paying for tools that improve productivity? Unless you consider your own time worthless you should always be open to spending more to gain more.
No stock backed company will be paying developers more regardless of how much more productive these tools make us. You'll be lucky if they pay for the proper Claude Max plan themselves considering most wouldn't even spring for IntelliJ.
I wasn't thinking about this from the perspective of an IC in a company, more from the perspective of self employment or side projects. But it's not any different for a larger business: an IC should not pay for their own tools, but an engineering manager who won't is a fool.
Are the jobs out there actually paying people more?
Your own time is worthless if you’re not spending it doing something that makes more money. You don’t make more money increasing your productivity for work when you’re expected to work the same number of hours.
I've spent a fair amount of time contracting -- this issue is even more relevant here. While I wasn't spending very much on AI tools, what I did spend was worth every penny... for the company I was supporting :).
Fortunately, there was enough work to be done so productivity increases didn't decrease my billable hours. Even if they had, I still would have done it. If it helps me help others, then it's good for my reputation. That's hard to put a price on, but absolutely worth what I paid in this case.
Dw, there's quite a lot of push back against AI in some of the communities I hang around in. It's just rarely visible here on HN.
It's usually not about the price, but more about the fact that a few megacorps and countries "own" the ability to work this way. This leads to some very real risks that I'm pretty sure will materialize at some point in time, including but not limited to:
- Geopolitical pressure - if some ass-hat of a president hypothetically were to decide "nuh uh - we don't like Spain, they're not being nice to us!", they could forbid AI companies to deliver their services to that specific country.
- Price hikes - if you can deliver "$100 worth of value" per hour, but "$1000 worth of value" per hour with the help of AI, then provider companies could still charge up to $899 per hour of usage and it'd still make "business sense" for you to use them since you're still creating more value with them than without them.
- Reduction in quality - I believe people who were senior developers _before_ starting to use AI assisted coding are still usually capable of producing high quality output. However every single person I know who "started coding" with tools like Claude Code produces horrible horrible software, esp. from a security p.o.v. Most of them just build "internal tools" for themselves, and I highly encourage that. However others have pursued developing and selling more ambitious software...just to get bitten by the fact that there's much more to software development than getting semi-correct output from an AI agent.
- A massive workload on some open source projects. We've all heard about projects closing down their bug bounty programs, declining AI generated PRs etc.
- The loss of the joy - some people enjoy it, some people don't.
We're definitely still in the early days of AI assisted / AI driven coding, and no one really knows how it'll develop...but don't mistake the bubble that is HN for universal positivity and acclaim of AI in the coding space :).
China did users a solid and Qwen is a thing, so the scenario where Anthropic/OpenAI/Google collude and segment the market to ratchet prices in unison just isn’t possible. Amodei talking about value based pricing is a dream unless they buy legislation to outlaw competitors. Altman might have beaten them to that punch with this admin, though. Most of us are operating on 10-40% margins. Usually on the low end when there aren’t legal barriers. The 80-99% margins or rent extraction rights SaaS people expect are just out of touch. The revenue the big 3 already pull in now has a lot more to do with branding and fear-mongering than product quality.
My old work machine used power quite aggressively - I was happy to pay for that (and turn it off at night!). This seems even more directly valuable.
It's silly, who wouldn't answer yes to the question "would you like to finish your task faster?". The real trick is to produce more while putting in less effort than before.
> who wouldn't answer yes to the question "would you like to finish your task faster?"
People who enjoy the process of completing the task?
Maybe we'd see "coding gyms" like how white collar workers have gyms for the physical exercise they're not getting from their work.
codeforces and topcoder have existed for years
Salaried employees who are paid by time, and are paying their own Anthropic bills.
Initially there is perhaps a mitigating advantage of briefly impressing ourselves or others with output, but that will quickly fade into the new normal.
Net result: employee paying significant money to produce more, but capturing none of that value.
If you finish faster, you'll be given another task. You're not freeing yourself sooner or spending less effort, you're working the same number of hours for the same pay. Your reward is not joining the ranks of those laid off.
If you are paid hourly and not per task then what is the point in finishing your task faster?
> Am I supposed to be impressed by this?
No. But it is noteworthy. A lot of what one previously needed a SWE to do can now be brute forced well enough with AI. (Granted, everything SWEs complained about being tedious.)
From the customer’s perspective, waiting for buggy code tomorrow from San Francisco, buggy code tonight from India or buggy code from an AI at 4AM aren’t super different for maybe two thirds of use cases.
> A lot of what one previously needed a SWE to do can now be brute forced well enough with AI. (Granted, everything SWEs complained about being tedious.)
Only if you ignore everything they generate. Look at all the comments saying that the agent hallucinates a result, generates always-passing tests, etc. Those are absolutely true observations -- and don't touch on the fact that tests can pass, the red/green approach can give thumbs up and rocket emojis all day long, and the code can still be shitty, brittle and riddled with security and performance flaws. And so now we have people building elaborate castles in the sky to try to catch those problems. Except that the things doing the catching are themselves prone to hallucination. And around we go.
So because a portion of (IMO always bad, but previously unrecognized as bad) coders think that these random text generators are trustworthy enough to run unsupervised, we've moved all of this chaotic energy up a level. There's more output, certainly, but it all feels like we've replaced actual intelligent thought with an army of monkeys making Rube Goldberg machines at scale. It's going to backfire.
What I want to know is, what has this increase in code generation led to? What is the impact?
I don't mean 'Oh I finally have the energy to do that side project that I never could'.
After all, the trade-offs have to be worth something... right? Where are the 1-person billion dollar firms that Mr Altman spoke about?
The way I think of it is code has always been an intermediary step between a vision and an object of value. So is there an increase in this activity that yields the trade-offs to be a net benefit?
> what has this increase in code generation led to?
Every restaurant in my small town has their menu on the website in a normal way. Apparently someone figured out you can take a picture of a paper menu and have AI code it into HTML.
> coders think that these random text generators are trustworthy enough to run unsupervised, we've moved all of this chaotic energy up a level
But it works well enough for most use cases. Most of what we do isn’t life or death.
> But it works well enough for most use cases.
So does the code produced by any bad engineer.
So either we’re finally admitting that all of that leetcode screening and engineer quality gating was a farce, or it wasn’t, and you’re wrong.
I think the answer is in the middle, but the pendulum has swung too far in the “doesn’t matter” direction.
> we’re finally admitting that all of that leetcode screening and engineer quality gating was a farce, or it wasn’t, and you’re wrong
We’re admitting a bit of both. Offshoring just became more instantaneous, secure and efficient. There will still be folks who overplay their hand.
Macroeconomically speaking, I don’t see why we need more software engineers in the future than we have today, and that’s probably a conservative estimate.
> Macroeconomically speaking, I don’t see why we need more software engineers in the future than we have today, and that’s probably a conservative estimate.
Why? Is the argument that there’s a finite amount of software that the world needs, and therefore we will more quickly reach that finite amount?
Seems more likely to me that if LLMs are a force multiplier for software then more software engineers will exist. Or, instead of “software engineers”, call them “people who create software” (even with the assistance of LLMs).
Or maybe the argument is that you need to be a super genius 100x engineer in order to manipulate 17 collaborative and competitive agents in order to reach your maximum potential, and then you’ll take everyone’s jobs?
Idk just seems like wild speculation that isn’t even worth me arguing against. Too late now that I’ve already written it out I guess.
> instead of “software engineers”, call them “people who create software” (even with the assistance of LLMs)
I think this is my hypothesis. A lot more people with a lot less training will create vastly more software. As a consequence, the trade sort of dissolves at the edges as something that pays a premium. Instead, other competencies become the differentiators.
> A lot of what one previously needed a SWE to do can now be brute forced well enough with AI.
I've never met those people. I've met a LOT of PMs who tried. I've met a LOT of entrepreneurs who also tried. They never cared about, nor even understood, code. They only cared about "value" (and they are not necessarily wrong about it) so now they can "produce" something that does what they need until it doesn't. When that's the case then they inexorably go back to someone else (might be a SWE, ironically enough, but might also be someone else like them they shift responsibility to, for money).
Brute force works until you have to backtrack, then it becomes prohibitively expensive until one has to actually grok the problem landscape. It's amazing for toy projects though, maybe.
I'm on the same ship. Running 2 agents and seeing a vast amount of productivity increase. Not always though. Sometimes the solutions are very over-engineered and I need to guide the agent to where I want it to go. I do a lot of micro-management, which is totally not where people with agent-orchestras seem to go nowadays.
yup, agree - i spend most of my time reviewing the spec. The highest leverage time is now deciding what to work on and then working on the spec. I ended up building the verify skill (https://github.com/opslane/verify) because I wanted to ensure claude follows the spec. I have found that even after you have the spec - it can sometimes not follow it and it takes a lot of human review to catch those issues.
I would be impressed if I could say "here's $100 turn it into $1000" but you still gotta do the thinking.
agreed, honestly if I see my agent "run" for more than 5 minutes or so I get very suspicious that it's doing anything of value other than burning credits, because more often than not it's just talking to itself or running in loops. I also find the whole multi-agent stuff to be suspect most of the time; I don't know that I have seen multiple agents running in parallel do anything that a single agent with good guidance couldn't do synchronously in about the same amount of time.
They are probably paying for expensive subscriptions and want to utilize them. Unfortunately we aren't past the slop stage so a lot of the business logic probably has bugs and unused defensive code that snowballs the more features AI adds.