I am an engineer. I hire other engineers. I run a company that ships usable software for small businesses.

We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.

As a software engineer who also hires other software engineers, I’m curious about the disconnect in our experiences.

I do systems programming. Before AI, feature development roughly went: design, implement, test, review, with some back edges, and a lot of time spent in test and review.

AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.

We do not see the weeks to days improvement though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.

What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?

> I do systems programming.

I'll stop you right there. AI is not good at systems programming, it's good at CRUD web development, which is where most people are seeing the gains.

I think antirez mentioned somewhere he considered it particularly good at systems programming.

>95% of software development is CRUD.

It's really not, though. As soon as systems have to scale, regulatory requirements come in, etc. it becomes more complex.

AI has solved simple CRUD, yes, but CRUD was easy before.

[deleted]

>AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing,

Maybe they're using AI for testing and reviewing more than you are, not just for coding?

The "AI implementation" step in my workflow includes separate agents dedicated to testing and reviewing changes. The self feedback loop catches a lot of errors and mistakes, but it rarely produces working code in one go.

In my experience, the generated code handles the happy path, but isn't great about edge cases or writing clean code, even with explicit instruction in the initial prompt.

We usually end up doing multiple iterations on what Claude/Codex output, pointing out issues, asking for changes, etc.

I never touched Kubernetes, and in one week I have a few nodes running and I understand a lot of it. Not perfect, but not bad.

I have recently learned Kubernetes without AI and one week is more than enough to understand most of it.

This is definitely not true. But I doubt GP understands "most" of Kubernetes either. They probably have a good working knowledge of the important, commonly used features.

…it definitely is true, I spun up a cluster at home to learn it for a new job and felt comfortable with the basics within a few days.

[deleted]

That was the usual experience pre-AI.

Anytime you hear such wild claims, imagine a typical code sweatshop (not just CRUD apps, but templated e-shops, business pages, etc.), not a system that will evolve for another 10-20 years beyond the initial implementation and is the backend cornerstone of some part of some corporation. That is, in the cases where it's actually true; there is also a ton of PR happening here, plus another gigaton of uncritical fanboyism, like with any hot topic.

Now there may be an additional corner case or 20 where it's still valid, but they are not your typical software engineering work.

I have the same experience as you: even a 100x code-delivery improvement would barely move the needle on project delivery in our shop. Better, more automated integration and end-to-end functional tests that reflect real-world usage and data flows would make a much bigger difference, and there's no reason to think LLMs couldn't deliver this in the near future.

Not the OP, but it might be that AI isn't as good at systems programming as it is at other domains, or it might be that you're using it differently than I am. I don't know which one it is (maybe AI just isn't good at writing the language you work with).

For things like web frontends/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships far fewer bugs than our average senior programmer, though maybe not fewer than our staff programmers.

In my experience, AI has had far, far more bugs than most of what I'd call senior engineers, but far fewer than juniors.

The boost is for what are glorified CRUD apps, where it 1000x's the tedious work. However, the choices it makes along the way quickly blow up without cleanup. Seniors know how to keep their workstation clean, or they should.

It sounds like we have opposite experiences.

The only way you could possibly know that is if you're reviewing the code, which means you're not "managing fleets of agents". If you're not reviewing the code (and you wouldn't be if you're managing fleets of agents), then you have no way to tell what you're shipping.

It’s under-appreciated that a proper review takes at least as long as the actual work: it’s all the same time spent understanding the challenge and coming up with the best solution, minus the time spent typing in your solution (almost never a significant amount), plus the time spent understanding their solution and explaining how to get from theirs to yours.

Can you link to a changelog that shows the 5-10x feature increases? I keep hearing this, but I don’t see anything I use ever actually shipping like this, or people backing this up with any sort of proof.

Does what you ship involve hundreds of lines of HTML/CSS by any chance? Do you care about accessibility?

Give an example.

I have an example in my line of work: a full service rewrite in a new language. It would have taken forever without AI. AI makes it easier and faster. The service has better throughput and uses fewer machines. Having a complete, full test harness that lets us ensure we are meeting all the functionality of the previous service is key. AND we are keeping the old service on standby because we know we don't know what might be wrong with the new one.

What's your example?

If you carefully review the code then you're not doing what Armstrong was talking about. If you're not reviewing the code, then you don't really know what it is that the AI built. Of course it passes tests; that's not the problem. The problem is that the code is complicated and obtuse, even if it doesn't seem that way on the surface, and after some rounds of evolution, the agents are no longer able to evolve or maintain the code.

The difference between "it's working now" and "it will continue working in two years" is exactly the problem with AI-generated code, because the tests can't tell you that, and you don't know which one you have if you don't look really carefully.

What you are shipping is not the same as what Coinbase is shipping. These are vastly different things. Making a shiny app with AI is great, I'm doing it as I type this. But I am under no delusion that what I make can sustain a multi-million dollar or even billion dollar business in the case of Coinbase. That's plain silly.

Shipping garbage.