This _all_ (waves hands around) sounds like a lot of work and expense for something that is meant to make programming easier and cheaper.

Writing _all_ (waves hands around various llm wrapper git repos) these frameworks and harnesses, built on top of ever changing models sure doesn't feel sensible.

I don't know what the best way of using these things is, but from my personal experience, the defaults get me a looong way. Letting these things churn away overnight, burning money in the process, with no human oversight seems like something we'll collectively look back at in a few years and laugh about, like using PHP!

> sounds like a lot of work and expense for something that is meant to make programming easier and cheaper.

Not if you are an AI gold rush shovel salesman.

From the article:

> I've run Claude Code workshops for over 100 engineers in the last six months

Yeah, my colleague recently said "hey, I've burnt through $200 in Claude in 3 days". And he was prompting manually, max 8 hrs/day. Imagine what would happen if AI were prompting.

I really like this analogy: AI is (or should be) like an exoskeleton; it should help people do things. If you step out of your car after putting it in drive and go to sleep, by the next day it will be farther along, but the question is: is it still on the road?

Burnt through 4 Max x20 in a week here. Throughput isn't the bottleneck anymore. Review quality is. The 1-in-5 error rate in this thread matches my experience. More agents overnight just means more review tomorrow morning.

What moved the needle: capturing architectural context (ADRs, structured system prompts, skill files) that agents reference before making changes. Each session builds on prior decisions. The agent improves because the context compounds. Better context beat more parallelism every time.

This matches what I've found running persistent agents. The compounding context is the whole game.

The pattern that works: treat your agent's workspace like infrastructure, not a scratch pad. ADRs, skill files, structured memory of past decisions - all of it becomes the equivalent of institutional knowledge that a senior engineer carries in their head. Except it survives session restarts.

The article's TDD framing gets at something important too. The acceptance criteria aren't just verification - they're context. When you write "after 5 failed attempts, login blocked for 60 seconds" before the agent touches code, you've constrained the solution space dramatically. The agent isn't guessing what you want anymore.
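That criterion can even be written down as an executable check before any code exists. Here is a minimal sketch; the `LoginService` API and all names are invented for illustration, not taken from the article:

```python
import time

class LoginService:
    """Minimal reference implementation so the sketch is self-contained."""
    MAX_ATTEMPTS = 5
    LOCKOUT_SECONDS = 60

    def __init__(self, clock=time.monotonic):
        self.clock = clock            # injectable clock so tests need no sleeping
        self.failures = 0
        self.blocked_until = 0.0

    def attempt(self, password: str) -> str:
        now = self.clock()
        if now < self.blocked_until:
            return "blocked"
        if password == "correct-horse":
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.MAX_ATTEMPTS:
            # The acceptance criterion: 5 failures -> 60-second lockout.
            self.blocked_until = now + self.LOCKOUT_SECONDS
            self.failures = 0
        return "failed"

def test_lockout_after_five_failures():
    fake_now = [0.0]                  # controllable clock
    svc = LoginService(clock=lambda: fake_now[0])
    for _ in range(5):
        assert svc.attempt("wrong") == "failed"
    # 6th attempt is blocked, even with the right password
    assert svc.attempt("correct-horse") == "blocked"
    fake_now[0] = 61.0                # lockout expires after 60 s
    assert svc.attempt("correct-horse") == "ok"
```

Written first, the test pins down "what done looks like" before the agent touches any implementation.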

Where I think the article undersells the problem: spec misunderstandings compound too. If your architectural context has a wrong assumption baked in, every agent session inherits that assumption. You need periodic human review of the context itself, not just the outputs. The ADRs need auditing the same way code does.

https://github.com/safety-quotient-lab/psychology-agent <- I've been exploring ways to track decisions and have made some interesting findings, at the homelab scale at least.

The cognitive architecture, so to speak, for the LLM can make a huge difference - triggers and skills go a long way when combined with shell scripts that dual-write.

This comment reads very strongly like it was written by an LLM.

Your sibling even more so.

[deleted]

Agreed. The spec file is context. Writing acceptance criteria before you prompt provides the context the agent needs to not go off in the wrong direction. Human leverage just moved up and the plan/spec is the most important step.

Parallelism on top of bad context just gets you more wrong answers faster.

Sorry, but isn't the bottleneck then simply finding relevant things to do? How large a qualified backlog do you have that your pipeline doesn't run dry?

Reminds me of when I was looking for Obsidian note management workflows and every single person who posted about theirs used it to take notes on... note taking workflows.

Bingo.

I would encourage my competitors to use AI agents on their codebase as much as possible. Make sure every new feature has it, lots of velocity! Run those suckers day and night. Don't review it, just make sure the feature is there! Then when the music stops, the AI companies hit the economic realities, go insolvent, and they are left with no one who understands a sprawling tangled web of code that is 80% AI generated, then we'll see who laughs last.

> they are left with no one who understands a sprawling tangled web of code that is 80% [random people that I can't ask because they don't work here anymore and they didn't care to leave docs or comments] generated, then we'll see who laughs last.

Yes, this matches my experience with codebases before AI was a thing.

Yes, but given a feature that should take say 100 lines of code, the average programmer will write in the order of 100 to 500 lines. If they're a heavy OOP user, maybe they'll write 10 classes that total 2000 lines. Regardless, worst case, it will be within ~2 orders of magnitude of a reasonable solution.

It's not that they're not trying to write the biggest clusterfuck possible and maximize suffering in the world, it's just that there's a human limit on how much garbage they can type out in their allocated time.

This is where AI revolutionizes things. You want 25,000 lines of React? On the backend? And a custom useEffect-backed database? Certainly!

> it's just that there's a human limit on how much garbage they can type out in their allocated time.

Another example where removing friction and constraints is a bad thing.

I think the friction has moved upstream - now it's working on the right thing and specifying what correct looks like. I don't think we are going back to a world where we will write code by hand again.

Unless what you want to do isn't well represented in the training set.

Yeah, in the past the limiting factor was the human suffering of the engineer who had to try and fit the sprawling nightmare fuel into their brain.

The machine doesn't suffer. Or if it does nobody cares. People eventually start having panic attacks, the machine can just be reset.

I suspect that the end result is just driving further into the wilderness before reality sets in and you have to call an adult.

Both can be true at the same time: some teams spend a fortune on AI, and the AI investments won't get the expected ROI (bubble collapse). What is sure is that a lot of capacity has been built, and that capacity won't disappear.

What I could see happening in your scenario is the company suffering from diminishing returns as every task becomes more expensive (new features, debugging sessions, library updates, refactoring, security audits, rollouts, infra cost). They could also end up with an incoherent, gigantic product that doesn't make sense to their customers.

Both pitfalls are avoidable, but they require focus and attention to detail. Things we still need humans for.

> What is sure is that a lot of capacity has been built and that capacity won't disappear.

They really are subsidizing what will be an incredibly healthy used server equipment market in a year or two. Can’t wait. My homelab is going to be due for an upgrade.

Qwen3 Coder Next and Qwen3.5-35B-A3B are already very good and can be run on today's higher-end home computers at good speed. Tomorrow's machines will not be slower, and models keep getting more efficient. A good software engineer will still be valuable in tomorrow's world, but not as a software assembler.

Even cutting edge models are not very good. They are not even on mediocre level. Don’t get me wrong, they are improving, and they are awesome, but they are nowhere near good yet. Vibe coded projects have more bugs than features, their architecture and design system are terrible, and their tests are completely useless about half the time. If you want a good product you need to rewrite almost everything what’s written by LLMs. Probably this won’t be the case in a few years, but now even “very good” LLMs are not very good at all.

With Claude Code now having a /plan mode, you can take your time and deliberate through architecture and design, collaboratively, instead of just sending a fire-and-forget prompt. Much less buggy, and it saves time if you keep an eye on the output as you go, guiding it and catching defects, imho.

Not sure why you're being downvoted, this is very much my experience. When it matters (like, customer data is on the line) vibecoded projects are not just hilariously bad, but put you in legal danger.

We've so far found that Claude code is fine as a kind of better Coverity for uncovering memory leaks and similar. You have to check its work very carefully because about 1 time in 5 it just gets stuff wrong. It's great that it gets stuff right 4 times in 5 and produces natural code that fits into the style of the existing project, but it's nothing earth-shattering. We've had tools to detect memory leaks before.

We had someone attempt to translate one of our existing projects into Rust and the result was just wrong at a fundamental level. It did compile and pass its own tests, so if you had no idea about the problem space you might even have accepted its work.

[dead]

> Tomorrow's machines will not be slower

The way it's going, the AI hyperscalers are buying such a big portion of the world's hardware, that it may very well happen that tomorrow's machines do get slower per dollar of purchase value.

Not my experience. Current Qwen Coder is noteworthy but still far from good. You can't compare it with current commercial offerings; they're just in different leagues.

> Don't review it, just make sure the feature is there!

Bad idea. Use another agent to do automatic review. (And a third agent writing tests.)

Don't forget the architecting and orchestrating agent too!

Multiple agents with different frontier models for best results. Claude code/codex shops don’t know what they’re missing if they never let Gemini roast their designs, code and formal models.

This.

Claude Code wrote a blog article for me documenting a Gemini interaction that I manually operated. I found it quite interesting - the difference in "personalities" and in output quality between Claude and Gemini is stark.

But still, best to have two sets of eyes.

I am not laughing about PHP. To this very day many of my best projects are built on PHP. And while I have spent the last 7 years in a full-stack JavaScript/TypeScript environment, it has never produced the same things I was actually able to do with PHP.

I actually feel that the things I built 15 years ago in PHP were better than anything I am trying to achieve with modern stacks that get outdated every 6 months.

I feel like today an engineer with a modern framework and AI can produce in an afternoon a product that delivers real value, something that 25 years ago would have required a full hour by a high schooler with MS Access.

I was building awesome things with Access 20 years ago. I loved that thing. I wasn't even a software engineer, I was in EE, but I needed a way to track processes and it definitely outperformed. And the best thing: it didn't cost us anything, everybody already had Access, lol. I had 40 people using it in production, manufacturing cutting-edge stuff. It definitely beat spreadsheets because Access gave you a GUI for operators.

what in God's Name could you do in PHP that you can't do in a modern framework?

Nothing; but PHP, in experienced hands, will be waaay more productive for small-to-medium things. One issue is that experienced hands are increasingly hard to come by. Truly big, complicated things, built by large teams or numbers of teams, teams with a lot of average brains or AIs trained on average brains, will be better off in something like Typescript/React. And everyone wants to work on the big complicated stuff. So the "modern frameworks" will continue to dominate while smaller, more niche shops will wonder why they waste their time.

I worked at a startup, they built their API in PHP because it was easy and fast. Now they're successful, app doesn't scale, high latency etc. What does their php code do? 95% of it is calling a DB.

You're telling me today with LLM power multiplier it's THAT much faster to write in PHP compared to something that can actually have a future?

“PHP was so easy and fast that they’ve built such a successful startup they now have scaling problems” is, as far as I can tell, an endorsement of PHP and not a criticism of it.

I think the point here is that the scaling problem is hard because of PHP.

Scaling can be hard in PHP at the same time that the GGP's point about PHP being productive in experienced hands is one of the reasons PHP worked for them. Both of these can be true at the same time.

And for what it's worth, TypeScript scaling, although better than PHP's, is still somewhat of an issue. If you want massive scaling, Elixir (and to an extent Gleam) was developed for solving the scalability problem, especially with the Phoenix framework in Elixir-land.

So I guess jack_pp's comment about PHP can also be applied to a degree to TypeScript, so we should all use Elixir; and within the TS world the same question can be asked (SvelteKit/Solid vs Next.js/React).

I am more on the Svelte side of things, but I see people who love React and the same for those who love PHP. So my opinion is sort of that everyone can run with their own language.

Golang is another language to be taken into consideration, especially with HTMX/datastar-go/Alpine.

Scaling in PHP is easy. Has never actually been an issue in my entire career unless it was a badly designed database.

Yes, startup success has a direct correlation to the language chosen for your CRUD api…

> I worked at a startup, they built their API in PHP because it was easy and fast. Now they're successful

You can stop there! Sounds like PHP worked for them. Already doing better than 90% of startups.

If 95% of what the app does is calling a DB, then the bottleneck is in the DB, not in PHP.

You can use persistent DB connections, and an app server such as FrankenPHP to persist state between requests, but that still wouldn't help if the DB is the bottleneck.

Sometimes it’s still the app:

   rows = select all accounts
   for each row in rows:
       update row
But that’s not necessarily a PHP problem. N+1 queries are everywhere.
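The pattern above can be sketched concretely, with an in-memory SQLite table standing in for the real database (table, column, and numbers are all made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts (balance) VALUES (?)",
                 [(100,), (200,), (300,)])

# N+1 style: one SELECT plus one UPDATE per row -> N+1 round trips.
rows = conn.execute("SELECT id, balance FROM accounts").fetchall()
for account_id, balance in rows:
    conn.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                 (balance + 10, account_id))

# Set-based alternative: the same change in a single statement, one round trip.
# (Applied here on top of the loop above, so each balance grows by 10 twice.)
conn.execute("UPDATE accounts SET balance = balance + 10")

print([b for (b,) in conn.execute("SELECT balance FROM accounts ORDER BY id")])
# -> [120, 220, 320]
```

With SQLite in-process the difference is invisible; over a network, the per-row loop pays one round trip per account.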

Depending on what you are doing, the above is not necessarily bad... often it's much better than a single SQL statement that locks an entire table (potentially blocking the whole DB, if this is one of the key tables).

> I worked at a startup, they built their API in PHP because it was easy and fast. Now they're successful, app doesn't scale, high latency etc. What does their php code do? 95% of it is calling a DB.

So PHP worked perfectly, but the DB is slow? Your DB isn't going any faster by switching to something else, if that's what you think.

PHP is the future, where React has been heading for years.

> Your DB isn't going any faster by switching to something else, if that's what you think.

Only true if none of the DB accesses are about stuff that could live as state across requests in a server that wasn't PHP. Sure, for some of that the DB's caching will be just as good, but for others, not at all.

That is possible, but it sounds unlikely to me.

In most cases you could add a shared cache to fix the problem - e.g. put your shared state in Redis, or in a file that is synced across servers (if it's kept as state in a long-running process, it can't need to be updated very frequently).
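A cache-aside sketch of that idea; a plain dict stands in for Redis here (all names are illustrative — a real deployment would use a Redis client with the same get/set-with-TTL shape):

```python
import time

CACHE = {}          # key -> (expires_at, value); stand-in for Redis
TTL_SECONDS = 30
DB_CALLS = 0        # counter just to show when we actually hit the DB

def slow_db_lookup(key):
    # Stand-in for the real database query.
    global DB_CALLS
    DB_CALLS += 1
    return f"value-for-{key}"

def cached_get(key):
    now = time.monotonic()
    hit = CACHE.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]                        # cache hit: no DB round trip
    value = slow_db_lookup(key)              # cache miss: go to the DB
    CACHE[key] = (now + TTL_SECONDS, value)  # store with expiry
    return value

cached_get("user:1")
cached_get("user:1")        # second call is served from the cache
print(DB_CALLS)             # -> 1
```

The point being: this layer sits in front of the DB regardless of whether the app language is PHP or not.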

Not scaling and high latency sound like a skill issue, not a PHP issue.

What does this even mean? If you've got scaling problems, it's not because you've used PHP.

by future do you mean Future<T> or metaphorical future? :)

PHP did better than Python and Perl. Python is doomed. PHP already has a good JIT, good OO as of late, good frameworks, stable extensions. It has a company behind it.

Unlike Python or Ruby, which break left and right all the time on updates. You have to use bunkers of venvs, without any security updates. A nightmare.

PHP can scale and has a future.

Python is doomed? That's new.

You use Python Docker images pinned to a stable version (3.11 etc.), and between major versions you test and handle any breaking changes.

I feel like this approach applies to pretty much every language?

Who on earth raw dogs on "language:latest" and just hopes for the best?

Granted, I wouldn't be running Facebook's backend on something like this. But I feel that isn't a problem 95% of people need to deal with.

No, only to Python. And partially Ruby and OCaml. Not to TypeScript, Perl, or PHP.

Introducing uv...

https://docs.astral.sh/uv/

uv does not fix the need for venvs or Docker containers. Normal people update their libs in the hope of getting problems fixed.

Python people don't update their libs, because then everything will break left and right. So they keep their security problems running.

No matter how you look at it, the dependencies have to go somewhere. Node uses node_modules, most compiled languages require compiled libraries (or they're a huge blob), etc. Idk about PHP but I'm pretty sure 3rd party things for any given app also live somewhere. Different ways of managing dependencies. It's recommended that venvs are used in Python because you may accidentally nuke a system script by doing global installs, and otherwise there still needs to be some sort of 3p version handling when you have multiple projects going.

Once something works in Python (which uv now makes trivial; before, it could be a pain), updating 3rd party packages rarely causes breakage. But yes, I think many who use it hardly update, because things usually continue to work for years and the attack surface is pretty narrow[0]. Heck, just a few days ago I checked out a project that I hadn't touched in years, which I wrote in Python 3.7; updated to 3.13 and it continued to just work. Compare to PHP, which has a far higher attack surface[1] and often has breaking changes. I've heard a couple of nightmare stories of a v7.x -> v8.x move being delayed because it required a serious codebase rewrite.

[0] https://www.cvedetails.com/product/18230/Python-Python.html?... [1] https://www.cvedetails.com/product/128/PHP-PHP.html?vendor_i...

I don't think it's true that experienced hands will be faster in PHP than in Python or JS or whatever. It's just about what you know, and experienced hands are experienced.

PHP is faster to develop in than Python or JS; then add in a framework like Laravel and you are already done.

Python has the curse of spaces or tabs and JS has the curse of npm.

PHP has the curse of T_PAAMAYIM_NEKUDOTAYIM.

You can build those things in modern frameworks, it will just be more headache and will feel outdated in 6 months.

Where are my Backbone apps? In the trash? My Ember apps? Next to them. My create-react-apps? On top of those. My Next apps? Being trashed as we speak. My Rails apps? Online and making money every year with minimal upgrade time. What the hell was I thinking.

I'm guessing you avoided the CoffeeScript era of Rails, which is a good thing.

6 years ago I was writing apps in typescript and react, if I was starting a new project today I'd write it in typescript and react.

People bicker about PHP and JavaScript, sorry, TypeScript, like they aren't both mule languages people pick up to get work done. They both matured really well through years of production use.

They are in the same group, similar pedigree. If you were programming purely for the art of it, you would have had time to discover much nicer languages than either, but that's not what most people are doing, so it doesn't really matter. They're different, but they're about as good as each other.

Making instant-loading and user-respecting sites.

Could you give examples of the modern frameworks that you have in mind?

Don’t confuse PHP the language with PHP the 2006-vintage webmaster way of doing things.

Those webmasters already built the web a lot of people are now nostalgic about.

Not having to "build" anything. You edit code and it is already deployed on your dev instance.

Deploying to production is just scp -rv * production:/var/www/

Beautifully simple. No npm build crap.

You trade having to compile for actually having code that can scale.

Not sure what you’re talking about; I scaled to millions of users on a pair of boxes with PHP, and its page generation time absolutely crushed Rails/Django times. Apache with mod_php auto-scales wonderfully.

It scales just fine the same way everything else scales: put a load balancer in front of multiple instances of your app.

It can scale by the virtue of spending a lot less time processing the request

You don't know anything about the PHP ecosystem and it shows.

The comparison target for PHP is IMHO a good Python web framework, e.g. Django being the most popular one. I still don't understand how JavaScript is ever considered viable, TypeScript makes it workable I guess…

> sounds like a lot of work and expense for something that is meant to make programming easier and cheaper.

It's not more work; it's a convergence of roles. BA/PO/QA/SWE are merging.

AI has automated aspects of those roles that have made the traditional separation of concerns less desirable. A new hybrid role is emerging. The person writing these acceptance criteria can be the one guiding the AI to develop them.

So now we have dev-BAs or BA-devs or however you'd like to frame it. They're closer to the business than a dev might have been or closer to development than a BA might have been. The point is, smaller teams are able to play wider now.

Oh a modern comeback of the analyst-programmer?

> It's not more work

It literally is. You're spending weeks of effort babysitting harnesses and evaluating models while shipping nothing at all.

That hasn't been my experience, as a "ship or die" solopreneur. It takes work to set up these new processes and procedures, but it's like building a factory; you're able to produce more once they're in place.

And you're able to play wider, which is why the small team is king. Roles are converging both in technologies and in functions. That leads to more software that's tailored to niche use cases.

> you're able to produce more once they're in place

Cool story, unfortunately the proof is not in the pudding and none of this phantom 10x vibe-coded software actually works or can be downloaded and used by real people.

P.S. Compare to AI-generated music, which is actually a thing now and is everywhere on every streaming platform. If vibe coding were a real thing, by now we'd have 10 vibe-coded repos on GitHub for every real repo.

There's no need to be rude with comments like "cool story." I'm sharing my experience with you. I'm not an AI-hype influencer. I'm a SWE who runs a small SaaS business.

Where it sounds like we agree is that there's some obnoxious marketing hype around LLMs. And people who think they can vibe code without careful attention to detail are mistaken. I'm with you there.

These people play around with shit and try to sell you on their secret sauce. If it actually works it will come to Claude Code, so you can consider that the practical SOTA. Honestly, just plopping CC onto a mid-sized codebase is a pretty great experience for me already. Not ideal, but I get real tangible value out of it. Not 10x or any such nonsense, but enough that I don't think I want to be managing junior developers anymore; the ROI with LLMs is much faster and more significant IMO.

Looking back we see how foolish the anti-php memes were. Meanwhile PHP lives on and becomes better with each release.

Tooling around LLMs is a natural next step that will become your default one day.

I can't believe we're back to advocating for TDD. It was a failed paradigm the last few times we tried it. This time isn't any different, because the fundamental flaw has always been the same: tests aren't proofs; they don't have complete coverage.

Before anyone gets too confused: I love tests. They're great. They help a lot. But to believe they prove correctness is absolutely laughable. Even the most general tests are very narrow. I'm sure they help LLMs just as they help us, but they're not some cure-all. You have to think long and hard about problems and shouldn't let tests drive your development. They're guardrails for checking bounds and reducing footguns.

Oh, who could have guessed, Dijkstra wrote about program completeness. (No, this isn't the foolishness of natural language programming, but it is about formalism ;)

https://www.cs.utexas.edu/~EWD/transcriptions/EWD02xx/EWD288...

Testing works because tests are (essentially) a second, crappy implementation of your software. Tests only pass if both implementations of your software behave the same way. Usually that will only happen if the test and the code are both correct. Imagine if your code (without tests) has a 5% defect rate. And the tests have a 5% defect rate (with 100% test coverage). Then ideally, you will have a 5%^2 defect rate after fixing all the bugs. Which is 0.25%.
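The back-of-the-envelope arithmetic above, under the same idealized assumption the comment makes (code defects and test defects are independent):

```python
# A bug survives only when the code is wrong AND the test that should
# catch it is also wrong, so the residual defect rate is the product.
code_defect_rate = 0.05
test_defect_rate = 0.05

residual = code_defect_rate * test_defect_rate
print(f"{residual:.2%}")   # -> 0.25%
```

In practice defects in code and tests are correlated (both mirror the same misunderstanding of the spec), so the real residual rate is higher than the product — which is the point the grandparent comment makes about spec misunderstandings compounding.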

The price you pay for tests is that they need to be written and maintained. Writing and maintaining code is much more expensive than people think.

Or at least it used to be. Writing code with claude code is essentially free. But the defect rate has gone up. This makes TDD a better value proposition than ever.

TDD is also great because Claude can fix bugs autonomously when it has a clear failing test case. A few weeks ago I used Claude Code and experts to write a big 300+ conformance test suite for JMAP. (JMAP is a protocol for email.) For fun, I asked Claude to implement a simple JMAP-only mail server in Rust. Then I ran the test suite against Claude's output. Something like 100 of the tests failed. Then I asked Claude to fix all the bugs found by the test suite. It took about 45 minutes, but now the conformance test suite fully passes. I didn't need to prompt Claude at all during that time. This style of TDD is a very human-time-efficient way to work with an LLM.

  > Tests only pass if both implementations of your software behave the same way.
That's not true.

I even addressed this in my comment, as did Dijkstra.

This is great. The tests in this case are the spec. When you give the agent something concrete to fail against, it knows what done looks like.

The problem is if you skip that step and ask Claude to write the tests after.

I think there is a difference between doing TDD and writing tests after the fact to avoid regressions. TDD can only work decently if you already know your specs very well, but not so much when you still need to figure them out, and need to build something concrete to be able to figure them out.

Yes; I think this remains true with coding agents. If you need to do some exploration of the solution space, it makes sense to do that before writing tests. Once you have a clear, workable design, you can get the agent to make a battery of tests to make sure the final product works correctly.

When you write tests with LLM-generated code you're not trying to prove correctness in a mathematically sound way.

I think of it more as "locking" the behavior to whatever it currently is.

Either you do the red-green-with-multiple-adversarial-sub-agents -thing or just do the feature, poke the feature manually and if it looks good then you have the LLM write tests that confirm it keeps doing what it's supposed to do.
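That "locking" idea is essentially a characterization test: record what the code does today and fail if it changes, without claiming the recorded behavior is correct. A toy sketch — the pricing function and golden values are invented here:

```python
def legacy_pricing_cents(qty):
    # Existing code whose exact behavior we want to pin down, warts and all.
    price = qty * 999            # 9.99 per unit, in cents
    if qty >= 10:
        price = price * 9 // 10  # 10% bulk discount
    return price

# Golden values captured from the current implementation. A future change
# (human- or LLM-authored) that alters any of these trips the check,
# prompting a human look before the new behavior is accepted.
GOLDEN = {1: 999, 10: 8991, 20: 17982}

for qty, expected in GOLDEN.items():
    assert legacy_pricing_cents(qty) == expected
```

The golden values say nothing about whether a 10% discount at qty 10 is *right* — only that it's what the code did when the values were captured.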

The #1 reason TDD failed is because writing tests is BOORIIIING. It's a bunch of repetition with slight variations of input parameters, a ton of boilerplate or helper functions that cover 80% of the cases, but the last 20% is even harder because you need to get around said helpers. Eventually everyone starts copy-pasting crap and then you get more mistakes into the tests.

LLMs will write 20 test cases with zero complaints in two minutes. Of course they're not perfect, but human made bulk tests rarely are either.

  > you're not trying to prove correctness in a mathematically sound way.

  > "locking" the behavior to whatever it currently is.
These two sentences are incompatible.

  > The #1 reason TDD failed is
Because the spec is an ever-evolving thing that cannot be determined a priori. And because it highly incentivized engineers to metric-hack.

  > It's a bunch of repetition with slight variations
If that's how you're writing tests then you're writing them wrong. You have the wrong level of abstraction. Abstraction is not a dirty word. It solves these problems. Maybe juniors don't understand that abstraction and fuck it up while learning but making abstraction a dirty word is throwing the baby out with the bath water.

  > Eventually everyone starts copy-pasting crap
Which is a horrendous way to write code.

Locking behavior with tests isn't the same as comprehensive and foolproof tests. They might not cover every edge case, but will fail if the happy path starts failing for some reason.

And yes, copy-pasting is a horrendous way to write code, but everyone does it.

When you're adding the 1600th CRUD endpoint of your career to an enterprise Java/C# application, can you with all honesty say you will type every single character with the same thought and consideration every time?

Or do you just make one, copy-paste that one and modify accordingly?

Or if you write 20 unit tests with slight alterations you masterfully craft every single character to perfection?

I have a limited amount of energy to use every day, I choose to use it in places that matter. The hard bits that LLMs and copy-pasting can't speed up.

Hmm, not so sure TDD is a failed paradigm. Maybe it isn't a panacea, but it seems like it's changed how software development is done.

Especially for backend software and also for tools, it seems like automated tests can cover quite a lot of the use cases a system encounters. Their coverage can become so good that they'll allow you to make major changes to the system, and as long as the changes pass the automated tests, you can feel relatively confident the system will work in prod (I have seen this many times).

But maybe you're separating automated testing and TDD as two separate concepts?

Indeed, they are two separate concepts.

I write lots of automated tests, but almost always after the development is finished. The only exception is when reproducing a bug, where I first write the test that reproduces it, then I fix the code.
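That bug-first loop — reproduce, then fix — can be sketched with a made-up example (the `slugify` function and the double-space bug are invented for illustration):

```python
import re

def slugify(title):
    # Fixed implementation. The reported bug: the old version replaced each
    # non-alphanumeric character separately, so "My  Title" (double space)
    # produced "my--title".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_reproduces_double_space_bug():
    # Step 1: this test failed against the old code, reproducing the report.
    assert slugify("My  Title") == "my-title"

# Step 2: fix the code until the test passes; it then stays in the suite
# as a regression guard.
test_reproduces_double_space_bug()
```

Writing the failing test first guarantees the fix actually addresses the reported symptom, not just something nearby.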

TDD is about developing tests first then writing the code to make the tests pass. I know several people who gave it an honest try but gave up a few months later. They do advocate everyone should try the approach, though, simply because it will make you write production code that's easier to test later on.

... hmm, just looked it up. According to some sites on the web, TDD was created by Kent Beck as a part of Extreme Programming in the 90's, and automated testing is a big part of TDD. Having lived through that era and thinking back, I would say that TDD did help to popularize automated testing. It made us realize that focusing a ton on writing tests had a lot of benefits (and yeah, most of us didn't do the test-first development part).

But this is kind of splitting hairs on what TDD is, not too important.

I think tests in general are good, just not TDD, as it forces you into what I think is a bad and narrow paradigm of thinking. I think, e.g., it is better that I build the thing, then get to 90%+ coverage once I am sure this is what I would actually ship.

That's the result I've seen with anyone who tries TDD. Their code ends up being very rigid, making it difficult to add new features and fix bugs. It just ends up making them over confident in their code's correctness. As if their code is bug free. It just seems like an excuse to not think and avoid doing the hard stuff.

  > But maybe you're separating automated testing and TDD as two separate concepts?
I hope it's clear that I am, given my comment and how I stress that I write tests. The existence of tests does not make development TDD.

The first D in TDD stands for "driven". While my sibling comment explains the traditional paradigm, it can also be seen in an iterative sense, like when developing a new feature or even fixing a bug. You start by developing a test, treating it like a spec, and then write code to that spec. Look at many of your sibling comments and you'll see that they follow this framing. Think carefully about it and adversarially. Can you figure out its failure mode? Everything has a failure mode, so it's important to know.

Having tests doesn't mean they drive the development. So there are many ways to develop software that aren't TDD but have tests. The important part is to not treat tests as proofs or spec. They are a measurement like any other; a hint. They can't prove correctness (that your code does what you intend it to do). They can't prove that it is bug-free. But they hint at those things. Those things won't happen unless we formalize the code, and not only is formalizing costly in time, it often results in unacceptable computational overhead.
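A tiny illustration of tests-as-hints (a deliberately buggy, made-up example): every assertion below passes, yet the function is wrong.

```python
def is_leap_year(year: int) -> bool:
    # Buggy implementation: ignores the century exceptions
    # (divisible by 100 is not a leap year unless divisible by 400).
    return year % 4 == 0

# The whole suite passes, hinting at correctness...
assert is_leap_year(2024)
assert is_leap_year(2000)
assert not is_leap_year(2023)

# ...and this passes too, even though 1900 was NOT a leap year.
# The tests measured behavior on four inputs; they proved nothing.
assert is_leap_year(1900)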

I'll give an example of why TDD is so bad. I taught a class a year ago (upper-div Uni students) and gave them some skeleton code, a spec sheet, and some unit tests. I explicitly told them that the tests are similar to my private tests, which would be used to grade them, but that they should not rely on them for correctness, and I encouraged them to write their own.

The next few months my office hours were filled with "but my code passes the tests" and me walking students through the tests and discussing their limitations along with the instructions. You'd be amazed at how often the same conversations happened with the same students over and over. A large portion of the class did this. Some just assumed the tests had complete coverage and never questioned them, while others read the tests and couldn't figure out their limits.

But you know the students who never struggled in this way? The students who first approached the problem through design, and who understood that even the spec sheet is a guide. That it tells requirements, not completeness. Since the homeworks built on one another, those students had the easiest time. Some struggled at first, but many of them got the right levels of abstraction, so that I know I could throw new features at them and they could integrate them without much hassle. They knew the spec wasn't complete. I mean, of course it wasn't; we told them from the get-go that their homeworks were increments toward building a much larger program. And the only difference between that and real-world programming is that this isn't always explicitly told to you, and that the end goal is less clear. Which only makes this design style more important.

The only thing that should drive the software development is an unobtainable ideal (or literal correctness). A utopia. This reduces metric hacking, as there is no metric to hack. It helps keep you flexible, as you are unable to fool yourself into believing the code is bug-free or "correct". Your code is either "good enough" or not. There's no "it's perfect" or "it's correct"; there's only triage. So I'll ask you even here: can you find the failure mode? Why is that question so important to this way of thinking?

TDD and similar test paradigms all share the same fundamental flaw: testing for the sake of testing. You need to know exactly what you want in order to start, which isn't compatible with a competitive iterative workflow, no matter how much TDD yells otherwise. TDD doesn't make sense in agile, fast-iteration workflows, only in heavily regulated/restricted products.

It certainly isn’t. It's more a way of discovering how to implement something, with the benefit of being able to safely (and thus easily) change it later.

The 99 Bottles book by Sandi Metz [0] is a good, short demonstration of how it works and where it actually helps in building maintainable software.

[0] https://sandimetz.com/99bottles

> But to believe they prove correctness is absolutely laughable.

Sounds like a lack of tests for the correct things.

True, but I seriously doubt people are writing formal proofs for their code. I've only seen this in niche academic circles and high security/safety settings. I also am pretty certain it's not what you're suggesting, but hey, I could be wrong

> But to believe they prove correctness is absolutely laughable.

You don't need to believe this to practice TDD. In fact I challenge you to find one single mainstream TDD advocate who believes this.

It being a lot of work is why they didn't do it at all for weeks, and yet still, without self-reflection, wrote that they care about the quality of code they hadn't looked at or tested.

"You better work, bitch" -- Britney Spears

Our society is obsessed with work. Work will never end. If things become easier, we just do more of them. Whether putting all our effort into recycling things created by those who came before is good for us remains to be seen.

Our society is obsessed with <the appearance of> work

php still makes money though!

I saw a guy's post on LinkedIn who created an LLM agent to water his plants based on sensors attached to them.

He still has to water the plants on his own. It's just that it costs him quite a bit, when all of that could be managed with an alarm reminding him to water the plants.

It's always the über-conservative and over-principled people who laugh about using PHP, who have an opinion on everything while not knowing how to get shit done.

They're all just tools. You decide how to use them.

Sure, but we can agree there are essentially two parallel industries in web development:

Engineers at tech firms, and web shops writing WordPress plugins for single clients where Squarespace doesn't cut it.

Is AI another field of people, or is it killing one or both of those? TBD.

> like using PHP

lmao, chuckled