> LLMs have, on the other hand, made it even easier to botch all of these and quickly roll out a half-baked MVP

Compared to the status quo where people pretty much never consider these things, like accessibility, especially not for an MVP? How many people have never added written aria attribute? I would suspect 90%+ of people touching the frontend.

The difference with LLMs is that (1) they have a latent rigor for things that you weren't going to spend time caring about anyways and, more importantly, (2) you can encode these things into prompts (AGENTS.md) and processes so that they happen even when you weren't going to invest the time with or without AI. For a lot of people this only means collecting some generic "skills" they found online yet it's still much better than what they were going to do pre-AI.

That's why I think AI is saving software in some ways, not leading to worse software.

Or, asserting that AI will botch software might hold more weight with people who have already forgotten how dogshit software was pre-AI.

I can somewhat see your point, but it is generally accepted that a wrong ARIA is worse than none, and LLM-assisted codebases, at least these days, only stick together thanks to testing, the more decent ones heavily emphasize in-depth human code reviews.

If our hypothetical developer hasn't used any accessibility-related tags before, what chance is there that those parts of the website will receive adequate testing?

Testing is an even more powerful subject here since we barely do it.

Testing is so hard that we'll agree that, e.g., TDD is great (e.g. ensure your tests actually test something, ensure your code is testable from the start) yet we never do it. And when we do write tests, we are on the hook to be eternally vigilant that they are not stale, that they test something real, that they are not redundant. And they often turn into an append-only file that you resent.

Meanwhile, AI is happy to write tests, do red-green TDD cycles, refactor them, prune them, update them, justify and defend them. It will even incidentally write tests for the most aloof vibe-coder by accident because they didn't specify otherwise.

Overnight, I went from never testing most of my side projects (except for, say, maybe unit tests in more straightforward things like a parser) to now everything is tested end-to-end. Every time I make a new directional / architectural decision, the tests the AI writes also encode it at the test level to reenforce the decision.

It's strictly a better world for software because AI can write and maintain tests.

> LLM-assisted codebases, at least these days, only stick together thanks to testing

But tests also help humans and ensure human-written software is robust. We only don't test because they are so costly to write and maintain, and our software has always suffered for it. Or the tests become such an unmaintainable mess that our software is now worse because of it!

a11y testing is non-trivial. axe-core can automatically detect many types of issues. However, enough compliance (to avoid being sued) needs end-to-end testing and human judgement. e.g. keyboard traps, focus restoration, alt-text, etc.

> Meanwhile, AI is happy to write tests, do red-green TDD cycles, refactor them, prune them, update them, justify and defend them. It will even incidentally write tests for the most aloof vibe-coder by accident because they didn't specify otherwise.

I read some AI generated tests and while it looks visually impressive, ultimately it wasn’t doing anything valuable? Why? because of all the mocks and scenarios that didn’t matter. And on top of that, tests are additional code to maintain.

These days, I don’t even bother with unit testing. They are a maintenance burden. I focus on integration test (whole modules) and if I have the time, on a harness to do e2e testing.

0% if by testing you mean "somebody who uses a screen reader regularly was able to use the product successfully" because nobody seems to do that.

I would much rather have software that works but lacks accessibility features than software that's broken but also has some broken accessibility features sprinkled in. The former is useful to many people, while the latter is useful to no one.

But the key here is: LLMs don't have latent rigor, nor any other kind of rigor.

But software was already in a horrible state before AI, so your dichotomy doesn't work.

The status quo with pre-AI Human Written software on a pedestal is that it doesn't work, and it lacks accessibility, polish, performance considerations, UX considerations, tests, and more.

The built-in rigor is trivial to prove. Just put Opus 4.8 in plan mode and tell it to plan something, like a vt100 emulator.

The question isn't whether you can do better than AI, because you'll put your foot on the scale and give yourself infinite time, attention, and energy just so you can say yes. It's whether AI can do as good or better than you with the same time, attention, and energy you would have given a task in the first place.

[dead]