An LLM most definitely cannot spit out robust tests or thorough documentation. It can spit out some tests or some documentation, but they will not cover the user perspective or edge cases unless those are already documented somewhere. That's verified by both experience and just thinking about it for two seconds.
The sanding down you refer to is what generates those tests and documentation.
> but they will not cover the user perspective or edge cases unless those are already documented somewhere
Are you suggesting that LLMs can't test for people who use screen readers? Keyboard-only users? Slow network requests?
You're acting like the issues an app faces are so bespoke to the actual app itself (and have absolutely no relation to existing problems in this space) that an LLM couldn't possibly cover it. And it's just patently wrong.
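To make at least one of those cases concrete: "slow network" handling is generic enough that an LLM could plausibly generate a test for it from common patterns. A minimal sketch, with invented names (`fetch_with_timeout`, `slow_fetch` are hypothetical, not from any real codebase), of a timeout-with-fallback behavior and a test that simulates a slow request:

```python
import asyncio

async def fetch_with_timeout(fetch, timeout=1.0, fallback="cached"):
    """Return fetched data, or a fallback value if the network is too slow."""
    try:
        return await asyncio.wait_for(fetch(), timeout)
    except asyncio.TimeoutError:
        return fallback

async def slow_fetch():
    # Simulate a slow network request by sleeping longer than the timeout.
    await asyncio.sleep(2.0)
    return "fresh"

# The timeout fires before slow_fetch completes, so we get the fallback.
result = asyncio.run(fetch_with_timeout(slow_fetch, timeout=0.1))
print(result)  # → cached
```

The point is not that this covers real usage, only that tests of this generic shape are well within reach; whether they capture *this* app's actual failure modes is the open question.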
I'm not talking about keyboards or screen readers or any sort of input testing, I'm talking about how the software is used in practice.
If you disagree with that, I think the onus is on you to show that an LLM can simulate the full context in which a user interfaces with software, and that strikes me as a ridiculous claim.
Feel free to show literally any evidence for this claim.
I can’t tell if you’re being sarcastic or not
>Are you suggesting that LLMs can't test for people who use screen readers? Keyboard-only users? Slow network requests?
I don't think it's feasible to fully simulate the full depth of actual usage, given that (especially in the case of screen readers and the like) there's a great deal of combinatorial depth and context to the problem. Which screen readers, on which operating systems, and which users thereof?