UI QA only works well if your model plausibly matches average user behavior and/or real-world edge cases. These models are far from that, and they are much less random than you'd like them to be for fuzzing (mode collapse).

It doesn't need to be that kind of QA. Even just a basic "I want the AI to build the beginnings of a GUI app for me" will work much better if the AI can see the output of its work and iterate on it. Similarly, if you want the AI to fix a GUI bug, it goes much better if you can show it the bug and tell it how to test to see when it's gone.

The LLM does not require computer use to see the GUI and, again, that's a pretty niche use and not what Computer Use is being marketed for.

> not what Computer Use is being marketed for

Okay, fair, I haven't really paid attention to marketing.

> the LLM does not require computer use to see the GUI and

It can take screenshots without computer use, but it can't click around. I didn't have access to computer use until recently (I'm on an OS where Claude Code technically shouldn't run, so I had to patch the binary), and when I got it working it made a big difference because of this.